LibriSpeechCorpus / ExtractAudioFeatures / librosa.filters.mel: operands could not be broadcast together

albertz commented 5 years ago

...
  File "/u/zeyer/setups/librispeech/2018-02-26--att/returnn/TFEngine.py", line 1180, in train
    line: self.train_epoch()
    locals:
      self = <local> <TFEngine.Engine object at 0x7f98891c3978>
      self.train_epoch = <local> <bound method Engine.train_epoch of <TFEngine.Engine object at 0x7f98891c3978>>
  File "/u/zeyer/setups/librispeech/2018-02-26--att/returnn/TFEngine.py", line 1270, in train_epoch
    line: trainer = Runner(engine=self, dataset=self.train_data, batches=train_batches, train=True)
    locals:
      trainer = <not found>
      Runner = <global> <class 'TFEngine.Runner'>
      engine = <not found>
      self = <local> <TFEngine.Engine object at 0x7f98891c3978>
      dataset = <not found>
      self.train_data = <local> <LibriSpeechCorpus 'train' epoch=1>
      batches = <not found>
      train_batches = <local> <EngineBatch.BatchSetGenerator object at 0x7f987319ccc0>
      train = <not found>
  File "/u/zeyer/setups/librispeech/2018-02-26--att/returnn/TFEngine.py", line 63, in __init__
    line: engine.network.extern_data.check_matched_dataset(
            dataset=dataset, used_data_keys=engine.network.used_data_keys)
    locals:
      engine = <local> <TFEngine.Engine object at 0x7f98891c3978>
      engine.network = <local> <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>
      engine.network.extern_data = <local> <ExternData data={'classes': Data(name='classes', shape=(None,), dtype='int32', sparse=True, dim=10025, available_for_inference=False), 'data': Data(name='data', shape=(None, 40))}>
      engine.network.extern_data.check_matched_dataset = <local> <bound method ExternData.check_matched_dataset of <ExternData data={'classes': Data(name='classes', shape=(None,), dtype='int32', sparse=True, dim=10025, available_for_inference=False), 'data': Data(name='data', shape=(None, 40))}>>
      dataset = <local> <LibriSpeechCorpus 'train' epoch=1>
      used_data_keys = <not found>
      engine.network.used_data_keys = <local> {'data', 'classes'}, len = 2
  File "/u/zeyer/setups/librispeech/2018-02-26--att/returnn/TFNetwork.py", line 106, in check_matched_dataset
    line: data_dtype = dataset.get_data_dtype(key)
    locals:
      data_dtype = <not found>
      dataset = <local> <LibriSpeechCorpus 'train' epoch=1>
      dataset.get_data_dtype = <local> <bound method CachedDataset2.get_data_dtype of <LibriSpeechCorpus 'train' epoch=1>>
      key = <local> 'classes', len = 7
  File "/u/zeyer/setups/librispeech/2018-02-26--att/returnn/CachedDataset2.py", line 201, in get_data_dtype
    line: self._load_something()
    locals:
      self = <local> <LibriSpeechCorpus 'train' epoch=1>
      self._load_something = <local> <bound method CachedDataset2._load_something of <LibriSpeechCorpus 'train' epoch=1>>
  File "/u/zeyer/setups/librispeech/2018-02-26--att/returnn/CachedDataset2.py", line 128, in _load_something
    line: self.load_seqs(self.expected_load_seq_start, self.expected_load_seq_start + 1)
    locals:
      self = <local> <LibriSpeechCorpus 'train' epoch=1>
      self.load_seqs = <local> <bound method Dataset.load_seqs of <LibriSpeechCorpus 'train' epoch=1>>
      self.expected_load_seq_start = <local> 0
  File "/u/zeyer/setups/librispeech/2018-02-26--att/returnn/Dataset.py", line 220, in load_seqs
    line: self._load_seqs(start, end)
    locals:
      self = <local> <LibriSpeechCorpus 'train' epoch=1>
      self._load_seqs = <local> <bound method CachedDataset2._load_seqs of <LibriSpeechCorpus 'train' epoch=1>>
      start = <local> 0
      end = <local> 1
  File "/u/zeyer/setups/librispeech/2018-02-26--att/returnn/CachedDataset2.py", line 88, in _load_seqs
    line: seqs = [self._collect_single_seq(seq_idx=seq_idx) for seq_idx in range(start, end)]
    locals:
      seqs = <not found>
      self = <local> <LibriSpeechCorpus 'train' epoch=1>
      self._collect_single_seq = <local> <bound method LibriSpeechCorpus._collect_single_seq of <LibriSpeechCorpus 'train' epoch=1>>
      seq_idx = <not found>
      range = <builtin> <class 'range'>
      start = <local> 0
      end = <local> 1
  File "/u/zeyer/setups/librispeech/2018-02-26--att/returnn/CachedDataset2.py", line 88, in <listcomp>
    line: seqs = [self._collect_single_seq(seq_idx=seq_idx) for seq_idx in range(start, end)]
    locals:
      seqs = <not found>
      self = <local> <LibriSpeechCorpus 'train' epoch=1>
      self._collect_single_seq = <local> <bound method LibriSpeechCorpus._collect_single_seq of <LibriSpeechCorpus 'train' epoch=1>>
      seq_idx = <local> 0
      range = <builtin> <class 'range'>
      start = <not found>
      end = <not found>
  File "/u/zeyer/setups/librispeech/2018-02-26--att/returnn/GeneratingDataset.py", line 2085, in _collect_single_seq
    line: features = self.feature_extractor.get_audio_features(audio=audio, sample_rate=sample_rate)
    locals:
      features = <not found>
      self = <local> <LibriSpeechCorpus 'train' epoch=1>
      self.feature_extractor = <local> <GeneratingDataset.ExtractAudioFeatures object at 0x7f98891c38d0>
      self.feature_extractor.get_audio_features = <local> <bound method ExtractAudioFeatures.get_audio_features of <GeneratingDataset.ExtractAudioFeatures object at 0x7f98891c38d0>>
      audio = <local> array([ 2.78090321e-03,  3.32838898e-03,  4.34225125e-03, ...,
                             -4.98423541e-05, -1.79219493e-04,  1.82329724e-04]), len = 22912
      sample_rate = <local> 16000
  File "/u/zeyer/setups/librispeech/2018-02-26--att/returnn/GeneratingDataset.py", line 773, in get_audio_features
    line: feature_data = _get_audio_features_mfcc(**kwargs)
    locals:
      feature_data = <not found>
      _get_audio_features_mfcc = <global> <function _get_audio_features_mfcc at 0x7f9b9c0a8378>
      kwargs = <local> {'sample_rate': 16000, 'window_len': 0.025, 'step_len': 0.01, 'num_feature_filters': 40, 'audio': array([ 2.78090321e-03,  3.32838898e-03,  4.34225125e-03, ...,
                              -4.98423541e-05, -1.79219493e-04,  1.82329724e-04])}
  File "/u/zeyer/setups/librispeech/2018-02-26--att/returnn/GeneratingDataset.py", line 814, in _get_audio_features_mfcc
    line: mfccs = librosa.feature.mfcc(
            audio, sr=sample_rate,
            n_mfcc=num_feature_filters,
            hop_length=int(step_len * sample_rate), n_fft=int(window_len * sample_rate))
    locals:
      mfccs = <not found>
      librosa = <local> <module 'librosa' from '/u/zeyer/.local/lib/python3.6/site-packages/librosa/__init__.py'>
      librosa.feature = <local> <module 'librosa.feature' from '/u/zeyer/.local/lib/python3.6/site-packages/librosa/feature/__init__.py'>
      librosa.feature.mfcc = <local> <function mfcc at 0x7f98700e9488>
      audio = <local> array([ 2.78090321e-03,  3.32838898e-03,  4.34225125e-03, ...,
                             -4.98423541e-05, -1.79219493e-04,  1.82329724e-04]), len = 22912
      sr = <not found>
      sample_rate = <local> 16000
      n_mfcc = <not found>
      num_feature_filters = <local> 40
      hop_length = <not found>
      int = <builtin> <class 'int'>
      step_len = <local> 0.01
      n_fft = <not found>
      window_len = <local> 0.025
  File "/u/zeyer/.local/lib/python3.6/site-packages/librosa/feature/spectral.py", line 1299, in mfcc
    line: S = power_to_db(melspectrogram(y=y, sr=sr, **kwargs))
    locals:
      S = <local> None
      power_to_db = <global> <function power_to_db at 0x7f98700d1048>
      melspectrogram = <global> <function melspectrogram at 0x7f98700e9510>
      y = <local> array([ 2.78090321e-03,  3.32838898e-03,  4.34225125e-03, ...,
                         -4.98423541e-05, -1.79219493e-04,  1.82329724e-04]), len = 22912
      sr = <local> 16000
      kwargs = <local> {'hop_length': 160, 'n_fft': 400}
  File "/u/zeyer/.local/lib/python3.6/site-packages/librosa/feature/spectral.py", line 1391, in melspectrogram
    line: mel_basis = filters.mel(sr, n_fft, **kwargs)
    locals:
      mel_basis = <not found>
      filters = <global> <module 'librosa.filters' from '/u/zeyer/.local/lib/python3.6/site-packages/librosa/filters.py'>
      filters.mel = <global> <function mel at 0x7f98701338c8>
      sr = <local> 16000
      n_fft = <local> 400
      kwargs = <local> {}
  File "/u/zeyer/.local/lib/python3.6/site-packages/librosa/filters.py", line 247, in mel
    line: lower = -ramps[i] / fdiff[i]
    locals:
      lower = <not found>
      ramps = <local> array([[[ 0.00000000e+00, -4.00000000e+01, -8.00000000e+01, ...,
...                             [[ 4.67655199e+01,  6.76551987e+00..., len = 130, _[0]: {len = 1, _[0]: {len = 201}}
      i = <local> 0
      fdiff = <local> array([], shape=(130, 0), dtype=float64), len = 130, _[0]: {len = 0}
ValueError: operands could not be broadcast together with shapes (1,201) (0,) 
Unhandled exception <class 'ValueError'> in thread <_MainThread(MainThread, started 140306969966336)>, proc 18665.
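The shapes in the traceback can be reproduced with plain NumPy, which helps narrow down what went wrong inside `librosa.filters.mel`. A minimal sketch, under these assumptions: `n_mels = 128` is librosa's default (giving the 130 mel points seen as `len = 130`), `filters.mel` builds `fdiff` with `np.diff(mel_f)` and `ramps` with `np.subtract.outer(mel_f, fftfreqs)`, and the mel points are placeholders (linearly spaced, since only shapes matter). The `(130, 0)` shape of `fdiff` is exactly what `np.diff` returns for a `(130, 1)` column, so the faulty state is modelled as such a column. `reproduce_mel_step` is a hypothetical helper, not part of librosa or RETURNN.

```python
import numpy as np

# Values taken from the traceback locals.
sample_rate = 16000
window_len = 0.025  # seconds
step_len = 0.01     # seconds

# Same seconds-to-samples conversion as in _get_audio_features_mfcc.
n_fft = int(window_len * sample_rate)     # 400, matches n_fft in the traceback
hop_length = int(step_len * sample_rate)  # 160, matches hop_length in the traceback

n_freqs = 1 + n_fft // 2  # 201 FFT bins, the trailing dim of ramps[i]
n_mels = 128              # assumption: librosa's default, so n_mels + 2 = 130 points

# FFT bin centre frequencies, shape (201,).
fftfreqs = np.linspace(0.0, sample_rate / 2, n_freqs)
# Placeholder mel points (linear spacing; only the shape matters), shape (130,).
mel_f_ok = np.linspace(0.0, sample_rate / 2, n_mels + 2)
# Assumed faulty state: the same points as a (130, 1) column, which makes
# np.diff return the (130, 0) fdiff seen in the traceback.
mel_f_bad = mel_f_ok.reshape(-1, 1)


def reproduce_mel_step(mel_f, fftfreqs):
    """Hypothetical helper repeating the failing step of librosa.filters.mel for i = 0.

    Returns (fdiff.shape, ramps.shape, error message or None).
    """
    fdiff = np.diff(mel_f)                      # differences along the last axis
    ramps = np.subtract.outer(mel_f, fftfreqs)  # shape: mel_f.shape + fftfreqs.shape
    try:
        _lower = -ramps[0] / fdiff[0]           # the line that fails in filters.mel
        error = None
    except ValueError as exc:
        error = str(exc)
    return fdiff.shape, ramps.shape, error


ok_result = reproduce_mel_step(mel_f_ok, fftfreqs)
bad_result = reproduce_mel_step(mel_f_bad, fftfreqs)
print(ok_result)   # ((129,), (130, 201), None)
print(bad_result)  # fdiff (130, 0), ramps (130, 1, 201), plus the broadcast error
```

With 1-D mel points, `ramps[0]` has shape `(201,)` and `fdiff[0]` is a scalar, so the division works. With a `(130, 1)` column, `ramps[0]` is `(1, 201)` and `fdiff[0]` is `(0,)`; the trailing dimensions 201 and 0 are incompatible, which yields the same `ValueError` as above.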

I tried the latest master (8d0fc94b) as well as the versions 20190130.151405 (0eb0f38a16ebcf27), 20181129.181745 (ae50b3ab) and 20181012.011529 (5d3774d5).

albertz commented 5 years ago

@Spotlight0xff @kazuki-irie @JackTemaki @mmz33 Have you seen this before? I will try some older versions. But I don't really understand the problem yet. Maybe it is not in RETURNN itself but in librosa? Some multi-threading issue, maybe?

albertz commented 5 years ago

https://github.com/Rayhane-mamah/Tacotron-2/issues/321 seems related.

albertz commented 5 years ago

I reported it upstream to librosa. I had librosa 0.5.1 installed. After updating to librosa 0.6.2, the issue seems to be gone. (I wonder which librosa version I had used previously, and whether it was 0.5.1 or a different one.)
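Since the only fix reported here is updating librosa, one option is to fail early at startup instead of hitting the traceback deep inside dataset loading. A minimal sketch, assuming 0.6.2 as the minimum (the thread only says the issue "seems to be gone" with that version); `MIN_LIBROSA`, `parse_version` and `check_librosa_version` are hypothetical names, not part of RETURNN.

```python
# Hypothetical startup guard; the minimum version comes from this thread,
# where librosa 0.5.1 failed and 0.6.2 seemed to work.
MIN_LIBROSA = (0, 6, 2)


def parse_version(version):
    """Parse 'X.Y.Z' into a tuple of ints, ignoring suffixes like 'rc1' or '.post1'."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at the first non-digit, e.g. the 'rc' in '2rc1'
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)  # treat '0.6' as '0.6.0'
    return tuple(parts)


def check_librosa_version(version):
    """Raise RuntimeError if the given librosa version is older than MIN_LIBROSA."""
    if parse_version(version) < MIN_LIBROSA:
        raise RuntimeError(
            "librosa %s is too old (need >= %s); older versions failed in "
            "librosa.filters.mel with a broadcasting error"
            % (version, ".".join(map(str, MIN_LIBROSA))))
```

Usage would be `import librosa` followed by `check_librosa_version(librosa.__version__)` before feature extraction starts. Simply pinning `librosa>=0.6.2` in the requirements achieves the same with less code.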

rwth-i6 / returnn, issue #132