speechbrain / speechbrain

A PyTorch-based Speech Toolkit
http://speechbrain.github.io
Apache License 2.0

train_speaker_embeddings audio pipeline does not check for out of range values #492

Closed piraka9011 closed 3 years ago

piraka9011 commented 3 years ago

It looks like in train_speaker_embeddings.py for the SpeakerRec recipe, the following line can raise this error:

Original Traceback (most recent call last):
  File "/home/allabana/.virtualenvs/sprain/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 198, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/allabana/.virtualenvs/sprain/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/allabana/.virtualenvs/sprain/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/allabana/stt_ws/speechbrain/speechbrain/dataio/dataset.py", line 166, in __getitem__
    return self.pipeline.compute_outputs(data_point)
  File "/home/allabana/stt_ws/speechbrain/speechbrain/utils/data_pipeline.py", line 456, in compute_outputs
    return self._compute(data, self._exec_order, self.output_mapping)
  File "/home/allabana/stt_ws/speechbrain/speechbrain/utils/data_pipeline.py", line 488, in _compute
    values = item(*args)  # Call the DynamicItem to produce output
  File "/home/allabana/stt_ws/speechbrain/speechbrain/utils/data_pipeline.py", line 73, in __call__
    return self.func(*args)
  File "train_speaker_embeddings.py", line 138, in audio_pipeline
    start = random.randint(0, duration_sample - snt_len_sample - 1)
  File "/usr/lib/python3.8/random.py", line 248, in randint
    return self.randrange(a, b+1)
  File "/usr/lib/python3.8/random.py", line 226, in randrange
    raise ValueError("empty range for randrange() (%d, %d, %d)" % (istart, istop, width))
ValueError: empty range for randrange() (0, -16320, -16320)

It looks like there should be a check for whether the calculated duration_sample - snt_len_sample is < 0, and then either skip the sample or return 0:

start = 0 if start_range < 0 else random.randint(0, start_range)
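The suggested guard could be sketched as a small helper (a sketch only; safe_random_start is a hypothetical name, the variable names mirror those in train_speaker_embeddings.py):

```python
import random

def safe_random_start(duration_sample, snt_len_sample):
    """Pick a random chunk start for training.

    Falls back to 0 when the utterance is shorter than the requested
    chunk length, instead of letting random.randint() raise
    'empty range for randrange()'.
    """
    start_range = duration_sample - snt_len_sample - 1
    return 0 if start_range < 0 else random.randint(0, start_range)
```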
piraka9011 commented 3 years ago

Also, the train_ecapa_tdnn.yaml config for this does not use !ref <sample_rate> for the TimeDomainSpecAugment augmentation.

piraka9011 commented 3 years ago

Sorry for continuously extending this issue.

If the start time is a float (e.g. '0.0'), then this line in the same file throws a ValueError:

Original Traceback (most recent call last):
  File "/home/allabana/.virtualenvs/sprain/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 198, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/allabana/.virtualenvs/sprain/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/allabana/.virtualenvs/sprain/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/allabana/stt_ws/speechbrain/speechbrain/dataio/dataset.py", line 166, in __getitem__
    return self.pipeline.compute_outputs(data_point)
  File "/home/allabana/stt_ws/speechbrain/speechbrain/utils/data_pipeline.py", line 456, in compute_outputs
    return self._compute(data, self._exec_order, self.output_mapping)
  File "/home/allabana/stt_ws/speechbrain/speechbrain/utils/data_pipeline.py", line 488, in _compute
    values = item(*args)  # Call the DynamicItem to produce output
  File "/home/allabana/stt_ws/speechbrain/speechbrain/utils/data_pipeline.py", line 73, in __call__
    return self.func(*args)
  File "train_speaker_embeddings.py", line 142, in audio_pipeline
    start = int(start)
ValueError: invalid literal for int() with base 10: '0.0'

My hack is start = int(float(start)).
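That workaround could live in a tiny parsing helper (a sketch; parse_start is a hypothetical name, not part of the recipe):

```python
def parse_start(start):
    """Parse a start offset that may arrive as '16000' or '0.0'.

    int('0.0') raises ValueError because the string is not a valid
    integer literal, so go through float() first and truncate.
    """
    return int(float(start))
```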

TParcollet commented 3 years ago

Right, for point 1 we could certainly make the log clearer. However, we do not want to introduce skipping based on the data. That is typically what Kaldi or ESPnet (which relies on it in the end ...) do in the data-preparation step. The HUGE drawback of this approach is that it is not transparent: you don't really know which samples you process and how. A very good example of this occurred a few weeks ago, when a good scientist ran a custom dataset through our Seq2Seq ASR recipe. He encountered many, many errors like this one when connecting his dataset to SpeechBrain, due to its data. It turned out this wasn't happening with Kaldi and ESPnet because many, many samples were simply removed, and he didn't know! All the results from those toolkits were therefore biased and wrong. I wonder what the best way is to add some basic checks on the wav files in the data loader, @Gastron? Maybe in the read_ function?

The last point you reported is clearly an error indeed. Thanks!

piraka9011 commented 3 years ago

I agree that it's important to be transparent and not do something to the data without at least informing the user.

As you mentioned, maybe some docs or better error reporting in the pipeline would be useful, so we know how we should prepare our data.
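One transparent option would be to fail loudly with an actionable message instead of silently skipping, along these lines (a sketch under assumptions; check_chunk_args and its placement are hypothetical, not an existing SpeechBrain API):

```python
def check_chunk_args(wav_path, duration_sample, snt_len_sample):
    """Hypothetical sanity check before chunk sampling.

    Rather than letting random.randint() fail with an opaque
    'empty range for randrange()', raise an error that names the
    offending file and tells the user how to fix their data.
    """
    if duration_sample - snt_len_sample - 1 < 0:
        raise ValueError(
            f"{wav_path}: utterance has {duration_sample} samples but the "
            f"training chunk needs {snt_len_sample}; reduce the chunk "
            "length or filter this file explicitly during data preparation."
        )
```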

mravanelli commented 3 years ago

The VoxCeleb recipes have been reviewed over the last couple of weeks, and some of the points raised here have been fixed. Thank you!

pnsafari commented 3 years ago

Hi, I got the same error with the recipes on the master branch for voxceleb2. Do I have to do something beforehand for the data preparation?