Also, the train_ecapa_tdnn.yaml config for this does not use !ref <sample_rate> for the TimeDomainSpecAugment augmentation.
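For reference, a minimal sketch of what tying the augmentation to the recipe-level sample rate looks like in Python; in the YAML this would correspond to giving the TimeDomainSpecAugment entry sample_rate: !ref <sample_rate>. The variable names and the forward call here are illustrative, not taken from the recipe:

```python
import torch
from speechbrain.lobes.augment import TimeDomainSpecAugment

sample_rate = 16000  # the recipe-level value that <sample_rate> would refer to

# Keep the augmentation tied to the same rate instead of relying on its
# default; this mirrors what sample_rate: !ref <sample_rate> does in YAML.
augment = TimeDomainSpecAugment(
    sample_rate=sample_rate,
    speeds=[95, 100, 105],
)

# Illustrative call on a batch of dummy waveforms with relative lengths.
wavs = torch.randn(4, sample_rate)
lens = torch.ones(4)
augmented = augment(wavs, lens)
```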
Sorry for continuously extending this issue. If the start time is a float (e.g. '0.0'), then this line in the same file throws a ValueError:
```
Original Traceback (most recent call last):
  File "/home/allabana/.virtualenvs/sprain/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 198, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/allabana/.virtualenvs/sprain/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/allabana/.virtualenvs/sprain/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/allabana/stt_ws/speechbrain/speechbrain/dataio/dataset.py", line 166, in __getitem__
    return self.pipeline.compute_outputs(data_point)
  File "/home/allabana/stt_ws/speechbrain/speechbrain/utils/data_pipeline.py", line 456, in compute_outputs
    return self._compute(data, self._exec_order, self.output_mapping)
  File "/home/allabana/stt_ws/speechbrain/speechbrain/utils/data_pipeline.py", line 488, in _compute
    values = item(*args)  # Call the DynamicItem to produce output
  File "/home/allabana/stt_ws/speechbrain/speechbrain/utils/data_pipeline.py", line 73, in __call__
    return self.func(*args)
  File "train_speaker_embeddings.py", line 142, in audio_pipeline
    start = int(start)
ValueError: invalid literal for int() with base 10: '0.0'
```
my hack is start = int(float(start))
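A minimal sketch of how that workaround could sit inside the pipeline function named in the traceback; the field names follow the recipe's CSV (wav, start, stop), but the body is illustrative rather than the recipe's actual code:

```python
import torchaudio
import speechbrain as sb

@sb.utils.data_pipeline.takes("wav", "start", "stop")
@sb.utils.data_pipeline.provides("sig")
def audio_pipeline(wav, start, stop):
    # The CSV fields arrive as strings and may look like "0" or "0.0",
    # so go through float() before int() instead of calling int(start).
    start = int(float(start))
    stop = int(float(stop))
    sig, fs = torchaudio.load(
        wav, num_frames=stop - start, frame_offset=start
    )
    return sig.squeeze(0)  # drop the channel dim for mono audio
```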
Right, for point 1 we could certainly make the log clearer. However, we do not want to introduce skipping based on the data. This is typically what Kaldi or ESPnet (which relies on it in the end) do in the data preparation part. The HUGE drawback of this approach is that it is not transparent: if you do so, you don't really know which samples you process and how. A very good example of that occurred a few weeks ago, when a good scientist tried our Seq2Seq ASR recipe on a custom dataset. He encountered many, many errors like this one when connecting his dataset to SpeechBrain, due to its data. It turns out this wasn't happening with Kaldi and ESPnet because many, many samples were simply removed, and he didn't know! All the results with these toolkits were therefore biased and wrong. I wonder what the best way is of adding some basic checks on the wavs in the data loader, @Gastron? Maybe in the read_ function?
The last point you reported is clearly an error indeed. Thanks!
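As for the basic checks mentioned above, one option (purely a sketch; the validate_segment name and the placement are hypothetical) is a small guard around reading that fails loudly instead of silently dropping the sample:

```python
import torchaudio

def validate_segment(path, start, stop):
    # Hypothetical check: refuse to silently skip, and instead explain
    # exactly which sample is problematic and why.
    info = torchaudio.info(path)
    if start < 0 or stop <= start or stop > info.num_frames:
        raise ValueError(
            f"Bad segment for {path}: start={start}, stop={stop}, "
            f"but the file has {info.num_frames} frames."
        )
```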
I agree that it's important to be transparent and not do something to the data without at least informing the user.
As you mentioned, some docs or better error reporting in the pipeline would be useful so we know how to prepare our data.
The VoxCeleb recipes have been reviewed over the last couple of weeks, and some of the points raised here are fixed. Thank you!
Hi, I got the same error with the recipes on the master branch for voxceleb2. Do I have to do something beforehand for the data preparation?
It looks like in train_speaker_embeddings.py for the SpeakerRec recipe, the [following line] can cause this error: there should probably be a check on whether the calculated duration_sample - snt_len_sample is < 0, and either skip the sample or return 0 (see the sketch below).
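A minimal sketch of that guard (variable names follow the quoted recipe code; the function wrapper and fallback behaviour are illustrative):

```python
import random

def choose_chunk(duration_sample, snt_len_sample):
    """Pick a random training chunk, guarding against utterances that
    are shorter than the requested chunk length."""
    if duration_sample <= snt_len_sample:
        # Short utterance: start at 0 and take the whole signal
        # (or skip the sample upstream during data preparation).
        return 0, duration_sample
    start = random.randint(0, duration_sample - snt_len_sample)
    return start, start + snt_len_sample
```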