Closed (ariadnasc closed this 2 months ago)
I had the same issue. The solution above worked.
Hey @ariadnasc, thanks for opening this issue! Could you open a PR to deal with this? Thanks in advance!
Hey @ylacombe, no worries! I'll submit the PR tomorrow :)
@ylacombe I've just opened the PR - thanks!
It's fixed, thanks for your help!
Hi! I've been working with the dataspeech package for a few weeks now, and I bumped into a behaviour that could be an issue for specific datasets (like the one I am using) but can easily be fixed - I'm describing it below.
Description of the issue
When running the data extraction pipeline for a dataset, i.e. running `main.py`, I encountered the following error message:

I have figured out that the issue occurs when the speech duration output by Brouhaha's VAD model for the first sample of a batch is an integer (this could be 0 if the VAD finds no speech, but it could also happen if the duration is, e.g., exactly 1 second). When this happens, the `speech_durations` list is created as a list of integers (and not floats), which causes this error for the next sample whose duration is a float, because that value would have to be truncated to fit.
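A minimal sketch of the failure mode described above, assuming a column whose element type is fixed by the first value it receives. `FirstValueTypedColumn` and `try_build` are hypothetical names used only for illustration; this is a toy model in plain Python, not the actual code path or error raised inside the pipeline.

```python
class FirstValueTypedColumn:
    """Hypothetical column whose element type is locked in by the first value."""

    def __init__(self):
        self.dtype = None
        self.values = []

    def append(self, value):
        if self.dtype is None:
            # The first element decides the type of the whole column.
            self.dtype = type(value)
        if self.dtype is int and isinstance(value, float) and not value.is_integer():
            # Storing this float as an int would drop its fractional part.
            raise TypeError(f"float value {value} would be truncated converting to int")
        self.values.append(self.dtype(value))


def try_build(durations):
    """Append every duration; return the stored values or an error string."""
    column = FirstValueTypedColumn()
    try:
        for duration in durations:
            column.append(duration)
    except TypeError as err:
        return f"error: {err}"
    return column.values


# First sample is the integer 0 (no speech found), second is a float.
print(try_build([0, 1.37]))
# Same durations, but the first sample is already a float.
print(try_build([0.0, 1.37]))
```

In this model, only the type of the first value matters: the same durations succeed or fail depending on whether the leading element happens to be an integer.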
Proposed solution
As a quick fix that has worked for me, I have wrapped the `speech_duration` value appended to the list with `np.float32()`. I did this on line 49 of `dataspeech/gpu_enrichments/snr_and_reverb.py`. With this change, the pipeline runs successfully. The change looks like below:

If you agree with the change I propose, I am happy to create a commit and push it. Otherwise, please advise on how to prevent something like this from happening.
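The idea behind the proposed fix can be sketched as follows: coerce every duration to a float at the moment it is appended, so the list never starts out as a list of integers. This is an illustration under stated assumptions, not the actual line 49 of `snr_and_reverb.py`; `collect_speech_durations` is a hypothetical helper, and the built-in `float()` stands in for the `np.float32()` wrapper mentioned in the post so that the example runs without NumPy.

```python
def collect_speech_durations(raw_durations):
    """Collect per-sample durations, coercing each one to float on append."""
    speech_durations = []
    for duration in raw_durations:
        # float() is a stand-in here for the np.float32() wrapper from the post.
        speech_durations.append(float(duration))
    return speech_durations


# Mixed integer and float durations, with an integer in first position.
durations = collect_speech_durations([0, 1.37, 1, 2.5])
print([type(d).__name__ for d in durations])
```

Because the conversion happens per element, the result no longer depends on which sample comes first in the batch.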
Thanks!