Closed by ylacombe 5 months ago
Do you have quick tests that I could implement to check if that works as expected and doesn't break the library?
I took a 10-second audio file of clean speech and added reverberation using SoX. I then created shorter versions of both files by trimming the end, keeping only the first 2, 4, 6, 8, and 10 seconds.
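For anyone who wants to reproduce this setup, here is a rough sketch of how the test files could be generated. It only builds the SoX command lines, using SoX's `reverb` and `trim` effects. The file names, the `build_sox_commands` helper, and the reverberance value of 80 are assumptions made for illustration; the exact SoX settings used for the test above are not given in this thread.

```python
import shlex


def build_sox_commands(clean_wav, durations_s=(2, 4, 6, 8, 10), reverberance=80):
    """Build SoX commands: one that adds reverb, then one trim per duration.

    The reverberance value is an illustrative assumption, not the setting
    used in the original test.
    """
    stem = clean_wav.rsplit(".", 1)[0]
    reverb_wav = f"{stem}_reverb.wav"

    # sox <in> <out> reverb <reverberance>
    commands = [["sox", clean_wav, reverb_wav, "reverb", str(reverberance)]]

    # sox <in> <out> trim 0 <length>: keep the first <length> seconds
    for source in (clean_wav, reverb_wav):
        source_stem = source.rsplit(".", 1)[0]
        for duration in durations_s:
            trimmed = f"{source_stem}_{duration}s.wav"
            commands.append(["sox", source, trimmed, "trim", "0", str(duration)])
    return commands


# Print the commands so they can be inspected before running them.
for command in build_sox_commands("clean.wav"):
    print(shlex.join(command))
```

Each command is a plain argument list, so it can be passed to `subprocess.run(command, check=True)` once SoX is installed.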
Here's what we got on the original clean speech segment (no reverb):
1) Before your fix:
2) After your fix:
There is no difference in this case; Brouhaha returns a high value of C50, regardless of the file duration. This is the expected behavior.
Now, with the reverberated segment:
1) Before your fix:
Brouhaha predicts a high C50 value for the 2-second audio file, whereas it should predict a low C50 value, since we know the speech is highly reverberated.
2) After your fix:
Here, the behavior is consistent and no longer depends on the file duration. We get a low C50 value, as expected.
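This check could be turned into a quick regression test along the following lines. It is only a sketch: it assumes you have already collected one C50 prediction per trimmed file into a dict keyed by duration, and the 5 dB tolerance is an arbitrary placeholder rather than a value taken from Brouhaha.

```python
def c50_spread_db(c50_by_duration):
    """Return the gap between the highest and lowest C50 prediction (in dB)."""
    values = list(c50_by_duration.values())
    return max(values) - min(values)


def is_duration_invariant(c50_by_duration, tolerance_db=5.0):
    """True if predictions for all durations agree within tolerance_db.

    tolerance_db is a hypothetical threshold; tune it to the variability
    you consider acceptable.
    """
    return c50_spread_db(c50_by_duration) <= tolerance_db
```

Running it on the reverberated files before and after the fix should flag the old behavior (the 2-second file stands out) and pass on the new one, while the clean files should pass in both cases.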
Awesome! Thank you so much :)
I don't see any big difference in the C50 distribution when running it on the MLS English test split. Can you think of any way to make sure it does what's intended?
Hmm, how strange! You should at least have a (quite strong) difference for short audio files that contain reverberation. Can you check the distribution shift for these specific segments?
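One way to look at the shift for short segments only is sketched below. It assumes three aligned lists (segment durations, C50 before the fix, C50 after the fix) have already been extracted from the two runs. The `short_segment_shift` helper and the 2-second default cutoff are illustrative assumptions; the cutoff should be adapted to the chunk duration the model actually uses.

```python
from statistics import mean, median


def short_segment_shift(durations_s, c50_before, c50_after, max_duration_s=2.0):
    """Summarize the C50 change (after minus before) on short segments only.

    Returns None if no segment is at most max_duration_s long.
    """
    diffs = [
        after - before
        for duration, before, after in zip(durations_s, c50_before, c50_after)
        if duration <= max_duration_s
    ]
    if not diffs:
        return None
    return {
        "n": len(diffs),
        "mean_shift_db": mean(diffs),
        "median_shift_db": median(diffs),
    }
```

If the summary is close to zero even for short, reverberant files, that would suggest the fixed code path is not the one producing the predictions.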
I had to add a few fixes for the apply function to work (main.py). Are you sure you're using this to predict the C50 values?
Hey @MarvinLvn, as discussed offline, here is a proposal that fixes the last-chunk computation!
Do you have quick tests that I could implement to check if that works as expected and doesn't break the library?
Note that I also propose a fix for #20 (https://github.com/marianne-m/brouhaha-vad/issues/20), which is simply caused by a breaking change in the newest Pyannote: https://github.com/pyannote/pyannote-audio/blob/48b68022f707fa376e2331dd9331fbdeadd0e2ab/pyannote/audio/core/model.py#L177-L192