lhotse-speech / lhotse

Tools for handling speech data in machine learning projects.
https://lhotse.readthedocs.io/en/latest/
Apache License 2.0
936 stars 214 forks source link

Sample length exception should maybe be using DurationMismatchError? #1225

Closed RuABraun closed 10 months ago

RuABraun commented 10 months ago

As I understand the the purpose of null_result_on_audio_loading_error here() is to avoid feature extraction crashing because a few files fail. That seems to not work like before now that a duration mismatch results in a ValueError.

Maybe this was a purposeful change to be more explicit about stuff failing (I don't know the intent). But thought I'd ask to confirm

pzelasko commented 10 months ago

The null result thing is to catch exceptions mostly related to corrupted files and/or broken connections (when using web storage), i.e. things that the user can't really control. The value error you mentioned was never caught by this: it's raised when you have incorrect manifest metadata vs the real audio file. This can happen for several reasons: one example is incorrect manifest construction; but another possibility is that a given audio file is encoded with some codec but decoded with a different one, which adds more padding frames at the end (this often happens with mp3 and opus, and sometimes even with flac). In that case you can use lhotse.set_audio_duration_mismatch_tolerance() or env var LHOTSE_AUDIO_DURATION_MISMATCH_TOLERANCE and set it to some high-ish number like 0.1 or 0.3, if you can confirm this is the case.

RuABraun commented 10 months ago

Thanks for explaining, I understand the motivation to only suppress specific errors.

On the side: There definitely was a behavior change as I previously (before upgrading lhotse) wasn't getting this error and would just end up with some data missing after completing feature extraction. Don't have time to trace the logic. Already using the tolerance parameter.