Open shuttle1987 opened 6 years ago
Good point! It may even be good to go so far as to perform a thorough 'sanity check' on the file (if that's not too much trouble). It is not impossible that a user will realize, when feeding a WAV file into Persephone, that the WAV file does not contain what the user thinks it does.
When depositing WAV files for long-term archiving at CINES, as part of the standard Pangloss/CoCoON workflow, an integrity check is performed. To mention an anecdote: one time, I got a warning that a WAV file was corrupt. I had checked it aurally (listening to the first few seconds) and all was for the best... but when I looked at the waveform I saw that the WAV file was empty after 1 minute or so (out of a recording of about 7 minutes). Data corruption happened when using a file transfer system (126.com) to get the files to Séverine, the LACITO engineer.
The Persephone scenario is very different, obviously. The general idea is that it's good to check for possible issues early and often, rather than try to catch bugs down the line.
> The general idea is that it's good to check for possible issues early and often, rather than try to catch bugs down the line.
Spoken like a wise software engineer!
I'd like to sort this out today. What exactly do we define as an empty wave file? There's the most basic case where the wave file has no audio frames at all, but is there anything else that would fall under the "empty" category?
That's essentially it. Another related consideration is that if there are fewer filterbank frames than there are transcription labels, CTC will break. (By default in Persephone, filterbank frame extraction strides at 10 ms intervals over the original WAV.) A similar test could be done to preclude putting such WAVs into the corpus, but it's really dictated by the choice of filterbank parameters as well as the transcription label granularity. That is, the same WAV could be accepted or rejected depending on the feature extraction choice or what the transcriptions are, so that test should probably happen after filterbank extraction.
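To make the frames-versus-labels condition concrete, here is a rough sketch of what such a post-extraction check might look like. The function names are hypothetical (they are not Persephone's API), and the frame count is only an estimate derived from the 10 ms stride mentioned above; a real check would use the actual number of frames produced by the feature extractor.

```python
def estimate_num_frames(num_samples: int, sample_rate: int, stride_ms: int = 10) -> int:
    """Roughly estimate how many filterbank frames a WAV would yield.

    Assumption: one frame per `stride_ms` of audio, ignoring window length
    and padding. Integer arithmetic avoids floating-point rounding.
    """
    return (num_samples * 1000) // (sample_rate * stride_ms)


def has_enough_frames(num_frames: int, num_labels: int) -> bool:
    """Return True if there are at least as many frames as transcription labels.

    This is only the condition discussed above; the threshold that a given
    CTC implementation actually needs may be stricter.
    """
    return num_frames >= num_labels


if __name__ == "__main__":
    # One second of 16 kHz audio, checked against a 50-label transcription.
    frames = estimate_num_frames(num_samples=16000, sample_rate=16000)
    print(frames, has_enough_frames(frames, num_labels=50))
```

Since the result depends on both the stride and the label granularity, the same WAV can pass with one configuration and fail with another, which is why this belongs after feature extraction rather than at corpus ingestion.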
OK, so it seems like the only easy test is to check that an audio file contains wave data and some frames. Everything else can only be deemed valid later, depending on the context in which it is being used.
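A minimal sketch of that basic test, using only the standard-library `wave` module. The exception class and function names here are hypothetical, chosen for illustration rather than taken from Persephone:

```python
import wave
from pathlib import Path
from typing import Union


class EmptyWavError(Exception):
    """Hypothetical exception for a WAV file that contains no audio frames."""


def count_wav_frames(path: Union[str, Path]) -> int:
    """Return the number of audio frames declared in a WAV file.

    wave.open raises wave.Error (or EOFError for a truncated file) if the
    file cannot be parsed as WAV, so this also checks for basic wave data.
    """
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getnframes()


def check_wav_not_empty(path: Union[str, Path]) -> int:
    """Raise EmptyWavError if the WAV has zero frames, else return the count."""
    num_frames = count_wav_frames(path)
    if num_frames == 0:
        raise EmptyWavError("{} contains no audio frames".format(path))
    return num_frames
```

Note that this only catches the "no frames at all" case. A file like the one in the anecdote above, with valid frames that are silent after the first minute, would still pass.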
Currently you can attempt to run an empty audio file through the model. A wave file can contain no actual audio, and it would be good to handle this edge case.