Open rolisz opened 6 months ago
Hey @rolisz! It would indeed be possible to use audio-only samples during pseudo-labelling. You're correct in that the pseudo-labelling script currently assumes we're pseudo-labelling a dataset of (audio, text) pairs, but there's no reason why we couldn't generalise this to just (audio) examples. This should be pretty simple: you can just rip out all the references to "text_column_name" and "labels" in the pseudo-labelling script.
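
To make the idea concrete, here is a rough sketch of what audio-only pseudo-labelling looks like once the text column is optional. It is not the actual script: `pseudo_label`, `fake_transcribe`, and the `"pseudo_label"` output column are hypothetical names made up for illustration, and the stand-in transcriber would be replaced by a real Whisper inference call.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional


def pseudo_label(
    examples: Iterable[Dict[str, Any]],
    transcribe: Callable[[Any], str],
    audio_column_name: str = "audio",
    text_column_name: Optional[str] = None,
) -> List[Dict[str, Any]]:
    """Attach a model-generated transcript to every example.

    All names here are hypothetical. The reference text is optional:
    when text_column_name is None the examples only need audio.
    """
    labelled = []
    for example in examples:
        audio = example[audio_column_name]
        record = {
            audio_column_name: audio,
            # The teacher model's output becomes the training target.
            "pseudo_label": transcribe(audio),
        }
        # Carry the reference text along only if the dataset has one.
        if text_column_name is not None and text_column_name in example:
            record[text_column_name] = example[text_column_name]
        labelled.append(record)
    return labelled


def fake_transcribe(audio: List[float]) -> str:
    """Stand-in for a real Whisper call (assumption, for illustration)."""
    return f"transcript of {len(audio)} samples"


# Audio-only dataset: no text column anywhere.
audio_only = [{"audio": [0.0, 0.1, 0.2]}, {"audio": [0.3, 0.4]}]
result = pseudo_label(audio_only, fake_transcribe)
```

The only thing the loop needs from each example is the audio; the reference text is carried along when it exists and ignored otherwise.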
Is it possible to do the pseudo-labeling without access to already-transcribed audio?
From what I see in the training scripts, the dataset should have a text column, so it's not possible to just use a bunch of audio to distill a Whisper model.