huggingface / distil-whisper

Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
MIT License
3.33k stars, 238 forks

Can the model be fine-tuned for a phoneme task? #57

Closed (jackNhat closed this issue 6 months ago)

jackNhat commented 6 months ago

Has anyone experimented with fine-tuning for the phoneme recognition task (English)? If so, please share some of your experiments. Many thanks!

sanchit-gandhi commented 6 months ago

I haven't personally, but would be interested to know if anyone has tried this. Note that the Whisper tokenizer does not contain phoneme tokens, so you would need to train a new tokenizer and then adjust the model's vocabulary size to match it (cf. Wav2Vec2PhonemeCTCTokenizer and https://discuss.huggingface.co/t/adding-custom-vocabularies-on-whisper/29311/2?u=sanchit-gandhi)
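
To make the tokenizer point concrete, here is a minimal, dependency-free sketch of what a phoneme-level vocabulary could look like. It is a toy illustration only, not the Whisper or Wav2Vec2PhonemeCTCTokenizer API: the special tokens, the ARPAbet-style transcripts, and the helper names (`build_vocab`, `encode`, `decode`) are all assumptions made for the example.

```python
# Toy phoneme vocabulary (hypothetical; not a Hugging Face tokenizer).
# Assumes transcripts are space-separated ARPAbet-style phoneme strings.
SPECIAL_TOKENS = ["<pad>", "<s>", "</s>", "<unk>"]  # assumed special tokens


def build_vocab(transcripts):
    """Map every special token and every observed phoneme to an integer id."""
    phonemes = sorted({p for t in transcripts for p in t.split()})
    return {tok: i for i, tok in enumerate(SPECIAL_TOKENS + phonemes)}


def encode(text, vocab):
    """Turn a phoneme string into ids, wrapped in start/end tokens."""
    unk = vocab["<unk>"]
    body = [vocab.get(p, unk) for p in text.split()]
    return [vocab["<s>"]] + body + [vocab["</s>"]]


def decode(ids, vocab):
    """Turn ids back into a phoneme string, dropping special tokens."""
    inverse = {i: tok for tok, i in vocab.items()}
    return " ".join(inverse[i] for i in ids if inverse[i] not in SPECIAL_TOKENS)


transcripts = ["HH AH L OW", "W ER L D"]  # illustrative data, not a real corpus
vocab = build_vocab(transcripts)
ids = encode("HH AH L OW", vocab)

# The model's token embedding and output projection would need this many rows.
new_vocab_size = len(vocab)
```

In a real fine-tuning run, the decoder's embedding matrix and output layer would have to be resized to `new_vocab_size`, and those new rows would start untrained, which is why the vocabulary change and the fine-tuning go together.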