huggingface / distil-whisper

Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
MIT License

Is it possible to train it with my own dataset for the Kazakh language? #21

Closed diasbalmash closed 8 months ago

diasbalmash commented 8 months ago

Suppose I have a large dataset in the Kazakh language. Can I use this approach to train a model for Kazakh?

sanchit-gandhi commented 8 months ago

Yes! We'll be releasing training code this week which will generalise to all languages Whisper was pre-trained on (and likely also those it wasn't 👀)

diasbalmash commented 8 months ago

@sanchit-gandhi thank you for the info

sanchit-gandhi commented 7 months ago

Training code released under this folder: https://github.com/huggingface/distil-whisper/tree/main/training
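For context on what preparing your own dataset involves: Whisper-family models consume 16 kHz mono audio padded or truncated to a fixed 30-second window, so each Kazakh (audio, transcription) pair must be resampled and length-normalized before training. Below is a minimal pure-Python sketch of the length-normalization step, mirroring the pad/trim logic used by Whisper-style pipelines; the function name and constants here are illustrative, not taken from the distil-whisper training code:

```python
# Whisper-style models expect fixed-length input: 30 s of 16 kHz mono audio.
SAMPLE_RATE = 16_000
CHUNK_SECONDS = 30
N_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS  # 480_000 samples per training example

def pad_or_trim(samples, length=N_SAMPLES):
    """Truncate audio longer than `length`; zero-pad shorter audio to `length`."""
    if len(samples) > length:
        return samples[:length]
    return samples + [0.0] * (length - len(samples))

# Example: a 5-second clip is zero-padded up to the full 30-second window.
clip = [0.1] * (SAMPLE_RATE * 5)
fixed = pad_or_trim(clip)
print(len(fixed))  # 480000
```

In practice this step is handled for you by the feature extractor bundled with the model checkpoint (which also converts the padded waveform to a log-mel spectrogram); the sketch just shows why raw clips of varying length all become uniform training examples.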