huggingface / distil-whisper

Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
MIT License

Is it possible to train it with my own dataset for the Kazakh language? #21

Closed diasbalmash closed 8 months ago

diasbalmash commented 8 months ago

Suppose I have a large dataset in the Kazakh language. Can I use this approach to train a model for Kazakh?

sanchit-gandhi commented 8 months ago

Yes! We'll be releasing training code this week which will generalise to all languages Whisper was pre-trained on (and likely also those it wasn't 👀)

diasbalmash commented 8 months ago

@sanchit-gandhi thank you for the info

sanchit-gandhi commented 7 months ago

Training code released under this folder: https://github.com/huggingface/distil-whisper/tree/main/training
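For context on what preparing your own dataset involves: Whisper-family models consume 16 kHz mono audio padded or truncated to a fixed 30-second window, so each Kazakh (audio, transcription) pair must be resampled and length-normalized before training. Below is a minimal pure-Python sketch of the length-normalization step, mirroring the pad/trim logic used by Whisper-style pipelines; the function name and constants here are illustrative, not taken from the distil-whisper training code:

```python
# Whisper-style models expect fixed-length input: 30 s of 16 kHz mono audio.
SAMPLE_RATE = 16_000
CHUNK_SECONDS = 30
N_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS  # 480_000 samples per training example

def pad_or_trim(samples, length=N_SAMPLES):
    """Truncate audio longer than `length`; zero-pad shorter audio to `length`."""
    if len(samples) > length:
        return samples[:length]
    return samples + [0.0] * (length - len(samples))

# Example: a 5-second clip is zero-padded up to the full 30-second window.
clip = [0.1] * (SAMPLE_RATE * 5)
fixed = pad_or_trim(clip)
print(len(fixed))  # 480000
```

In practice this step is handled for you by the feature extractor bundled with the model checkpoint (which also converts the padded waveform to a log-mel spectrogram); the sketch just shows why raw clips of varying length all become uniform training examples.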