huggingface / distil-whisper

Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.

Does it support Chinese? #70

Open xxm1668 opened 6 months ago

sanchit-gandhi commented 5 months ago

It does not, but you have two options for training a Chinese-compatible model:

  1. Follow the distillation instructions in the training folder (https://github.com/huggingface/distil-whisper/tree/main/training) and train on the Mandarin split of the Common Voice dataset
  2. Fine-tune the pre-trained checkpoint on Mandarin (instructions for this can also be found under the training folder)

=> Option 1 will get you the best results, but is a little more involved than option 2; a minimal sketch of option 2 follows below
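To make option 2 concrete, here is a minimal, hedged sketch of preparing the Mandarin subset of Common Voice for fine-tuning with the Transformers API. The dataset name (`mozilla-foundation/common_voice_13_0`) and the `zh-CN` config are assumptions for illustration only; the training README documents the exact recipe and hyperparameters.

```python
from datasets import Audio, load_dataset
from transformers import WhisperForConditionalGeneration, WhisperProcessor

# Sketch of option 2: fine-tuning the pre-trained checkpoint on Mandarin.
# The Common Voice dataset name/config below are assumptions; see the
# training README for the exact recipe.
processor = WhisperProcessor.from_pretrained(
    "distil-whisper/distil-large-v2", language="chinese", task="transcribe"
)
model = WhisperForConditionalGeneration.from_pretrained(
    "distil-whisper/distil-large-v2"
)

# Load the Mandarin subset of Common Voice and resample to 16 kHz,
# the sampling rate Whisper models expect.
cv = load_dataset(
    "mozilla-foundation/common_voice_13_0", "zh-CN", split="train"
)
cv = cv.cast_column("audio", Audio(sampling_rate=16_000))

def prepare(batch):
    # Compute log-mel input features and tokenize the transcript.
    audio = batch["audio"]
    batch["input_features"] = processor(
        audio["array"], sampling_rate=audio["sampling_rate"]
    ).input_features[0]
    batch["labels"] = processor.tokenizer(batch["sentence"]).input_ids
    return batch

cv = cv.map(prepare, remove_columns=cv.column_names)
# From here, train with Seq2SeqTrainer as in the standard Whisper
# fine-tuning recipe.
```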


xxm1668 commented 5 months ago

Thanks for your recommendation.

shuaijiang commented 4 months ago

You can refer to https://huggingface.co/BELLE-2/Belle-distilwhisper-large-v2-zh, which supports Chinese and is based on distil-whisper-large-v2.
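For reference, a minimal sketch of running that checkpoint through the standard Transformers ASR pipeline (the audio path is a placeholder; adjust device and dtype for your hardware):

```python
import torch
from transformers import pipeline

# Load the community Chinese checkpoint mentioned above via the
# automatic-speech-recognition pipeline.
asr = pipeline(
    "automatic-speech-recognition",
    model="BELLE-2/Belle-distilwhisper-large-v2-zh",
    torch_dtype=torch.float16,
    device="cuda:0",
)

# "audio.wav" is a placeholder path to a local audio file.
result = asr("audio.wav")
print(result["text"])
```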

xxm1668 commented 4 months ago

Thanks!