modelscope / FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
https://www.funasr.com

English Hotword #1616

Closed treya-lin closed 6 months ago

treya-lin commented 6 months ago

Notice: To help resolve issues more efficiently, please follow the template when raising an issue and include the relevant details.

❓ Questions and Help

Before asking:

  1. search the issues.
  2. search the docs.

What is your question?

Thanks for your great work. I am using this model (https://www.modelscope.cn/models/iic/speech_seaco_paraformer_large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) to process Chinese audio that contains a lot of English words. I noticed that while Chinese hotwords work well (e.g. "英伟达"), English hotwords never take effect (e.g. "ChatGPT", "prompt"). Has the team noticed this, and do you have any thoughts on it? Could it be improved by fine-tuning? Any advice is greatly appreciated!

Code

from funasr import AutoModel

wavpath = "example.wav"  # placeholder path to the input audio (not given in the original post)

model = AutoModel(
    model="iic/speech_seaco_paraformer_large_asr_nat-zh-cn-16k-common-vocab8404-pytorch", model_revision=None,
    vad_model="iic/speech_fsmn_vad_zh-cn-16k-common-pytorch", vad_model_revision=None,
    punc_model="iic/punc_ct-transformer_zh-cn-common-vocab272727-pytorch", punc_model_revision=None,
    # spk_model="cam++", spk_model_revision="v2.0.2",
)
# English hotwords, separated by a space
res = model.generate(input=wavpath, batch_size_s=300, hotword="ChatGPT OpenAI")

What have you tried?

I tried many different hotwords

What's your environment?

R1ckShi commented 6 months ago

Thanks for using the seaco-paraformer model. English hotwords have been a known issue for a long time. The main problem is that correcting an English hotword almost always changes the number of tokens (because of the BPE modeling units), which makes English hotwords difficult to support on the Paraformer backbone, where CIF predicts the token number at the very beginning. We have made some progress on this issue, and the advanced models may be open-sourced at some point in the future.
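To make the token-number point in the reply above concrete, here is a toy sketch. It is not FunASR code: the vocabulary `TOY_VOCAB`, the greedy longest-match `tokenize` function, and the misrecognized strings are all hypothetical and only stand in for real BPE units. It shows how a Chinese hotword correction can keep the token count unchanged, while an English correction can change it.

```python
# Toy illustration only. TOY_VOCAB is a hypothetical subword vocabulary,
# not the model's real BPE vocabulary.
TOY_VOCAB = {"chat", "gp", "g", "p", "t"}


def tokenize(word, vocab=TOY_VOCAB):
    """Greedy longest-match subword split; unknown characters become single tokens."""
    word = word.lower()
    tokens = []
    i = 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            # No vocabulary entry matched: emit one character as its own token
            # (this is how Chinese characters are treated in this toy setup).
            tokens.append(word[i])
            i += 1
    return tokens


def token_count(text):
    """Total number of tokens over all whitespace-separated words."""
    return sum(len(tokenize(w)) for w in text.split())


# Chinese: a wrong character is swapped for the right one, the length stays the same.
print(token_count("英为达"), token_count("英伟达"))      # 3 3

# English: a hypothetical misrecognition versus the hotword, the length differs.
print(token_count("chat g p t"), token_count("ChatGPT"))  # 4 3
```

If the output length is fixed before decoding, as the reply describes for CIF, the first kind of correction fits the predicted length while the second does not.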

treya-lin commented 6 months ago

Hi, I see. Thank you very much for the clarification! I look forward to the advanced models. :D