Open yiwei0730 opened 3 weeks ago
Hi, thanks for your interest. The StreamSpeech architecture can support multilingual speech-to-speech translation, which we have also explored. Since multilingual translation is not the core highlight of this work, we did not cover it in our paper.
If you want to train a multilingual StreamSpeech on CVSS-C, you only need to modify the data processing part. The training part is the same.
Change lang in preprocess_scripts/preprocess.sh to 'all'.
Adjust vocab-size in preprocess_scripts/7.prep_cvss_c_multitask_asr_data.sh to support a multilingual vocabulary.
Hope this helps.
Hi, what changes should be made for speech translation into a language other than English? Which parts need to be modified apart from data processing?
Thanks!
@arararz Hi, if you want to train a StreamSpeech model that translates speech into languages other than English, then in addition to data preparation there are two points to note:
--ctc-upsample-rate: you can refer to Appendix D of our paper and adjust it to 2-3 times the unit/word sequence length ratio.
Hope this helps~
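For anyone estimating a starting value for --ctc-upsample-rate, here is a minimal sketch of the arithmetic, not the authors' procedure. It assumes the "unit/word sequence length ratio" means the average number of target units per target text token across your training pairs. The function names and the toy data are hypothetical, and whether the tokens should be words or subwords depends on your setup, so check Appendix D of the paper before relying on it.

```python
import math


def length_ratio(unit_seqs, text_seqs):
    """Average number of target units per target text token over a corpus."""
    total_units = sum(len(units) for units in unit_seqs)
    total_tokens = sum(len(tokens) for tokens in text_seqs)
    if total_tokens == 0:
        raise ValueError("text_seqs contains no tokens")
    return total_units / total_tokens


def suggest_ctc_upsample_rate(unit_seqs, text_seqs):
    """Return (low, high): 2x and 3x the unit/token ratio, rounded up."""
    ratio = length_ratio(unit_seqs, text_seqs)
    return math.ceil(2 * ratio), math.ceil(3 * ratio)


# Toy data (made up): two utterances with 40 and 60 units, 4 and 6 tokens.
toy_units = [list(range(40)), list(range(60))]
toy_texts = [["a", "b", "c", "d"], ["e", "f", "g", "h", "i", "j"]]

low, high = suggest_ctc_upsample_rate(toy_units, toy_texts)
print(f"candidate --ctc-upsample-rate range: {low} to {high}")
```

On real data you would pass the discrete unit sequences and the tokenized target text of your own language pair, then try values inside the printed range.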
@zhangshaolei1998 Hey, very interesting work. I was wondering about the training time and what system configuration you used. Thanks!
@thetushargoyal Hi, the training takes less than 1 day on 8 NVIDIA 3090 GPUs.
Hello, this is amazing. I want to ask whether it can be trained on other languages, or even on multiple languages at the same time.