k2-fsa / icefall

https://k2-fsa.github.io/icefall/
Apache License 2.0

[Help needed] Support https://huggingface.co/datasets/Alex-Song/MSR-86K #1674

Open csukuangfj opened 2 months ago

csukuangfj commented 2 months ago

MSR-86K: An Evolving, Multilingual Corpus with 86,300 Hours of Transcribed Audio for Speech Recognition Research

The above paper has just open-sourced a dataset covering 15 languages. It is available at https://huggingface.co/datasets/Alex-Song/MSR-86K

It would be great if someone could train a zipformer model (streaming and/or non-streaming) with it.

yuyun2000 commented 2 months ago

I can contribute a recipe for a streaming model for one of the languages. Do you need it?

csukuangfj commented 2 months ago

> I can contribute a recipe for a streaming model for one of the languages. Do you need it?

Yes, we definitely need it. Thank you!