modelscope / FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
https://www.funasr.com

Any support for fine-tuning on audio data longer than 1 minute? #2053

Open Jack-Lin-gif opened 2 weeks ago

Jack-Lin-gif commented 2 weeks ago

What is your question?

To fine-tune my model, should I prepare audio clips shorter than 15 s? I have many recordings longer than 1 minute. Should I split them manually, or is there a more convenient way? Can I use the VAD model during the fine-tuning process?

What's your environment?

LauraGPT commented 1 week ago

We will support this soon via flash-attn.
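
Until then, one possible workaround for the splitting part of the question is to cut long recordings offline before fine-tuning. The sketch below is not part of FunASR. It assumes you already have speech segments as (start_ms, end_ms) pairs, for example from a VAD model, and greedily packs them into chunks no longer than a chosen limit (15 s here, taken from the question). The function name `pack_segments` and its parameters are hypothetical.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # (start_ms, end_ms)


def pack_segments(segments_ms: List[Segment], max_len_ms: int = 15_000) -> List[Segment]:
    """Greedily merge consecutive speech segments into chunks of at most max_len_ms.

    Assumes segments are sorted by start time and do not overlap.
    A merged chunk spans from its first segment's start to its last
    segment's end, so short silences between merged segments are kept.
    """
    chunks: List[Segment] = []
    cur_start = cur_end = None

    for start, end in segments_ms:
        # Hard-split a single segment that is itself longer than the limit.
        pieces = []
        while end - start > max_len_ms:
            pieces.append((start, start + max_len_ms))
            start += max_len_ms
        pieces.append((start, end))

        for p_start, p_end in pieces:
            if cur_start is not None and p_end - cur_start <= max_len_ms:
                # Still fits: extend the current chunk.
                cur_end = p_end
            else:
                # Does not fit: close the current chunk and start a new one.
                if cur_start is not None:
                    chunks.append((cur_start, cur_end))
                cur_start, cur_end = p_start, p_end

    if cur_start is not None:
        chunks.append((cur_start, cur_end))
    return chunks


if __name__ == "__main__":
    # Made-up segment boundaries for illustration only.
    vad_segments = [(0, 4000), (4500, 9000), (9500, 16000), (20000, 52000)]
    for chunk in pack_segments(vad_segments):
        print(chunk)
```

Each resulting (start_ms, end_ms) pair can then be used to slice the waveform and write one training clip. The hard split of over-long segments cuts at a fixed offset and may land mid-word, so the transcripts for those clips would need to be realigned or the clips dropped.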