dyyoungg opened 2 months ago
I compared two experimental data setups. Setting 1: WenetSpeech (Chinese) only. Setting 2: WenetSpeech + GigaSpeech (about 1:1, Chinese + English).
Interestingly, the loss under setting 1 does not decrease normally (blue curve in the image below), while setting 2, mixed with English, converges normally. Have you observed this phenomenon in your experiments?
This situation is somewhat unusual. You could try training on a small amount of Chinese data (approximately 500 hours) to verify whether this issue always arises when the model is trained on purely Chinese data.
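A minimal sketch of how one might carve out the suggested ~500-hour subset from a larger corpus, assuming a manifest represented as a list of utterance dicts with a `duration` field in seconds (the field name and manifest format are assumptions, not something specified in this thread):

```python
import random

def subsample_hours(manifest, target_hours=500.0, seed=0):
    """Randomly pick utterances until roughly target_hours of audio is reached.

    manifest: list of dicts, each with a "duration" field in seconds
              (hypothetical schema; adapt to your own data-loading format).
    Returns a subset whose total duration is >= target_hours (it may
    overshoot by at most one utterance).
    """
    rng = random.Random(seed)          # fixed seed for reproducibility
    items = manifest[:]
    rng.shuffle(items)

    subset, total = [], 0.0
    target_seconds = target_hours * 3600.0
    for item in items:
        if total >= target_seconds:
            break
        subset.append(item)
        total += item["duration"]
    return subset

# Toy usage: 1000 fake 10-minute utterances (~166 h total), take ~50 h.
toy = [{"id": i, "duration": 600.0} for i in range(1000)]
sub = subsample_hours(toy, target_hours=50)
print(sum(x["duration"] for x in sub) / 3600.0)
```

The subset is drawn uniformly at random so it keeps roughly the same speaker/style distribution as the full corpus, which matters if the goal is to isolate "purely Chinese data" as the variable rather than a data-quality artifact.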
I'm interested in Chinese too. Do you have any further results?
WenetSpeech could be too noisy; you may want to start with AIShell3, then WenetSpeech4TTS.