jishengpeng / WavTokenizer

SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling
MIT License

Training on WenetSpeech couldn't converge #28

Open dyyoungg opened 2 months ago

dyyoungg commented 2 months ago

I compared two experimental data setups:

- Setting 1: WenetSpeech only (Chinese)
- Setting 2: WenetSpeech + GigaSpeech (about 1:1, Chinese + English)

It's interesting that the loss in setting 1 doesn't decrease normally (blue curve in the image below), while setting 2, mixed with English, converges normally. Have you observed this phenomenon in your experiments?

[Figure: training loss curves for the two settings]
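
For concreteness, here is a minimal sketch of how the two setups could be built, assuming the training data is specified as plain-text filelists of audio paths (one path per line); all filenames are placeholders:

```python
import random

def read_list(path):
    with open(path) as f:
        return [ln.strip() for ln in f if ln.strip()]

wenet = read_list("wenetspeech_train.txt")  # Chinese clips (placeholder name)
giga = read_list("gigaspeech_train.txt")    # English clips (placeholder name)

# Setting 1: WenetSpeech only.
with open("setting1_train.txt", "w") as f:
    f.write("\n".join(wenet) + "\n")

# Setting 2: ~1:1 Chinese/English mix (1:1 by clip count; matching by
# hours would need per-file durations), shuffled so batches interleave
# the two languages.
n = min(len(wenet), len(giga))
mixed = random.sample(wenet, n) + random.sample(giga, n)
random.shuffle(mixed)
with open("setting2_train.txt", "w") as f:
    f.write("\n".join(mixed) + "\n")
```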
jishengpeng commented 2 months ago

This situation is somewhat unusual. You may use a small amount of Chinese data (approximately 500 hours) to verify whether this issue always arises when the model is trained on purely Chinese data.
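
A minimal sketch of that verification step, assuming the same plain-text filelist format and using `soundfile` to read clip durations from file headers without decoding audio; the filenames and the 500-hour target are placeholders:

```python
import random
import soundfile as sf

TARGET_HOURS = 500.0

with open("wenetspeech_train.txt") as f:
    paths = [ln.strip() for ln in f if ln.strip()]
random.shuffle(paths)

# Accumulate randomly chosen clips until we reach ~500 hours.
subset, total_sec = [], 0.0
for p in paths:
    info = sf.info(p)
    total_sec += info.frames / info.samplerate
    subset.append(p)
    if total_sec >= TARGET_HOURS * 3600:
        break

print(f"kept {len(subset)} clips, {total_sec / 3600:.1f} h")
with open("wenetspeech_500h.txt", "w") as f:
    f.write("\n".join(subset) + "\n")
```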

wntg commented 2 months ago

I'm interested in Chinese-only training too. Do you have any further results?

boltzmann-Li commented 4 days ago

WenetSpeech could be too noisy; you may want to start with AIShell3 and then move to WenetSpeech4TTS.
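
One possible way to act on the noise concern before switching datasets is a crude per-clip SNR screen. This is only a sketch: the frame size and the 15 dB threshold are arbitrary assumptions to tune, and the filenames are placeholders.

```python
import numpy as np
import soundfile as sf

def rough_snr_db(path, frame=2048):
    """Rough SNR proxy: loudest vs. quietest frame energies."""
    x, _ = sf.read(path, dtype="float32", always_2d=False)
    if x.ndim > 1:
        x = x.mean(axis=1)  # downmix to mono
    if len(x) < frame:
        return 0.0  # too short to judge; treat as noisy
    n = (len(x) // frame) * frame
    frames = x[:n].reshape(-1, frame)
    energy = np.sort((frames ** 2).mean(axis=1) + 1e-12)
    noise = energy[: max(1, len(energy) // 10)].mean()    # quietest 10% ~ noise floor
    speech = energy[-max(1, len(energy) // 10):].mean()   # loudest 10% ~ speech
    return 10 * np.log10(speech / noise)

with open("wenetspeech_train.txt") as f:
    paths = [ln.strip() for ln in f if ln.strip()]

kept = [p for p in paths if rough_snr_db(p) > 15.0]
with open("wenetspeech_clean.txt", "w") as f:
    f.write("\n".join(kept) + "\n")
```

Clips shorter than one frame are dropped as noisy; a more careful approach would use a VAD or a DNSMOS-style quality model instead of this energy heuristic.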