Fine-tuning an English checkpoint (ckpt) using a Chinese speech dataset

Hi, I have about 85 hours of Chinese speech audio at 44,100 Hz, and I am using it to fine-tune the en-us/lessac/medium .ckpt checkpoint, but the results are not good. My loss_gen_all looks very high, while loss_disc_all looks normal.

Questions:

Sample Rate Conversion: Is it advisable to convert the sample rate from 44,100 Hz to 22,050 Hz before fine-tuning? Could this conversion be contributing to the high loss_gen_all?

Language Adaptation: Since I am fine-tuning an English model with Chinese data, are there specific configurations or adjustments you recommend to improve performance?

Model Compatibility: Are there any known issues or limitations when fine-tuning the en-us/lessac/medium.ckpt model with a non-English dataset?
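For concreteness, the conversion asked about in the first question is a 2:1 downsample (44,100 Hz to 22,050 Hz). Below is a minimal pure-Python sketch of that step, assuming the audio is already loaded as a list of float samples. The function names `lowpass_taps` and `downsample_by_two` are hypothetical and not part of piper, and the filter length and cutoff are illustrative choices, not tuned values. In practice a dedicated resampling tool would normally be used instead.

```python
import math


def lowpass_taps(num_taps=31, cutoff=0.23):
    """Windowed-sinc low-pass FIR (hypothetical helper, not a piper API).

    cutoff is a fraction of the INPUT sample rate; 0.23 * 44100 Hz is
    roughly 10.1 kHz, just below the 11,025 Hz Nyquist limit of the output.
    """
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        if x == 0:
            sinc = 2 * cutoff
        else:
            sinc = math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        # Hamming window to reduce ripple from truncating the sinc
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(sinc * window)
    total = sum(taps)
    return [t / total for t in taps]  # normalize to unity gain at DC


def downsample_by_two(samples, taps=None):
    """Low-pass filter, then keep every second sample (44100 -> 22050 Hz)."""
    taps = taps or lowpass_taps()
    half = len(taps) // 2
    out = []
    for i in range(0, len(samples), 2):
        acc = 0.0
        for k, t in enumerate(taps):
            j = i + k - half  # filter is centered, so no added delay
            if 0 <= j < len(samples):
                acc += t * samples[j]
        out.append(acc)
    return out


# Example: 0.1 s of a 440 Hz tone sampled at 44,100 Hz
tone = [math.sin(2 * math.pi * 440 * n / 44100) for n in range(4410)]
resampled = downsample_by_two(tone)
```

The low-pass filter runs before decimation because simply dropping every second sample would alias content above 11,025 Hz into the audible band.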

Any guidance or suggestions you could provide would be greatly appreciated.

Thank you for your time and assistance.

rhasspy/piper, issue #613