zceng / LVCNet

LVCNet: Efficient Condition-Dependent Modeling Network for Waveform Generation
Apache License 2.0

poor synthesis #6

Closed: forwiat closed this issue 2 years ago

forwiat commented 2 years ago

Hello @zceng, thanks for your contribution with LVCNet. I followed your default experiment settings for training on LJSpeech, including sampling rate = 22050, batch_size = 8, and so on. However, the synthesis quality is poor even though training has reached checkpoint-180000.

real: [image]; generate: [image]; wavs zip: 0100_wavs.zip

Is it because of insufficient training, or are there other reasons? Looking forward to your reply. Thanks.

forwiat commented 2 years ago

Much of the important harmonic information is unclear in the "generate" image.

zceng commented 2 years ago

180000 steps gave passable results in my experiments. Can you provide a figure of the loss curve?

forwiat commented 2 years ago

[image] [image] [image]

forwiat commented 2 years ago

Maybe it is converging?

zceng commented 2 years ago

You can compare the results of the model at the 100k step and the 200k step, and check whether the quality of the generated speech is improving.
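One way to make that checkpoint comparison less subjective is to measure how far each generated waveform is from the ground-truth recording in the log-magnitude STFT domain. The snippet below is only a minimal sketch using NumPy: the function names `log_spectrogram` and `log_spectral_distance`, the STFT parameters, and the synthetic signals standing in for the real and generated wavs are all assumptions for illustration, not part of LVCNet.

```python
import numpy as np


def log_spectrogram(wave, n_fft=1024, hop=256, eps=1e-5):
    """Log-magnitude STFT of a 1-D waveform, shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wave) - n_fft) // hop
    frames = np.stack(
        [wave[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.log(np.abs(np.fft.rfft(frames, axis=1)) + eps)


def log_spectral_distance(reference, generated):
    """Mean absolute log-magnitude difference; lower means closer to the reference."""
    n = min(len(reference), len(generated))  # align lengths before comparing
    ref = log_spectrogram(reference[:n])
    gen = log_spectrogram(generated[:n])
    return float(np.mean(np.abs(ref - gen)))


# Synthetic stand-ins (assumption): a harmonic tone as the "real" audio, and two
# noisy copies playing the role of the 100k-step and 200k-step outputs.
sr = 22050
t = np.arange(sr) / sr
reference = sum(np.sin(2 * np.pi * 220 * k * t) / k for k in range(1, 6))
rng = np.random.default_rng(0)
gen_100k = reference + 0.5 * rng.standard_normal(sr)
gen_200k = reference + 0.05 * rng.standard_normal(sr)

d_100k = log_spectral_distance(reference, gen_100k)
d_200k = log_spectral_distance(reference, gen_200k)
print(f"distance at 100k: {d_100k:.3f}, at 200k: {d_200k:.3f}")
```

With real data, the three arrays would be the ground-truth wav and the outputs of the two checkpoints for the same utterance, loaded at the same sampling rate. A distance that keeps falling between checkpoints suggests training is still helping. Listening tests remain the final judge, since a spectral distance does not capture every audible artifact.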