Long sentences - Githubissues

lmnt-com / diffwave

DiffWave is a fast, high-quality neural vocoder and waveform synthesizer.

Apache License 2.0

767 stars 112 forks

Hi,

the model seems to be working fairly well: after just 100K steps on a 100-speaker, 24 kHz dataset it already sounds reasonably good, though I guess it needs more training to reach higher quality.

I tested it on some random sentences and noticed that the GPU runs out of memory on long ones. What would be the best approach to synthesizing long sentences? The baseline would be to split the mel spectrogram into parts and synthesize them separately, but I am not sure whether that is the only way to go.
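For illustration, the splitting baseline could look roughly like the sketch below. It is not DiffWave's own API: `synthesize_in_chunks` and the `vocode` callable are hypothetical names, and `vocode` is assumed to map a `[n_mels, T]` spectrogram to `T * hop_length` audio samples. Consecutive chunks overlap by a few frames and are linearly crossfaded. Since each chunk is sampled independently, the overlapping audio will probably not match exactly, so the crossfade only smooths the seam.

```python
import numpy as np


def synthesize_in_chunks(mel, vocode, hop_length, chunk_frames=200, overlap_frames=8):
    """Vocode a long mel spectrogram piece by piece to bound peak memory.

    mel:    array of shape [n_mels, T].
    vocode: hypothetical callable; assumed to return T_chunk * hop_length
            samples for an input of shape [n_mels, T_chunk].
    """
    assert 0 <= overlap_frames < chunk_frames
    n_frames = mel.shape[1]
    step = chunk_frames - overlap_frames      # frames advanced per chunk
    overlap = overlap_frames * hop_length     # overlap length in samples
    fade_in = np.linspace(0.0, 1.0, overlap, endpoint=False, dtype=np.float32)

    out = np.zeros(n_frames * hop_length, dtype=np.float32)
    start = 0
    while True:
        end = min(start + chunk_frames, n_frames)
        audio = np.asarray(vocode(mel[:, start:end]), dtype=np.float32)
        pos = start * hop_length
        if start > 0 and overlap > 0:
            # Crossfade the new chunk into the tail of the previous one.
            n = min(overlap, len(audio))
            out[pos:pos + n] = out[pos:pos + n] * (1.0 - fade_in[:n]) + audio[:n] * fade_in[:n]
            out[pos + n:pos + len(audio)] = audio[n:]
        else:
            out[pos:pos + len(audio)] = audio
        if end == n_frames:
            break
        start += step
    return out
```

Peak memory then scales with `chunk_frames` rather than with sentence length, at the cost of some redundant computation in the overlapped frames.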

Thank you for your help!

PS: I'll report some results after 1M steps.


Long sentences #8