xiph / LPCNet

Efficient neural speech synthesis
BSD 3-Clause "New" or "Revised" License

LPCNET with pitch higher than 500 Hz #175

Open juliakorovsky opened 2 years ago

juliakorovsky commented 2 years ago

@jmvalin Let's say I want to use LPCNet with a TTS network that was trained on female voice samples whose maximum pitch is 550 Hz rather than 500 Hz. Should I change any LPCNet parameters to get good quality? Right now I'm getting artifacts where some high-frequency sounds just disappear, which causes a "croaking" effect.

jmvalin commented 2 years ago

At the moment, the highest pitch that can be detected is 500 Hz. That corresponds to the shortest allowed period, which is set by PITCH_MIN_PERIOD and is currently 32. At the very least, you'd have to decrease that value if you want to support higher pitches. Since I've never tried changing PITCH_MIN_PERIOD, it's possible you'll run into new issues (or not), but I'd recommend not decreasing it too much: I'd say don't go below 20, because you might get interactions with the LPC part.
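
For anyone working out which value to try, here is a small sketch of the arithmetic behind the period/pitch relationship. It assumes a 16 kHz sampling rate, which is consistent with the figures above (a period of 32 samples giving 500 Hz). The helper names `max_pitch_hz` and `min_period_for_pitch` are hypothetical and not part of LPCNet. The sketch only covers the arithmetic; whether anything else needs to change alongside PITCH_MIN_PERIOD is not covered in this thread.

```python
# Assumption: 16 kHz sampling rate (consistent with period 32 <-> 500 Hz).
SAMPLE_RATE_HZ = 16000


def max_pitch_hz(min_period, sample_rate_hz=SAMPLE_RATE_HZ):
    """Highest pitch representable when the shortest period is min_period samples."""
    return sample_rate_hz / min_period


def min_period_for_pitch(target_hz, sample_rate_hz=SAMPLE_RATE_HZ):
    """Largest whole-sample period whose pitch is still at least target_hz."""
    return int(sample_rate_hz // target_hz)


if __name__ == "__main__":
    print(max_pitch_hz(32))           # current setting: 500.0 Hz
    print(min_period_for_pitch(550))  # period needed to cover 550 Hz: 29
    print(max_pitch_hz(20))           # suggested lower bound of 20: 800.0 Hz
```

Under these assumptions, a period of 29 would be enough to cover 550 Hz, and that stays well above the suggested floor of 20.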