marty1885 / paroli

Streaming TTS based on Piper with optional RK3588 NPU support
MIT License

Not able to find encoder.onnx and decoder.onnx #3

Open patel-lyzr opened 4 months ago

patel-lyzr commented 4 months ago

@marty1885 Where to find encoder.onnx and decoder.onnx?

marty1885 commented 4 months ago

Please read the README. The files are on Hugging Face: https://huggingface.co/marty1885/streaming-piper/tree/main/ljspeech
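For anyone who prefers to script the download, here is a minimal sketch using only the Python standard library. It assumes the files in the `ljspeech` folder are named `encoder.onnx` and `decoder.onnx` (taken from the issue title, not verified against the repository listing) and that they are served through Hugging Face's usual `resolve/main` download URLs. `file_url` and `download_model` are hypothetical helper names.

```python
import urllib.request
from pathlib import Path

REPO_URL = "https://huggingface.co/marty1885/streaming-piper"
MODEL_DIR = "ljspeech"
# Assumption: file names taken from the issue title; check the repository
# listing, which may also contain a config file needed at runtime.
FILES = ("encoder.onnx", "decoder.onnx")


def file_url(name, subdir=MODEL_DIR):
    """Build a direct-download URL for one file in the model folder."""
    return f"{REPO_URL}/resolve/main/{subdir}/{name}"


def download_model(dest):
    """Download each expected file into dest, skipping files already present."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    paths = []
    for name in FILES:
        target = dest / name
        if not target.exists():
            urllib.request.urlretrieve(file_url(name), str(target))
        paths.append(target)
    return paths
```

Calling `download_model("models/ljspeech")` would fetch both files into that directory, provided the assumed names match what the repository actually contains.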

patel-lyzr commented 3 months ago

@marty1885 Thank you for the prompt reply. I would like to know how to convert other Piper models into an encoder and a decoder, or whether there is a way to use existing Piper models directly, so that I can try other voices. I read the README; it links to the training code and the Piper library.

marty1885 commented 3 months ago

@patel-lyzr Please refer to the obtaining model section of the README document.

Quote:

To convert checkpoints into ONNX file pairs, you'll need mush42's Piper fork and its streaming branch. Run

python3 -m piper_train.export_onnx_streaming /path/to/your/training/lightning_logs/version_0/checkpoints/blablablas.ckpt /path/to/output/directory

The branch has been merged into Piper, so the latest version of Piper should work just fine. But you NEED the checkpoint file generated during the training process, because the ONNX models are exported from it. You cannot convert Piper's ONNX models into the streaming format (not without major effort and some deep ONNX hacking, at least).

If you don't have the checkpoint, unfortunately you have to train a model from scratch. It's not that hard though. I trained the demo LJSpeech one over a single weekend with a single 3090. The training process is also well documented in Piper's documentation.

Or, just ask the model's creator and see if they can release the checkpoint.
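The export step quoted above can be wrapped in a small script. This is only a sketch under a few assumptions: that `piper_train` is importable by the current Python interpreter, that the newest `.ckpt` under the logs directory is the one you want, and that the exporter writes files named `encoder.onnx` and `decoder.onnx` (names taken from the issue title). All function names here are hypothetical helpers, not part of Piper.

```python
import subprocess
import sys
from pathlib import Path

# Assumption: output names taken from the issue title; the exporter may differ.
EXPECTED_OUTPUTS = ("encoder.onnx", "decoder.onnx")


def newest_checkpoint(logs_dir):
    """Return the most recently modified .ckpt under logs_dir, or None."""
    ckpts = sorted(Path(logs_dir).rglob("*.ckpt"), key=lambda p: p.stat().st_mtime)
    return ckpts[-1] if ckpts else None


def export_command(ckpt, out_dir):
    """Build the export command from the README as an argument list."""
    # sys.executable stands in for "python3" so the same environment is used.
    return [sys.executable, "-m", "piper_train.export_onnx_streaming",
            str(ckpt), str(out_dir)]


def missing_outputs(out_dir):
    """List the expected ONNX files that are not present in out_dir."""
    return [n for n in EXPECTED_OUTPUTS if not (Path(out_dir) / n).is_file()]


def run_export(logs_dir, out_dir):
    """Export the newest checkpoint and report any expected file that is missing."""
    ckpt = newest_checkpoint(logs_dir)
    if ckpt is None:
        raise FileNotFoundError(f"no .ckpt under {logs_dir}; a training checkpoint is required")
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(export_command(ckpt, out_dir), check=True)
    return missing_outputs(out_dir)
```

A call such as `run_export("/path/to/your/training/lightning_logs", "/path/to/output/directory")` returns an empty list when both files were produced. The checkpoint lookup fails early with a clear message, since a released ONNX voice cannot be used as input here.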