❓ How to import TTS models in C++

jeannekamikaze commented 1 year ago

I am trying to import the v3_en model into C++. I have downloaded the model using a Python script with:

import torch

model, example_text = torch.hub.load(
        repo_or_dir='snakers4/silero-models',
        model='silero_tts',
        language='en',
        speaker='v3_en')

This caches the model in the path:

~/.cache/torch/hub/snakers4_silero-models_master/src/silero/model/v3_en.pt

When my C++ program calls torch::jit::load() on the cached model path, I get the error:

PytorchStreamReader failed locating file constants.pkl: file not found

A quick search suggests that the error occurs because the model has not been converted to TorchScript (although v3_en.pt, which is just a zip archive, does contain a constants.pkl file). I then try to convert it to TorchScript in Python via:

traced = torch.jit.trace(model, example_text)

But this results in the error:

File "~/.python/lib/python3.11/site-packages/torch/_jit_internal.py", line 1152, in _qualified_name
    raise RuntimeError("Could not get name of python class object")

If you have imported a Silero TTS model into C++ before or know how to get this to work, any guidance would be appreciated. Thank you.
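One way to see why torch::jit::load() rejects the file even though it contains a constants.pkl is to list the zip's entries. The sketch below assumes (this is not confirmed in the thread) that LibTorch looks for constants.pkl directly under the archive's top-level directory, so a constants.pkl nested deeper, for example inside a packaged archive that bundles a TorchScript model, would not satisfy the loader. The function name inspect_archive is hypothetical.

```python
import zipfile


def inspect_archive(path):
    """Report where constants.pkl appears inside a PyTorch zip archive.

    Returns (root, has_root_constants, nested_constants), where root is the
    top-level directory of the first entry. Assumption: a TorchScript
    archive keeps constants.pkl directly under that root directory.
    """
    with zipfile.ZipFile(path) as zf:
        names = zf.namelist()
    root = names[0].split("/", 1)[0] if names else ""
    expected = f"{root}/constants.pkl"
    has_root_constants = expected in names
    # Any other constants.pkl sits deeper in the tree (e.g. a bundled model).
    nested = [
        n for n in names
        if n.rsplit("/", 1)[-1] == "constants.pkl" and n != expected
    ]
    return root, has_root_constants, nested
```

Calling it on the cached v3_en.pt path (after os.path.expanduser) and seeing has_root_constants == False with a non-empty nested list would be consistent with the file not being a plain TorchScript archive.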

jeannekamikaze commented 1 year ago

Answering my own question: the object returned by torch.hub.load() has a model member. That is the actual model that needs to be saved and then loaded with torch::jit::load(). (Silero models are apparently already jitted, so re-jitting them is a no-op.)
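The export step described above might look like the following minimal sketch. It assumes, as this comment suggests, that the inner model attribute is already a TorchScript module, which exposes a .save() method (the same thing torch.jit.save calls for a file path). The helper name export_inner_model and the output file name are hypothetical.

```python
def export_inner_model(hub_model, path):
    """Save the TorchScript module wrapped by a Silero hub object.

    Assumption (from this thread): hub_model.model is already jitted,
    so it can be saved as-is without tracing or scripting again.
    """
    inner = getattr(hub_model, "model", None)
    if inner is None:
        raise AttributeError("object returned by torch.hub.load() has no 'model' member")
    if not hasattr(inner, "save"):
        # torch.jit.ScriptModule has .save(); a plain nn.Module does not.
        raise TypeError(f"'model' is a {type(inner).__name__}, not a TorchScript module")
    inner.save(path)  # same effect as torch.jit.save(inner, path)
    return inner


# Usage (needs torch and network access; not executed here):
#   import torch
#   model, example_text = torch.hub.load(repo_or_dir='snakers4/silero-models',
#                                        model='silero_tts', language='en',
#                                        speaker='v3_en')
#   export_inner_model(model, 'v3_en_jit.pt')
```

On the C++ side the saved file would then be passed to torch::jit::load("v3_en_jit.pt"). The model still expects preprocessed input, not raw text.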

AlexMaxy commented 1 month ago

Hello. Did you manage to import this model into C++? Is it even possible?

jeannekamikaze commented 1 month ago

Yeah, it's possible. You need to replicate the non-model code in the Python file that massages the input text into the vector format the model consumes. I don't have the code anymore, though, as I abandoned this due to the non-free license.
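At its simplest, the non-model preprocessing turns characters into integer symbol IDs. The sketch below shows only that core idea; the symbol table is made up, and the real Silero preprocessing may do more than this, so the actual symbol list and any extra steps must be taken from the Silero Python code. The function name text_to_ids is hypothetical.

```python
def text_to_ids(text, symbols, lowercase=True):
    """Map text to integer symbol IDs for a character-level TTS model.

    `symbols` is the model's ordered symbol table (hypothetical here);
    characters outside the table are silently dropped.
    """
    lut = {s: i for i, s in enumerate(symbols)}
    if lowercase:
        text = text.lower()
    return [lut[ch] for ch in text if ch in lut]


# Hypothetical symbol table; the real one has to come from the Silero model.
SYMBOLS = ["_", " ", "a", "b", "c"]
ids = text_to_ids("Ab c!", SYMBOLS)  # -> [2, 3, 1, 4]
```

In C++ the same step reduces to one array lookup per input byte, which is why precomputing the table as a lookup array is convenient.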

AlexMaxy commented 1 month ago

Thank you very much. That's answer enough for me that it can be done.

jeannekamikaze commented 1 month ago

Update: I found the stuff I had written and uploaded it here: https://filetransfer.io/data-package/HDNdnkp1#link

I haven't run this shit in a long while, but this is roughly what's in the zip:

ttsmod.py -- Run with --export and/or --lut to export the Silero model in a format that LibTorch (the C++ Torch library) can load, and/or to generate a C++ header with a LUT that translates ASCII characters to the symbols the model understands.
src/tts/ -- A C++ library that consumes the generated header above and wraps the whole inference shenanigan.
src/tts-bin/ -- A demo program that uses the library to synthesize sample text and write the result to a wave file. It loads the model you exported with --export from the Python script above.
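The --lut idea can be approximated as below. This is not the code from the zip; it's a hypothetical sketch that assumes every model symbol is a single ASCII character and that uppercase input should fall back to the lowercase symbol. The function name make_lut_header and the array name kAsciiToSymbol are made up.

```python
def make_lut_header(symbols, name="kAsciiToSymbol", missing=-1):
    """Return a C++ header mapping each ASCII code (0-127) to a symbol index.

    Multi-character or non-ASCII symbols are skipped (assumption: the
    model's symbols are single characters). Uppercase letters fall back
    to their lowercase symbol; unsupported codes map to `missing`.
    """
    index = {s: i for i, s in enumerate(symbols) if len(s) == 1 and ord(s) < 128}
    entries = [index.get(chr(c), index.get(chr(c).lower(), missing)) for c in range(128)]
    rows = [
        "    " + ", ".join(str(v) for v in entries[start:start + 16]) + ","
        for start in range(0, 128, 16)
    ]
    return (
        "#pragma once\n"
        "#include <cstdint>\n\n"
        f"// Generated: ASCII code -> model symbol index ({missing} = unsupported).\n"
        f"static constexpr std::int16_t {name}[128] = {{\n"
        + "\n".join(rows)
        + "\n};\n"
    )
```

Generating the table in Python keeps the C++ side free of any symbol-parsing logic: the library only indexes the array with each input byte and skips entries equal to -1.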
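Writing the synthesized audio to a wave file, as the demo program does, can be checked against a small Python sketch using only the standard library. It assumes the model returns mono float samples in [-1, 1] and defaults to 48 kHz, which is an assumption about the v3_en output; pass whatever sample rate the audio was actually synthesized at. The name write_wav is hypothetical.

```python
import struct
import wave


def write_wav(path, samples, sample_rate=48000):
    """Write mono float samples in [-1, 1] as a 16-bit PCM WAV file."""
    frames = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))  # clip out-of-range samples
        frames += struct.pack("<h", int(round(s * 32767)))  # little-endian int16
    with wave.open(path, "wb") as w:
        w.setnchannels(1)   # mono
        w.setsampwidth(2)   # 2 bytes = 16-bit samples
        w.setframerate(sample_rate)
        w.writeframes(bytes(frames))
```

Comparing the header fields and sample values of a file written this way with one written by the C++ demo is a quick way to catch sample-rate or scaling mismatches.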

All that being said, there are better models out there, both in terms of ease of use and licensing. I remember experimenting with ONNX models and having a much easier time, if you can put up with Microsoft's bait-and-switch software ecosystems, though I don't remember which model I tried.

AlexMaxy commented 1 month ago

Wow, that's cool! I'll try to figure it out.

snakers4 / silero-models #254