p0p4k / vits2_pytorch

unofficial vits2-TTS implementation in pytorch
https://arxiv.org/abs/2307.16430
MIT License

Failed to do inferencing with the provided checkpoint and inferencing code #24

Closed: Subarasheese closed this issue 1 year ago

Subarasheese commented 1 year ago

Greetings,

Unfortunately, this project only has a Colab notebook for inference (the inference.ipynb file). I had to adapt it into a "local" script and made some changes to save .wav files, etc.

The script runs fine. However, the saved .wav files are all silent, even though they are not 0 KB. I also opened them in Audacity to check, and the waveform is indeed empty.

Can the author or someone else provide code that works with the provided checkpoint and runs outside of a Colab environment? Thank you.

p0p4k commented 1 year ago

Hi, inference.ipynb is meant to be a standalone notebook and can be adapted to Colab with a few changes. The Colab notebook for training is inside the notebooks folder. Can you inspect the wav data with numpy and play it inside the notebook before saving? What are you using to save the files?
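As a rough illustration of "explore the wav data using numpy", the sketch below prints a few summary statistics for a waveform array. The helper name `describe_audio` is hypothetical and not part of the repo; it only assumes the model output is a 1-D numpy array. A peak of 0.0 (or a zero fraction of 1.0) means the array is completely silent before it ever reaches the file writer.

```python
import numpy as np


def describe_audio(audio: np.ndarray) -> dict:
    """Summarize a waveform array so silent or badly scaled output is easy to spot."""
    audio = np.asarray(audio)
    return {
        "dtype": str(audio.dtype),
        "shape": audio.shape,
        "min": float(audio.min()),
        "max": float(audio.max()),
        # Peak absolute amplitude: 0.0 means every sample is zero.
        "peak": float(np.abs(audio).max()),
        # Fraction of samples that are exactly zero.
        "zero_fraction": float(np.mean(audio == 0)),
    }


if __name__ == "__main__":
    # Stand-in for model output (hypothetical): a 440 Hz tone, float32 in [-1, 1].
    t = np.arange(22050) / 22050
    fake_audio = (0.5 * np.sin(2 * np.pi * 440 * t)).astype(np.float32)
    print(describe_audio(fake_audio))
    # The same signal after a bare int16 cast, for comparison.
    print(describe_audio(fake_audio.astype("int16")))
```

Comparing the two printed summaries shows how much information a cast can destroy before saving.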

Subarasheese commented 1 year ago

Hi @p0p4k , I am saving the audio like this:


from scipy.io.wavfile import write

audio_path = "./output_audio.wav"
write(audio_path, hps.data.sampling_rate, audio.astype('int16'))
print(f"Audio saved to {audio_path}")

But the file is written as if the contents of "audio" were empty.
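One plausible cause of a silent file that is not 0 KB (a hypothesis, not confirmed in this thread): VITS-style models usually return a float waveform roughly in [-1, 1], and `audio.astype('int16')` truncates toward zero, so almost every sample becomes 0. A minimal sketch of a scaled conversion, assuming `audio` is such a float array; the helper name `float_to_int16` is hypothetical:

```python
import numpy as np


def float_to_int16(audio: np.ndarray) -> np.ndarray:
    """Convert a float waveform in [-1.0, 1.0] to 16-bit PCM samples."""
    audio = np.asarray(audio, dtype=np.float32)
    # Clip first so out-of-range samples do not wrap around when cast.
    audio = np.clip(audio, -1.0, 1.0)
    # Scale to the int16 range; a bare astype('int16') would truncate
    # values such as 0.3 or -0.7 to 0 and produce a silent file.
    return (audio * 32767).astype(np.int16)


if __name__ == "__main__":
    samples = np.array([0.0, 0.3, -0.7, 1.0, -1.2], dtype=np.float32)
    print(samples.astype("int16"))  # naive cast: the fractional samples become 0
    print(float_to_int16(samples))  # scaled cast keeps the signal
```

With this helper, the save call above would become `write(audio_path, hps.data.sampling_rate, float_to_int16(audio))`. Alternatively, `scipy.io.wavfile.write` documents float32 as a supported dtype, so passing `audio.astype('float32')` should write a 32-bit float WAV without any rescaling.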

I would really appreciate it if you could get this running outside of the notebook/Colab environment, as a regular .py script =) Thank you

p0p4k commented 1 year ago

@Subarasheese I removed all the notebooks, and the repo now contains inference.py files instead. Also, can you double-check the contents of the audio array before saving?

Subarasheese commented 1 year ago

@p0p4k Great, it worked, thank you!

I suspect the problem was on my side during my first attempt: I did not have a proper build of the monotonic_align extension, so at the time I used some workarounds that could have resulted in an empty audio array.

For documentation's sake, these are the extra steps I needed:

Inside monotonic_align, I had to run:

mkdir -p build/lib.linux-x86_64-cpython-310/monotonic_align
python setup.py build_ext --inplace

then (because the build failed to move the compiled file into place for some reason):

cp build/lib.linux-x86_64-cpython-310/monotonic_align/core.cpython-310-x86_64-linux-gnu.so monotonic_align/
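The `cp` command above hardcodes the CPython 3.10 / Linux x86_64 file names. As a hedged sketch for other Python versions or platforms, the hypothetical helper below looks up the current interpreter's extension suffix via `sysconfig` and copies whatever `core` extension `build_ext` produced. It assumes the same directory layout as the commands above, with `pkg_dir` being the monotonic_align directory you ran them in.

```python
import glob
import os
import shutil
import sysconfig


def copy_built_extension(pkg_dir: str = ".") -> str:
    """Copy the compiled monotonic_align core extension next to its sources.

    Assumes the layout from the commands above:
    <pkg_dir>/build/lib.*/monotonic_align/core<EXT_SUFFIX>
    """
    # e.g. ".cpython-310-x86_64-linux-gnu.so" on CPython 3.10, Linux x86_64
    suffix = sysconfig.get_config_var("EXT_SUFFIX")
    pattern = os.path.join(pkg_dir, "build", "lib.*", "monotonic_align", "core" + suffix)
    matches = glob.glob(pattern)
    if not matches:
        raise FileNotFoundError(f"no compiled extension matching {pattern}")
    dest_dir = os.path.join(pkg_dir, "monotonic_align")
    os.makedirs(dest_dir, exist_ok=True)
    # shutil.copy returns the path of the newly copied file.
    return shutil.copy(matches[0], dest_dir)
```

Calling `copy_built_extension(".")` from inside the monotonic_align directory after `python setup.py build_ext --inplace` should be equivalent to the `cp` step on the original setup.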