Open Misiu opened 9 months ago
Me too, I'm interested in Italian
Models for French, German, and Dutch have just been added. It will take more time to add additional languages, but fortunately the data is available: http://openslr.org/94/
Spanish would be a great addition :D
Hi @synesthesiam do you have any news about other languages?
Not yet, but Spanish, Portuguese, Polish, and Italian should be possible with the MLS dataset.
Great, but I don't know how to train the model :( Do you have any instructions or something?
@synesthesiam do you have any guidelines on how to export the LibriTTS-R generator from a checkpoint? I wanted to use this https://huggingface.co/datasets/rhasspy/piper-checkpoints/blob/main/pl/pl_PL/gosia/medium/epoch%3D5001-step%3D1457672.ckpt to generate samples, and also learn how to do it myself :)
Hi @synesthesiam I've tried several times over the past few days to create a new model (.pt) to use with this notebook: https://colab.research.google.com/drive/1q1oe2zOyZp7UsB3jJiQ1IFn8z5YfjwEb#scrollTo=1cbqBebHXjFD to create my custom Italian wake word. I've attempted to follow various guides, including this one: https://github.com/rhasspy/piper/blob/master/TRAINING.md. I also tried starting with the 15 GB dataset you mentioned here: https://openslr.org/94/, but I haven't been successful. Could you please tell me how to do it? It would be a great help. Thank you
I was able to export a .pt file, but could not get it working with piper-sample-generator. (I'm using English (US), but cannot use a model trained with Piper in Piper Sample Generator.) Also, I'm using WSL Ubuntu on Windows.
Here is what I've done so far: To get a .pt file I temporarily modified line 91 in https://github.com/rhasspy/piper/blob/master/src/python/piper_train/__main__.py to
torch.save(model, '/path/to/save.pt')
exit()
Then ran
python3 -m piper_train \
--dataset-dir /path/to/training_dir/ \
--accelerator 'cpu' \
--devices 1 \
--batch-size 10 \
--validation-split 0.0 \
--num-test-examples 0 \
--max_epochs 10000 \
--resume_from_checkpoint /path/to/your/last.ckpt \
--checkpoint-epochs 1 \
--precision 32
(To be clear, this doesn't train the model; it just exports it from a checkpoint (.ckpt file).) I then had a .pt file, but to get .pt.json I ran:
cp /path/to/training_dir/config.json \
/path/to/save.pt.json
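For anyone scripting this step, the copy can be sketched in Python. This only illustrates the naming convention used in the cp command above (the exported file name plus .json). The helper name copy_config_for_model is made up for this example and is not part of Piper.

```python
import shutil
from pathlib import Path


def copy_config_for_model(config_path, model_path):
    """Copy a training config.json next to an exported model as '<model>.json'."""
    model_path = Path(model_path)
    # 'save.pt' -> 'save.pt.json', matching the cp command above.
    target = model_path.with_name(model_path.name + ".json")
    shutil.copyfile(config_path, target)
    return target
```

For example, copy_config_for_model("/path/to/training_dir/config.json", "/path/to/save.pt") would write /path/to/save.pt.json.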
I installed PyTorch 2.0.0 and its dependencies, piper-phonemize, and webrtcvad. My list of installed modules comes from running pip freeze.
I then attempted to run Piper Sample Generator using this modified script:
import os
import sys

if "piper-sample-generator/" not in sys.path:
    sys.path.append("piper-sample-generator/")

from generate_samples import generate_samples

target_word = 'edward'

def text_to_speech(text):
    generate_samples(text = text, max_samples=1, length_scales=[1.1], noise_scales=[0.7], noise_scale_ws = [0.7], output_dir = './', batch_size=1, auto_reduce_batch_size=True, file_names=["test_generation.wav"], model='James5.pt')

text_to_speech(target_word)
This was the output:
A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.0.0 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.
If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.
Traceback (most recent call last):
File "/home/james/new_open_wake_word_training/step-1.py", line 7, in <module>
from generate_samples import generate_samples
File "/home/james/new_open_wake_word_training/piper-sample-generator/generate_samples.py", line 14, in <module>
import torchaudio
File "/home/james/new_open_wake_word_training/venv/lib/python3.10/site-packages/torchaudio/__init__.py", line 1, in <module>
from torchaudio import ( # noqa: F401
File "/home/james/new_open_wake_word_training/venv/lib/python3.10/site-packages/torchaudio/compliance/__init__.py", line 1, in <module>
from . import kaldi
File "/home/james/new_open_wake_word_training/venv/lib/python3.10/site-packages/torchaudio/compliance/kaldi.py", line 22, in <module>
EPSILON = torch.tensor(torch.finfo(torch.float).eps)
/home/james/new_open_wake_word_training/venv/lib/python3.10/site-packages/torchaudio/compliance/kaldi.py:22: UserWarning: Failed to initialize NumPy: _ARRAY_API not found (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:84.)
EPSILON = torch.tensor(torch.finfo(torch.float).eps)
DEBUG:generate_samples:Loading James5.pt
INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp3mu_bb_l
INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp3mu_bb_l/_remote_module_non_scriptable.py
INFO:generate_samples:Successfully loaded the model
Traceback (most recent call last):
File "/home/james/new_open_wake_word_training/step-1.py", line 14, in <module>
text_to_speech(target_word)
File "/home/james/new_open_wake_word_training/step-1.py", line 12, in text_to_speech
generate_samples(text = text, max_samples=1, length_scales=[1.1], noise_scales=[0.7], noise_scale_ws = [0.7], output_dir = './', batch_size=1, auto_reduce_batch_size=True, file_names=["test_generation.wav"], model='James5.pt')
File "/home/james/new_open_wake_word_training/piper-sample-generator/generate_samples.py", line 178, in generate_samples
audio, phoneme_samples = generate_audio(
File "/home/james/new_open_wake_word_training/piper-sample-generator/generate_samples.py", line 302, in generate_audio
x, m_p_orig, logs_p_orig, x_mask = model.enc_p(x, x_lengths)
File "/home/james/new_open_wake_word_training/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1614, in __getattr__
raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'VitsModel' object has no attribute 'enc_p'
So, I'm not really sure what to do from here. I don't really understand how AI in Python works, but I've gotten this far, and help would be greatly appreciated.
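On the first warning in the output: the message itself suggests downgrading to 'numpy<2'. A minimal sketch of a guard that could run before importing torchaudio is below. The helper name numpy_below_2 is hypothetical, and it assumes a plain version string like the one numpy.__version__ reports.

```python
def numpy_below_2(version):
    """Return True if a NumPy version string such as '1.26.4' has a major version below 2."""
    major = version.split(".")[0]
    # A non-numeric or empty major part is treated as "not known to be compatible".
    return major.isdigit() and int(major) < 2
```

Calling numpy_below_2(numpy.__version__) at the top of the script and stopping early when it returns False would surface the mismatch before the torchaudio import prints the long warning.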
Edit: If it's any help, here's what is output when I run print(model) at line 301 in generate_samples.py (right before it fails)
Second edit: I just realised that I'm likely getting this error because I'm using a single speaker model. Would this be likely @synesthesiam?
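One other possible reading of the AttributeError, offered as a guess rather than a confirmed diagnosis: torch.save(model, ...) in the export step saved the whole VitsModel training wrapper, while generate_samples.py calls model.enc_p directly, so it may expect the inner generator network instead. The sketch below looks for the object that actually owns enc_p. The attribute name model_g is an assumption about how piper_train names its generator; the print(model) output should show the real name.

```python
def find_attr_owner(model, attr="enc_p", child_names=("model_g",)):
    """Return `model` or the first named child that has `attr`, else None.

    `child_names` holds guesses for where the generator lives; "model_g" is
    an assumption, not a confirmed piper_train attribute name.
    """
    if hasattr(model, attr):
        return model
    for name in child_names:
        child = getattr(model, name, None)
        if child is not None and hasattr(child, attr):
            return child
    return None
```

If this returns a child module, saving that child with torch.save instead of the wrapper might produce a .pt file the sample generator can load. This is untested against a single-speaker model, so the single-speaker explanation may still apply.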
I was also trying to use the same notebook, referencing the German model (included in the release) instead of the English one and using the config from this repo, but I can't get anything that sounds like proper human speech from the sample generator; it just sounds like random phonemes...
Hi there, I want to use the steps in https://www.home-assistant.io/voice_control/create_wake_word/ to create my custom wake word. Is there a chance to use/add different languages? I'm interested in Polish, but I think that support for any additional language would be awesome.