Use with different languages

Misiu commented 9 months ago

Hi there, I want to use the steps in https://www.home-assistant.io/voice_control/create_wake_word/ to create my custom wake word. Is there a chance to use/add different languages? I'm interested in Polish, but I think that support for any additional language would be awesome.

alesms commented 7 months ago

Me too, I'm interested in Italian

synesthesiam commented 7 months ago

Models for French, German, and Dutch have just been added. It will take more time to add additional languages, but fortunately the data is available: http://openslr.org/94/

fherreror commented 7 months ago

Spanish would be a great addition :D

alesms commented 7 months ago

Hi @synesthesiam do you have any news about other languages?

synesthesiam commented 7 months ago

Not yet, but Spanish, Portuguese, Polish, and Italian should be possible with the MLS dataset.

alesms commented 7 months ago

Great, but I don't know how to train the model :( Do you have any instructions or something?

mmalyska commented 5 months ago

@synesthesiam do you have any guidelines on how to export a LibriTTS-R generator from a checkpoint? I wanted to use this https://huggingface.co/datasets/rhasspy/piper-checkpoints/blob/main/pl/pl_PL/gosia/medium/epoch%3D5001-step%3D1457672.ckpt to generate samples, and also learn how to do it myself :)

alesms commented 5 months ago

Hi @synesthesiam I've tried several times over the past few days to create a new model (.pt) to use with this notebook: https://colab.research.google.com/drive/1q1oe2zOyZp7UsB3jJiQ1IFn8z5YfjwEb#scrollTo=1cbqBebHXjFD to create my custom Italian wake word. I've attempted to follow various guides, including this one: https://github.com/rhasspy/piper/blob/master/TRAINING.md. I also tried starting with the 15 GB dataset you mentioned here: https://openslr.org/94/, but I haven't been successful. Could you please tell me how to do it? It would be a great help. Thank you

mario872 commented 3 months ago

I was able to export a .pt file, but could not get it working with piper-sample-generator. (I'm using English (US), but I cannot use a model trained with Piper in Piper Sample Generator.) Also, I'm using WSL Ubuntu on Windows.

Here is what I've done so far: To get a .pt file I temporarily modified line 91 in https://github.com/rhasspy/piper/blob/master/src/python/piper_train/__main__.py to

torch.save(model, '/path/to/save.pt')
exit()

Then ran

python3 -m piper_train \
    --dataset-dir /path/to/training_dir/ \
    --accelerator 'cpu' \
    --devices 1 \
    --batch-size 10 \
    --validation-split 0.0 \
    --num-test-examples 0 \
    --max_epochs 10000 \
    --resume_from_checkpoint /path/to/your/last.ckpt \
    --checkpoint-epochs 1 \
    --precision 32

(To be clear, this doesn't train the model; it just exports it from a checkpoint (.ckpt file).) I then had a .pt file, but to get the .pt.json I ran:

cp /path/to/training_dir/config.json \
   /path/to/save.pt.json
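
The same copy can be done from Python, which makes it easier to keep the .pt and .pt.json names in sync. This is only a sketch: the helper name copy_config_next_to_model is hypothetical, and the idea that the config should sit at "<model path>.json" is taken from the file naming used above, not from the piper-sample-generator source.

```python
import shutil
from pathlib import Path


def copy_config_next_to_model(config_path, model_path):
    """Copy a training config.json to '<model_path>.json' and return the new path."""
    model_path = Path(model_path)
    # e.g. save.pt -> save.pt.json (append the suffix, do not replace '.pt')
    target = model_path.with_name(model_path.name + ".json")
    shutil.copyfile(config_path, target)
    return target


# Hypothetical usage with the placeholder paths from the commands above:
# copy_config_next_to_model("/path/to/training_dir/config.json", "/path/to/save.pt")
```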

I installed PyTorch 2.0.0 and its dependencies, piper-phonemize, and webrtcvad. This is my list of installed modules, obtained by running pip freeze. I then attempted to run Piper Sample Generator using this modified script:

import os
import sys

if "piper-sample-generator/" not in sys.path:
    sys.path.append("piper-sample-generator/")

from generate_samples import generate_samples

target_word = 'edward'

def text_to_speech(text):
    generate_samples(text = text, max_samples=1, length_scales=[1.1], noise_scales=[0.7], noise_scale_ws = [0.7], output_dir = './', batch_size=1, auto_reduce_batch_size=True, file_names=["test_generation.wav"], model='James5.pt')

text_to_speech(target_word)

This was the output:

A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.0.0 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.

If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.

Traceback (most recent call last):
  File "/home/james/new_open_wake_word_training/step-1.py", line 7, in <module>
    from generate_samples import generate_samples
  File "/home/james/new_open_wake_word_training/piper-sample-generator/generate_samples.py", line 14, in <module>
    import torchaudio
  File "/home/james/new_open_wake_word_training/venv/lib/python3.10/site-packages/torchaudio/__init__.py", line 1, in <module>
    from torchaudio import (  # noqa: F401
  File "/home/james/new_open_wake_word_training/venv/lib/python3.10/site-packages/torchaudio/compliance/__init__.py", line 1, in <module>
    from . import kaldi
  File "/home/james/new_open_wake_word_training/venv/lib/python3.10/site-packages/torchaudio/compliance/kaldi.py", line 22, in <module>
    EPSILON = torch.tensor(torch.finfo(torch.float).eps)
/home/james/new_open_wake_word_training/venv/lib/python3.10/site-packages/torchaudio/compliance/kaldi.py:22: UserWarning: Failed to initialize NumPy: _ARRAY_API not found (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:84.)
  EPSILON = torch.tensor(torch.finfo(torch.float).eps)
DEBUG:generate_samples:Loading James5.pt
INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp3mu_bb_l
INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp3mu_bb_l/_remote_module_non_scriptable.py
INFO:generate_samples:Successfully loaded the model
Traceback (most recent call last):
  File "/home/james/new_open_wake_word_training/step-1.py", line 14, in <module>
    text_to_speech(target_word)
  File "/home/james/new_open_wake_word_training/step-1.py", line 12, in text_to_speech
    generate_samples(text = text, max_samples=1, length_scales=[1.1], noise_scales=[0.7], noise_scale_ws = [0.7], output_dir = './', batch_size=1, auto_reduce_batch_size=True, file_names=["test_generation.wav"], model='James5.pt')
  File "/home/james/new_open_wake_word_training/piper-sample-generator/generate_samples.py", line 178, in generate_samples
    audio, phoneme_samples = generate_audio(
  File "/home/james/new_open_wake_word_training/piper-sample-generator/generate_samples.py", line 302, in generate_audio
    x, m_p_orig, logs_p_orig, x_mask = model.enc_p(x, x_lengths)
  File "/home/james/new_open_wake_word_training/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1614, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'VitsModel' object has no attribute 'enc_p'
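
The first part of this output is a NumPy warning and appears to be separate from the final AttributeError: the message says a module compiled against NumPy 1.x is being run under NumPy 2.0.0, and it suggests pinning 'numpy<2'. A minimal sketch of a pre-flight check using only the standard library follows; the helper names needs_numpy_downgrade and installed_numpy_version are hypothetical.

```python
from importlib import metadata


def needs_numpy_downgrade(version: str) -> bool:
    """Return True if the version string has a major version of 2 or later."""
    major = int(version.split(".")[0])
    return major >= 2


def installed_numpy_version():
    """Return the installed NumPy version string, or None if NumPy is absent."""
    try:
        return metadata.version("numpy")
    except metadata.PackageNotFoundError:
        return None


version = installed_numpy_version()
if version is None:
    print("NumPy is not installed in this environment")
elif needs_numpy_downgrade(version):
    # Mirrors the advice printed in the warning itself
    print(f"NumPy {version} found; consider: pip install 'numpy<2'")
else:
    print(f"NumPy {version} is a 1.x release")
```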

So I'm not really sure what to do from here. I don't really understand how AI in Python works, but I've gotten this far, and any help would be greatly appreciated.

Edit: If it's any help, here's what is output when I run print(model) at line 301 in generate_samples.py (right before it fails)

Second edit: I just realised that I'm likely getting this error because I'm using a single speaker model. Would this be likely @synesthesiam?
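
Another possible reading of the AttributeError, offered as an assumption and not a confirmed diagnosis: torch.save(model, ...) at that point in piper_train stores the whole training wrapper (VitsModel), while generate_samples.py calls model.enc_p directly, which suggests it expects the inner generator network. The sketch below shows a generic way to look for that attribute one level down. The helper names are hypothetical, the stand-in objects only mimic the shape of the problem, and the child name model_g is an assumption that should be checked against the print(model) output.

```python
from types import SimpleNamespace


def iter_children(model):
    """Yield (name, child) pairs for a torch module or a plain Python object."""
    if hasattr(model, "named_children"):
        # torch.nn.Module keeps submodules out of __dict__, so use its own iterator
        yield from model.named_children()
    else:
        yield from vars(model).items()


def find_generator(model, required_attr="enc_p"):
    """Return (name, obj) for the model itself or its first child with required_attr."""
    if hasattr(model, required_attr):
        return "", model
    for name, child in iter_children(model):
        if hasattr(child, required_attr):
            return name, child
    raise AttributeError(f"no direct child exposes {required_attr!r}")


# Stand-in objects for illustration only (not real Piper classes)
generator = SimpleNamespace(enc_p=object())
wrapper = SimpleNamespace(model_g=generator)

name, found = find_generator(wrapper)
print(f"attribute found on child {name!r}")
```

If this turns out to apply, saving the inner object instead of the wrapper would be the thing to try. Whether piper-sample-generator needs any further preparation of the generator is not covered by this sketch.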

tolnai commented 2 months ago

Following up on @alesms's comment above about the Colab notebook: I was also trying to use the same notebook, referencing the German model (that's included in the release) instead of the English one and using the config from this repo, but I can't get any intelligible speech out of the sample generator. It just sounds like random phonemes...
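
One thing that may be worth checking here, as a guess and not a confirmed cause: if the config used alongside the German model still describes another language, the text could be phonemized with the wrong voice or mapped through the wrong phoneme IDs, which might sound like random phonemes. The sketch below assumes the config follows a Piper-style layout with an espeak.voice entry and a phoneme_id_map; both key names are assumptions to verify against the actual file, and check_voice_config is a hypothetical helper.

```python
import json


def check_voice_config(config: dict, expected_voice: str) -> list:
    """Return a list of human-readable problems found in a Piper-style config."""
    problems = []
    voice = config.get("espeak", {}).get("voice")
    if voice is None:
        problems.append("config has no espeak.voice entry")
    elif not voice.startswith(expected_voice):
        problems.append(f"espeak voice is {voice!r}, expected {expected_voice!r}")
    if not config.get("phoneme_id_map"):
        problems.append("config has no phoneme_id_map")
    return problems


# Stand-in config for illustration; a real one would be loaded from the model's JSON file
example = json.loads('{"espeak": {"voice": "en-us"}, "phoneme_id_map": {"_": [0]}}')
for problem in check_voice_config(example, "de"):
    print(problem)
```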

rhasspy / piper-sample-generator

Use with different languages #4