rhasspy / piper

A fast, local neural text to speech system
https://rhasspy.github.io/piper-samples/
MIT License
4.57k stars, 315 forks

ValueError: n must be at least one #368

Open ivostoykov opened 4 months ago

ivostoykov commented 4 months ago

Hello there,

Following the instructions in TRAINING.md, I ran these commands in my local clone of piper, exactly as described in the file:

cd piper/src/python
python3 -m venv .venv
source .venv/bin/activate
pip3 install --upgrade pip
pip3 install --upgrade wheel setuptools
pip3 install -e .

I got the following error executing the preprocess command:

python3 -m piper_train.preprocess --language bg --input-dir /Projects/TTS/test/ --output-dir /Projects/TTS/test/output/ --dataset-format ljspeech --single-speaker --sample-rate 22050
INFO:preprocess:Single speaker dataset
INFO:preprocess:Wrote dataset config
INFO:preprocess:Processing 11 utterance(s) with 20 worker(s)
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/Projects/Python/piper/src/python/piper_train/preprocess.py", line 502, in <module>
    main()
  File "/Projects/Python/piper/src/python/piper_train/preprocess.py", line 225, in main
    for utt_batch in batched(
  File "/Projects/Python/piper/src/python/piper_train/preprocess.py", line 491, in batched
    raise ValueError("n must be at least one")

The metadata.csv is as follows (the text is Bulgarian, written in Cyrillic):

wav/000.wav|male1|Одата Левски от цикъл Епопея на забравените на Иван Вазов.
wav/001.wav|male1|Този запис е направен за либриво.
wav/002.wav|male1|Всички записи за либривокс са обществено достояние.
wav/003.wav|male1|За повече информация или за участие като доброволец,
wav/004.wav|male1|моля посетете либривокс точка орг.
wav/005.wav|male1|ЛЕВСКИ
wav/006.wav|male1|Манастирът тесен за мойта душа е.
wav/007.wav|male1|Кога човек дойде тук да се покае, трябва да забрави греховния мир,
wav/008.wav|male1|да бяга съблазни и да търси мир.
wav/009.wav|male1|Мойта съвест инак днеска ми говори.
wav/010.wav|male1|Това расо черно, що нося отгоре, не ме помирява с тия небеса

metadata.csv is in /Projects/TTS/test/:

ls /Projects/TTS/test/
metadata.csv  output  wav

I'll appreciate any comments/help.

Thank you Ivo

krones9000 commented 4 months ago

This caught me out for ages. Try including at least 32 wav files. I'm not sure why or where this is specified as a requirement, but I came across it in a YouTube video that mentions this issue specifically.

If you want to test immediately, you can just duplicate the existing files you have till there are 32. Obviously it won't make for a great model. But you'll confirm that you can proceed with training.

EDIT: https://www.youtube.com/watch?v=ofe6IPjL8zg is the video that mentions it. There are no visuals because the author is blind, as per their comments on the video.

ivostoykov commented 4 months ago

Thanks, @krones9000 I'll take a look. I'm trying to generate a Bulgarian voice and I'm not sure if cloning an EN voice is a good idea at all. I'd appreciate it if you have any thoughts to share about this.

krones9000 commented 4 months ago

I'm afraid I'm still learning, so I can't offer much advice in that regard. However, I do think you need to make sure that the language specified in the preprocessing matches the one you are trying to finetune from. So it may not allow you to finetune on an EN ckpt.

TRAINING.md gives the example of specifying en-us:

python3 -m piper_train.preprocess \
  --language en-us \
  --input-dir /path/to/dataset_dir/ \
  --output-dir /path/to/training_dir/ \
  --dataset-format ljspeech \
  --single-speaker \
  --sample-rate 22050

I'm not sure what would happen if you specify en-us but then provide non en-us inputs. I have a feeling it will still work, but whether it works well is another question.

ivostoykov commented 4 months ago

No, it doesn't. I specified --language bg and I managed to get it through once before this error popped up. The result wasn't good at all, but it was Bulgarian after all. I suppose this is because I used only 10 wav files, which is definitely not enough, but I want to make it work first, before spending hours making more wav files.

krones9000 commented 4 months ago

If you want to test the pipeline you can just copy paste your existing files to make duplicates till you have 32.

Example, just using the first 3:

wav/000.wav|male1|Одата Левски от цикъл Епопея на забравените на Иван Вазов.
wav/001.wav|male1|Този запис е направен за либриво.
wav/002.wav|male1|Всички записи за либривокс са обществено достояние.
wav/DUP2000.wav|male1|Одата Левски от цикъл Епопея на забравените на Иван Вазов.
wav/DUP2001.wav|male1|Този запис е направен за либриво.
wav/DUP2002.wav|male1|Всички записи за либривокс са обществено достояние.
wav/DUP2000.wav|male1|Одата Левски от цикъл Епопея на забравените на Иван Вазов.
wav/DUP2001.wav|male1|Този запис е направен за либриво.
wav/DUP2002.wav|male1|Всички записи за либривокс са обществено достояние.

etc.

At least with this you can verify the next steps work and then get better training files once you're sure it's possible to proceed.
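
The duplication trick described above can be scripted instead of done by hand. What follows is only a minimal sketch: `pad_dataset` is a hypothetical helper (not part of piper), and it assumes that the first field of each metadata.csv row is a path relative to the dataset folder, including the .wav extension, as in the first metadata.csv posted in this thread.

```python
import shutil
from pathlib import Path


def pad_dataset(dataset_dir: Path, target: int = 32) -> int:
    """Duplicate existing clips until metadata.csv lists at least `target` rows.

    Returns the number of rows in metadata.csv afterwards.
    """
    metadata_path = dataset_dir / "metadata.csv"
    rows = [
        line
        for line in metadata_path.read_text(encoding="utf-8").splitlines()
        if line.strip()
    ]
    if not rows:
        raise ValueError("metadata.csv has no rows to duplicate")

    originals = list(rows)
    i = 0
    while len(rows) < target:
        # Cycle through the original rows; the first field is the wav path.
        filename, _, rest = originals[i % len(originals)].partition("|")
        src = dataset_dir / filename
        dup_name = f"{src.stem}_dup{i}{src.suffix}"
        shutil.copyfile(src, src.with_name(dup_name))
        # Keep the same relative folder (e.g. "wav/") in the new row.
        new_filename = Path(filename).with_name(dup_name).as_posix()
        rows.append(f"{new_filename}|{rest}")
        i += 1

    metadata_path.write_text("\n".join(rows) + "\n", encoding="utf-8")
    return len(rows)
```

Calling `pad_dataset(Path("/Projects/TTS/test"), target=32)` would copy clips and append matching rows until there are 32. Each copy gets a unique name, so no two rows point at the same file.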

ivostoykov commented 4 months ago

Thanks, I'll give it a try...

ivostoykov commented 4 months ago

Same error... ;-( In case it matters, everything was built as per the documentation.

python --version
Python 3.10.12

and

ls /Projects/TTS/test/wav
10.wav  13.wav  16.wav  19.wav  21.wav  24.wav  27.wav  2.wav   32.wav  4.wav  7.wav  t
11.wav  14.wav  17.wav  1.wav   22.wav  25.wav  28.wav  30.wav  33.wav  5.wav  8.wav
12.wav  15.wav  18.wav  20.wav  23.wav  26.wav  29.wav  31.wav  3.wav   6.wav  9.wav

and here is metadata.csv, in case it contains an error that I missed somehow:

wav/1.wav|male1|Одата Левски от цикъл Епопея на забравените на Иван Вазов.
wav/2.wav|male1|Този запис е направен за либриво.
wav/3.wav|male1|Всички записи за либривокс са обществено достояние.
wav/4.wav|male1|За повече информация или за участие като доброволец,
wav/5.wav|male1|моля посетете либривокс точка орг.
wav/6.wav|male1|ЛЕВСКИ
wav/7.wav|male1|Манастирът тесен за мойта душа е.
wav/8.wav|male1|Кога човек дойде тук да се покае, трябва да забрави греховния мир,
wav/9.wav|male1|да бяга съблазни и да търси мир.
wav/10.wav|male1|Мойта съвест инак днеска ми говори.
wav/11.wav|male1|Това расо черно, що нося отгоре, не ме помирява с тия небеса
Wav/12.wav|male1|Одата Левски от цикъл Епопея на забравените на Иван Вазов.
wav/13.wav|male1|Този запис е направен за либриво.
wav/14.wav|male1|Всички записи за либривокс са обществено достояние.
wav/15.wav|male1|За повече информация или за участие като доброволец,
wav/16.wav|male1|моля посетете либривокс точка орг.
wav/17.wav|male1|ЛЕВСКИ
wav/18.wav|male1|Манастирът тесен за мойта душа е.
wav/19.wav|male1|Кога човек дойде тук да се покае, трябва да забрави греховния мир,
wav/20.wav|male1|да бяга съблазни и да търси мир.
wav/21.wav|male1|Мойта съвест инак днеска ми говори.
wav/22.wav|male1|Това расо черно, що нося отгоре, не ме помирява с тия небеса
wav/23.wav|male1|Този запис е направен за либриво.
wav/24.wav|male1|Всички записи за либривокс са обществено достояние.
wav/25.wav|male1|За повече информация или за участие като доброволец,
wav/26.wav|male1|моля посетете либривокс точка орг.
wav/27.wav|male1|ЛЕВСКИ
wav/28.wav|male1|Манастирът тесен за мойта душа е.
wav/29.wav|male1|Кога човек дойде тук да се покае, трябва да забрави греховния мир,
wav/30.wav|male1|да бяга съблазни и да търси мир.
wav/31.wav|male1|Мойта съвест инак днеска ми говори.
wav/32.wav|male1|Това расо черно, що нося отгоре, не ме помирява с тия небеса
wav/33.wav|male1|Това расо черно, що нося отгоре, не ме помирява с тия небеса

krones9000 commented 4 months ago

Ah, try removing the "wav/" prefix in the metadata. It is not required, assuming the data directory you pass contains both the csv and the wav folder and you pass --dataset-format ljspeech to preprocess.
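
One way to rule out path problems in the metadata is to check that every row names a file that actually exists. The sketch below is a hypothetical helper, not piper code. Which locations the preprocessor really searches is an assumption here: it tries the name relative to the dataset folder and to its wav/ subfolder, with and without the .wav extension. Note that the row starting with `Wav/12.wav` in the metadata above differs in case from the `wav` folder, which matters on a case-sensitive filesystem.

```python
from pathlib import Path


def missing_clips(dataset_dir: Path) -> list:
    """List first-field entries of metadata.csv that match no file on disk."""
    missing = []
    text = (dataset_dir / "metadata.csv").read_text(encoding="utf-8")
    for line in text.splitlines():
        if not line.strip():
            continue
        name = line.split("|", 1)[0]
        # ASSUMPTION: a clip may be given relative to the dataset folder or to
        # its wav/ subfolder, with or without the .wav extension.
        candidates = [
            dataset_dir / name,
            dataset_dir / f"{name}.wav",
            dataset_dir / "wav" / name,
            dataset_dir / "wav" / f"{name}.wav",
        ]
        if not any(c.is_file() for c in candidates):
            missing.append(name)
    return missing
```

An empty list from `missing_clips(Path("/Projects/TTS/test"))` would suggest the file names are not the cause of the error.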

ivostoykov commented 4 months ago

No difference. A csv row now looks like 6.wav|male1|ЛЕВСКИ

What I spotted is that the output dir has a 22050 folder created but it is empty. Perhaps my wav files are not correct. I'm using 16-bit PCM Mono 22050Hz which seems correct. Probably I should dive deeper into this to find out what is wrong.

[Screenshot from 2024-02-01 11-53-07]
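
The format of the wav files (16-bit PCM, mono, 22050 Hz) can be verified with Python's standard-library `wave` module rather than by eye. This is a minimal sketch with hypothetical helper names. It only inspects the file headers and says nothing about whether piper accepts the audio.

```python
import wave
from pathlib import Path


def describe_wav(path: Path) -> dict:
    """Read the header of a PCM wav file with the standard-library wave module."""
    with wave.open(str(path), "rb") as wav_file:
        return {
            "channels": wav_file.getnchannels(),
            "bits": wav_file.getsampwidth() * 8,
            "sample_rate": wav_file.getframerate(),
        }


def find_mismatches(wav_dir: Path, sample_rate: int = 22050) -> list:
    """Return (file name, reason) pairs for files that are unreadable
    or are not 16-bit mono at the expected sample rate."""
    problems = []
    for path in sorted(wav_dir.glob("*.wav")):
        try:
            info = describe_wav(path)
        except (wave.Error, EOFError) as err:
            problems.append((path.name, f"unreadable: {err}"))
            continue
        if (info["channels"], info["bits"], info["sample_rate"]) != (1, 16, sample_rate):
            problems.append((path.name, str(info)))
    return problems
```

Running `find_mismatches(Path("/Projects/TTS/test/wav"))` and getting an empty list back would confirm that the headers match what was passed as --sample-rate.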

krones9000 commented 4 months ago

Are you specifying single speaker?:

python3 -m piper_train.preprocess --language en-us --input-dir /path/to/dataset_dir/ --output-dir /path/to/training_dir/ --dataset-format ljspeech --single-speaker --sample-rate 22050

If so, I don't think it expects a speaker column. Sorry, just running through possible ideas now.

ivostoykov commented 4 months ago

yes, here is the command:

python3 -m piper_train.preprocess --language bg --input-dir /Projects/TTS/test/ --output-dir /Projects/TTS/test/output/ --dataset-format ljspeech --single-speaker --sample-rate 22050

Removed speaker column, but the error persists ;-(

krones9000 commented 4 months ago

I'm afraid I'm all out of ideas. It could be a mismatch between the model and the training data that we're not seeing. Can you share the Hugging Face link for the model you're trying to finetune on?

ivostoykov commented 4 months ago

Sorry for the late reply. I'm following the training documentation (https://github.com/rhasspy/piper/blob/master/TRAINING.md) on my local machine, not using Hugging Face. At some point, when I have some free time, I'll dig deeper. Hopefully, there will be some fresh ideas here by then.

eix128 commented 3 months ago

Yeah, I tried to train on my own audio and had the same problem. Does anyone know how to fix the issue?

eix128 commented 3 months ago

I fixed the problem by changing the csv and adding one parameter. Each CSV row should have three fields separated by two | delimiters:

1|bin dokuz yüz otuz bir yılının ; altı aralık cuma sabahı|bin dokuz yüz otuz bir yılının ; altı aralık cuma sabahı

Why the same text twice in one row? The first field, 1, points to the file wavs/1.wav. The second field is the text as it is pronounced. The third is the exact written text.

Also, if you have many CPU cores, you need to add the parameter --max-workers 8 so that batch_size won't be 0. There is a bug in batch_size.
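
The `--max-workers` remark can be illustrated with a little arithmetic. The `batched` helper below mirrors the check visible in the traceback at the top of this issue. The `batch_size` formula is an assumption made for illustration (roughly two batches per worker) and is not quoted anywhere in this thread. If it holds, a small dataset combined with many workers rounds down to a batch size of 0, which is exactly what `batched` rejects.

```python
import itertools


def batched(iterable, n):
    """Yield lists of up to n items; rejects n < 1 like the helper in the traceback."""
    if n < 1:
        raise ValueError("n must be at least one")
    it = iter(iterable)
    batch = list(itertools.islice(it, n))
    while batch:
        yield batch
        batch = list(itertools.islice(it, n))


def batch_size(num_utterances: int, max_workers: int) -> int:
    # ASSUMPTION: utterances are split into roughly two batches per worker.
    # This formula is a guess for illustration, not quoted from the thread.
    return num_utterances // (max_workers * 2)


print(batch_size(11, 20))  # 0: 11 utterances with 20 workers, as in the first log
print(batch_size(33, 20))  # 0: 33 utterances are still too few for 20 workers
print(batch_size(33, 8))   # 2: fewer workers gives a usable batch size
```

Under that assumption, either lowering `--max-workers` or supplying at least twice as many utterances as workers would avoid the error.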

ivostoykov commented 3 months ago

thanks @eix128. Could you explain this in more detail, please:

2nd parameter points to text how to pronounce word.

which of these is text and which one pronunciation:

bin dokuz yüz otuz bir yılının ; altı aralık cuma sabahı|bin dokuz yüz otuz bir yılının ; altı aralık cuma sabahı

And finally, doesn't the pronunciation come from the wav file?

eix128 commented 3 months ago

No. For example, in English some people say yeah, yep, or yes, and in the ljspeech format you can state that the speaker says "yeah" but the word is "yes".

so: 1|yeah|yes

This is not so important for this project, but some projects require it. This project also uses the ljspeech format.
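
Rows in this |-delimited layout can be read with Python's `csv` module. This is a minimal sketch, assuming the first field identifies the clip and the last field is the text to use. `read_rows` is a hypothetical helper, and how piper itself interprets a middle field is not established in this thread.

```python
import csv
import io


def read_rows(metadata_text: str):
    """Split "|"-delimited metadata into (first field, last field) pairs."""
    reader = csv.reader(
        io.StringIO(metadata_text),
        delimiter="|",
        quoting=csv.QUOTE_NONE,  # treat quote characters in the text literally
    )
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        # ASSUMPTION: first field names the audio clip, last field is the text.
        yield row[0], row[-1]
```

With this reading, `1|yeah|yes` yields the pair `("1", "yes")`, and a two-field row such as `6.wav|ЛЕВСКИ` yields `("6.wav", "ЛЕВСКИ")`.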