[Open] ivostoykov opened this issue 4 months ago
This tripped me up for ages. Try including at least 32 WAV files. I'm not sure why, or where, this is specified as a requirement, but I came across it in a YouTube video that mentions this issue specifically.
If you want to test immediately, you can just duplicate your existing files until there are 32. Obviously it won't make for a great model, but it will confirm that you can proceed with training.
EDIT: https://www.youtube.com/watch?v=ofe6IPjL8zg is the video that mentions it. There are no visuals because the author is blind, according to their comments on the video.
Thanks, @krones9000, I'll take a look. I'm trying to create a Bulgarian voice, and I'm not sure whether cloning an EN voice is a good idea at all. I'd appreciate any thoughts you can share on this.
I'm afraid I'm still learning, so I can't offer much advice in that regard. However, I do think you need to make sure that the language specified in preprocessing matches the one you are fine-tuning from, so it may not allow you to fine-tune on an EN ckpt.
TRAINING.md gives this example, which specifies en-us:
python3 -m piper_train.preprocess \
  --language en-us \
  --input-dir /path/to/dataset_dir/ \
  --output-dir /path/to/training_dir/ \
  --dataset-format ljspeech \
  --single-speaker \
  --sample-rate 22050
I'm not sure what would happen if you specify en-us but then provide non-en-us inputs. I have a feeling it would still work, but whether it works well is another question.
No, it doesn't. I specified --language bg and managed to get it through once before this error popped up. The result wasn't good at all, but it was Bulgarian after all. I suppose that's because I used only 10 WAV files, which is definitely not enough, but I want to make it work first before spending hours making more WAV files.
If you want to test the pipeline, you can just copy your existing files to make duplicates until you have 32.
For example, using just the first 3:
wav/000.wav|male1|Одата Левски от цикъл Епопея на забравените на Иван Вазов.
wav/001.wav|male1|Този запис е направен за либриво.
wav/002.wav|male1|Всички записи за либривокс са обществено достояние.
wav/DUP2000.wav|male1|Одата Левски от цикъл Епопея на забравените на Иван Вазов.
wav/DUP2001.wav|male1|Този запис е направен за либриво.
wav/DUP2002.wav|male1|Всички записи за либривокс са обществено достояние.
wav/DUP2000.wav|male1|Одата Левски от цикъл Епопея на забравените на Иван Вазов.
wav/DUP2001.wav|male1|Този запис е направен за либриво.
wav/DUP2002.wav|male1|Всички записи за либривокс са обществено достояние.
etc.
At least this way you can verify that the next steps work, and then prepare better training files once you're sure it's possible to proceed.
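A minimal Python sketch of this duplication trick. It assumes the layout used in this thread (dataset_dir/metadata.csv with rows such as wav/000.wav|male1|text, and the audio under dataset_dir/wav/); the helper name pad_dataset, the DUP file-name prefix and the default target of 32 are illustrative choices, not something piper defines. Each copy gets its own file name so no two added rows point at the same file. This only pads a dataset for a pipeline smoke test, not for real training.

```python
import shutil
from pathlib import Path


def pad_dataset(dataset_dir: Path, target: int = 32, prefix: str = "DUP") -> int:
    """Copy WAV files and metadata rows until there are at least `target` rows.

    Hypothetical helper. ASSUMPTION: the first field of each row is a WAV path
    relative to dataset_dir, e.g. wav/000.wav. Returns the final row count.
    """
    meta_path = dataset_dir / "metadata.csv"
    lines = [ln for ln in meta_path.read_text(encoding="utf-8").splitlines() if ln.strip()]
    if not lines:
        raise ValueError("metadata.csv has no rows to duplicate")

    originals = list(lines)
    copy_index = 0
    while len(lines) < target:
        fields = originals[copy_index % len(originals)].split("|")
        src_rel = Path(fields[0])
        # Unique name for every copy, e.g. wav/000.wav -> wav/DUP0000.wav
        dst_rel = src_rel.with_name(f"{prefix}{copy_index:04d}{src_rel.suffix}")
        shutil.copyfile(dataset_dir / src_rel, dataset_dir / dst_rel)
        # Same speaker and text as the source row, new file path
        lines.append("|".join([dst_rel.as_posix(), *fields[1:]]))
        copy_index += 1

    meta_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)
```

For example, pad_dataset(Path("/Projects/TTS/test")) would rewrite that folder's metadata.csv in place, so keep a backup first.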
Thanks, I'll give it a try...
Same error... ;-( In case it matters, everything was built as per the documentation:
python --version
Python 3.10.12
and
ls /Projects/TTS/test/wav
10.wav 13.wav 16.wav 19.wav 21.wav 24.wav 27.wav 2.wav 32.wav 4.wav 7.wav t
11.wav 14.wav 17.wav 1.wav 22.wav 25.wav 28.wav 30.wav 33.wav 5.wav 8.wav
12.wav 15.wav 18.wav 20.wav 23.wav 26.wav 29.wav 31.wav 3.wav 6.wav 9.wav
and metadata.csv, in case there's an error I missed somehow:
wav/1.wav|male1|Одата Левски от цикъл Епопея на забравените на Иван Вазов.
wav/2.wav|male1|Този запис е направен за либриво.
wav/3.wav|male1|Всички записи за либривокс са обществено достояние.
wav/4.wav|male1|За повече информация или за участие като доброволец,
wav/5.wav|male1|моля посетете либривокс точка орг.
wav/6.wav|male1|ЛЕВСКИ
wav/7.wav|male1|Манастирът тесен за мойта душа е.
wav/8.wav|male1|Кога човек дойде тук да се покае, трябва да забрави греховния мир,
wav/9.wav|male1|да бяга съблазни и да търси мир.
wav/10.wav|male1|Мойта съвест инак днеска ми говори.
wav/11.wav|male1|Това расо черно, що нося отгоре, не ме помирява с тия небеса
Wav/12.wav|male1|Одата Левски от цикъл Епопея на забравените на Иван Вазов.
wav/13.wav|male1|Този запис е направен за либриво.
wav/14.wav|male1|Всички записи за либривокс са обществено достояние.
wav/15.wav|male1|За повече информация или за участие като доброволец,
wav/16.wav|male1|моля посетете либривокс точка орг.
wav/17.wav|male1|ЛЕВСКИ
wav/18.wav|male1|Манастирът тесен за мойта душа е.
wav/19.wav|male1|Кога човек дойде тук да се покае, трябва да забрави греховния мир,
wav/20.wav|male1|да бяга съблазни и да търси мир.
wav/21.wav|male1|Мойта съвест инак днеска ми говори.
wav/22.wav|male1|Това расо черно, що нося отгоре, не ме помирява с тия небеса
wav/23.wav|male1|Този запис е направен за либриво.
wav/24.wav|male1|Всички записи за либривокс са обществено достояние.
wav/25.wav|male1|За повече информация или за участие като доброволец,
wav/26.wav|male1|моля посетете либривокс точка орг.
wav/27.wav|male1|ЛЕВСКИ
wav/28.wav|male1|Манастирът тесен за мойта душа е.
wav/29.wav|male1|Кога човек дойде тук да се покае, трябва да забрави греховния мир,
wav/30.wav|male1|да бяга съблазни и да търси мир.
wav/31.wav|male1|Мойта съвест инак днеска ми говори.
wav/32.wav|male1|Това расо черно, що нося отгоре, не ме помирява с тия небеса
wav/33.wav|male1|Това расо черно, що нося отгоре, не ме помирява с тия небеса
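Before digging further, it may help to check the metadata mechanically. The sketch below is a hypothetical checker, not part of piper_train: it reports rows whose field count differs from the first row and rows whose file cannot be found. It assumes ids may be given relative to the dataset directory or to its wav/ subfolder, with or without the .wav extension. Names are compared case-sensitively even on case-insensitive filesystems, because a row like Wav/12.wav above does not match a wav/ folder on a case-sensitive filesystem such as a typical Linux one.

```python
from pathlib import Path


def exists_exact_case(base: Path, rel: str) -> bool:
    """True if `rel` exists under `base` with exactly this upper/lower case."""
    current = base
    for part in Path(rel).parts:
        if not current.is_dir():
            return False
        # Compare against real directory entries, not the OS's case folding
        if part not in {child.name for child in current.iterdir()}:
            return False
        current = current / part
    return True


def check_metadata(dataset_dir: Path) -> list[str]:
    """Return problems found in dataset_dir/metadata.csv (hypothetical helper)."""
    problems = []
    expected_fields = None
    meta = dataset_dir / "metadata.csv"
    for lineno, line in enumerate(meta.read_text(encoding="utf-8").splitlines(), start=1):
        if not line.strip():
            continue
        fields = line.split("|")
        if expected_fields is None:
            expected_fields = len(fields)  # first row sets the expected layout
        if len(fields) < 2:
            problems.append(f"line {lineno}: no '|' delimiter")
            continue
        if len(fields) != expected_fields:
            problems.append(f"line {lineno}: {len(fields)} fields, first row has {expected_fields}")
        # ASSUMPTION: ids are relative to dataset_dir or dataset_dir/wav,
        # with or without the .wav extension.
        candidates = [fields[0], fields[0] + ".wav"]
        bases = [dataset_dir, dataset_dir / "wav"]
        if not any(exists_exact_case(b, c) for b in bases for c in candidates):
            problems.append(f"line {lineno}: no file matching {fields[0]!r} (check path and case)")
    return problems
```

Usage: for problem in check_metadata(Path("/Projects/TTS/test")): print(problem)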
Ah, try removing the "wav/" prefix from the metadata. It isn't required, assuming you pass a dataset directory that contains both the CSV and the wav folder, and you pass --dataset-format ljspeech to the preprocess step.
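With many rows, the prefix can be removed with a short script. This is a sketch with hypothetical helper names; it matches the prefix case-insensitively (a deliberate choice, so a stray Wav/ is removed too) and leaves the rest of each row untouched. Keep a backup of metadata.csv before rewriting it in place.

```python
from pathlib import Path


def strip_wav_prefix(line: str, prefix: str = "wav/") -> str:
    """Remove a leading `wav/` (any case) from the first field of a row."""
    first, sep, rest = line.partition("|")
    if first.lower().startswith(prefix.lower()):
        first = first[len(prefix):]
    return first + sep + rest


def strip_prefix_in_file(meta_path: Path, prefix: str = "wav/") -> None:
    """Rewrite metadata.csv in place with the prefix removed from every row."""
    lines = meta_path.read_text(encoding="utf-8").splitlines()
    fixed = [strip_wav_prefix(line, prefix) for line in lines]
    meta_path.write_text("\n".join(fixed) + "\n", encoding="utf-8")
```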
No difference. A CSV row now looks like this:
6.wav|male1|ЛЕВСКИ
What I spotted is that a 22050 folder is created in the output dir, but it is empty. Perhaps my WAV files are not right, although I'm using 16-bit PCM, mono, 22050 Hz, which seems correct. I should probably dig deeper to find out what is wrong.
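One way to confirm that the WAV headers really are 16-bit PCM, mono, 22050 Hz is Python's standard wave module, which only opens uncompressed PCM WAV files and raises an error otherwise. The helper names below are hypothetical; the expected format is the one stated above.

```python
import wave
from pathlib import Path


def wav_format_problems(path: Path, expected_rate: int = 22050) -> list[str]:
    """List the ways a WAV file differs from 16-bit PCM, mono, expected_rate Hz."""
    try:
        with wave.open(str(path), "rb") as w:
            channels, width = w.getnchannels(), w.getsampwidth()
            rate, frames = w.getframerate(), w.getnframes()
    except (wave.Error, EOFError) as exc:
        # The stdlib wave module only reads uncompressed PCM WAV files
        return [f"not readable as PCM WAV: {exc}"]
    problems = []
    if channels != 1:
        problems.append(f"{channels} channels, expected mono")
    if width != 2:
        problems.append(f"{8 * width}-bit samples, expected 16-bit")
    if rate != expected_rate:
        problems.append(f"{rate} Hz, expected {expected_rate} Hz")
    if frames == 0:
        problems.append("no audio frames")
    return problems


def report(wav_dir: Path) -> None:
    """Print every WAV in wav_dir that does not match the expected format."""
    for path in sorted(wav_dir.glob("*.wav")):
        for problem in wav_format_problems(path):
            print(f"{path.name}: {problem}")
```

Usage: report(Path("/Projects/TTS/test/wav")) prints nothing if every file matches.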
Are you specifying --single-speaker?
python3 -m piper_train.preprocess --language en-us --input-dir /path/to/dataset_dir/ --output-dir /path/to/training_dir/ --dataset-format ljspeech --single-speaker --sample-rate 22050
If so, I don't think it expects a speaker column. Sorry, I'm just running through possible ideas now.
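If the speaker column does need to go, here is a tiny sketch (hypothetical helper name) that rewrites id|speaker|text rows as id|text. It assumes any row with exactly three fields has the speaker name in the middle, which holds for the metadata posted above but not for the id|text|text format suggested later in the thread.

```python
def drop_speaker_column(line: str) -> str:
    """Turn an `id|speaker|text` row into `id|text` for a --single-speaker run.

    Hypothetical helper. ASSUMPTION: exactly three fields means the middle one
    is the speaker name (e.g. male1). Other rows are returned unchanged.
    """
    fields = line.split("|")
    if len(fields) == 3:
        return f"{fields[0]}|{fields[2]}"
    return line
```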
Yes, here is the command:
python3 -m piper_train.preprocess --language bg --input-dir /Projects/TTS/test/ --output-dir /Projects/TTS/test/output/ --dataset-format ljspeech --single-speaker --sample-rate 22050
I removed the speaker column, but the error persists ;-(
I'm afraid I'm all out of ideas. It could be a mismatch between the model and the training data that we're not seeing. Can you share the Hugging Face link for the model you're trying to fine-tune?
Sorry for the late reply. I'm following the training documentation (https://github.com/rhasspy/piper/blob/master/TRAINING.md) on my local machine, not using Hugging Face. When I have some free time I'll dig deeper. Hopefully there will be some fresh ideas here by then.
Yeah, I tried to train on my own audio and had the same problem. Does anyone know how to fix it?
I fixed the problem in the CSV by adding one more field. Each row should contain two | delimiters (three fields), for example: 1|bin dokuz yüz otuz bir yılının ; altı aralık cuma sabahı|bin dokuz yüz otuz bir yılının ; altı aralık cuma sabahı
Why the same text twice? The first field, 1, points to the wavs/1.wav file; the second field is the text describing how the words are pronounced; the third is the exact text of the words.
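A sketch of converting two-field rows into this three-field layout. Following the comment above, it assumes the first field should be the bare id (1 for wavs/1.wav) and that, without a separate pronunciation, the text is simply repeated. The helper name is hypothetical, and this layout has not been checked against piper's own parser.

```python
def to_three_fields(line: str) -> str:
    """Rewrite `id|text` (or `id.wav|text`) as `id|text|text`.

    Hypothetical helper. ASSUMPTION: the first field is a bare id such as 1
    (for wavs/1.wav) and the text is repeated when there is no separate
    pronunciation. Rows that do not have exactly two fields are unchanged.
    """
    fields = line.split("|")
    if len(fields) != 2:
        return line
    file_id = fields[0].removesuffix(".wav")  # str.removesuffix needs Python 3.9+
    return f"{file_id}|{fields[1]}|{fields[1]}"
```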
Also, if you have a lot of CPU cores, you need to add the parameter --max-workers 8 so that batch_size won't be 0. There is a bug with batch_size.
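The batch_size remark can be checked with a little arithmetic. The sketch below assumes, without quoting the piper source, that the preprocess step splits the utterances into batches of num_utterances // (2 * max_workers), so a small dataset on a machine with many cores rounds down to a batch size of 0. If that assumption holds, 32 files on a 16-core machine is exactly where the batch size becomes 1, which would fit the "at least 32 files" advice at the top of the thread, and 10 files would need --max-workers 5 or fewer. The function names are hypothetical.

```python
import os


def assumed_batch_size(num_utterances: int, max_workers: int) -> int:
    # ASSUMPTION: batch size = utterances // (2 * workers); 0 is the failure case.
    return num_utterances // (2 * max_workers)


def max_safe_workers(num_utterances: int) -> int:
    # Largest worker count whose assumed batch size is still >= 1.
    # Returns 0 when even a single worker would get an empty batch.
    return num_utterances // 2


if __name__ == "__main__":
    cores = os.cpu_count() or 1
    for n in (10, 32, 33):
        print(f"{n} utterances: batch size {assumed_batch_size(n, cores)} with "
              f"{cores} workers; --max-workers <= {max_safe_workers(n)} keeps it >= 1")
```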
Thanks, @eix128. Could you explain this in more detail, please?
2nd parameter points to text how to pronounce word.
Which of these is the text, and which is the pronunciation?
bin dokuz yüz otuz bir yılının ; altı aralık cuma sabahı|bin dokuz yüz otuz bir yılının ; altı aralık cuma sabahı
And finally, doesn't the pronunciation come from the wav file?
No. For example, in old English some people say "yeah", "yep" or "yes", but in the LJSpeech format you can record that he says "yeah" while the word is "yes".
So: 1|yeah|yes
This is not so important for this project, but some projects require it. This project also uses the LJSpeech format.
Hello there,
Following the instructions in TRAINING.md, I ran these commands exactly as described in that file, from my local clone of piper:
I got the following error when executing the preprocess command:
The metadata.csv is (the text is Cyrillic, in Bulgarian):
metadata.csv is in /Projects/TTS/test/:
I'd appreciate any comments or help.
Thank you, Ivo