rhasspy / piper

A fast, local neural text to speech system
https://rhasspy.github.io/piper-samples/
MIT License

Error: KeyError when preprocessing a dataset #59

Closed: pluja closed this issue 1 year ago

pluja commented 1 year ago

I am trying to preprocess a dataset with the following command:

piper/src/python on master [!] via 🐍 v3.10.6 (.venv) 
❯ python3 -m piper_train.preprocess \
        --language en \
        --input-dir /home/user/PROJECTS/VoiceDataset/audio_files/45d97e58-b709-4d3b-8dd5-55213d4401c2 \
        --output-dir /home/user/PROJECTS/VoiceDataset/training/ \
        --dataset-format mycroft \
        --sample-rate 22050

And I'm getting this error:

INFO:preprocess:Single speaker dataset
INFO:preprocess:Wrote dataset config
INFO:preprocess:Processing 344 utterance(s) with 16 worker(s)
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/user/PROJECTS/VoiceDatasetGenerator/piper/src/python/piper_train/preprocess.py", line 331, in <module>
    main()
  File "/home/user/PROJECTS/VoiceDatasetGenerator/piper/src/python/piper_train/preprocess.py", line 167, in main
    utt.speaker_id = speaker_ids[utt.speaker]
KeyError: '45d97e58-b709-4d3b-8dd5-55213d4401c2'

What is this error about? What am I doing wrong?
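
For readers unfamiliar with this kind of failure: the traceback shows a plain lookup, `speaker_ids[utt.speaker]`, and the missing key is the name of the input folder. The sketch below reproduces that situation in isolation. It assumes `speaker_ids` is an ordinary dict from speaker name to integer id; the `Utterance` class and the `assign_speaker_id` helper are hypothetical stand-ins written for illustration, not piper's actual code.

```python
# Illustrative only: names mirror the traceback, not piper's real source.
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Utterance:
    # Hypothetical stand-in for the object called `utt` in the traceback.
    speaker: Optional[str]
    speaker_id: Optional[int] = None


def assign_speaker_id(utt: Utterance, speaker_ids: Dict[str, int]) -> None:
    """Look up the id for utt.speaker, failing with a descriptive message."""
    if utt.speaker is None:
        return  # no speaker name, so there is nothing to look up
    if utt.speaker not in speaker_ids:
        known = sorted(speaker_ids) or "(none)"
        raise KeyError(
            f"speaker {utt.speaker!r} is not in the mapping; known speakers: {known}"
        )
    utt.speaker_id = speaker_ids[utt.speaker]


# Reproduce the failure: the utterance carries a speaker name (here, the
# folder name from the report), but the mapping has no entry for it.
speaker_ids: Dict[str, int] = {}
utt = Utterance(speaker="45d97e58-b709-4d3b-8dd5-55213d4401c2")
try:
    utt.speaker_id = speaker_ids[utt.speaker]
except KeyError as err:
    print(f"KeyError: {err}")
```

In other words, the error means an utterance was tagged with a speaker name that the name-to-id mapping does not contain. Why the two disagree in this run is not visible from the traceback alone.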

My dataset folder looks like this:

❯ ls -al /home/user/PROJECTS/VoiceDatasetGenerator/audio_files/45d97e58-b709-4d3b-8dd5-55213d4401c2 
.rw-rw-r-- 539k whoami  1 May 20:13 0a8c3628-8fd4-4122-8a51-cafc4c948d1f.wav
.rw-rw-r-- 776k whoami  1 May 20:13 0bf4b8c0-6eae-42d3-996a-2a6105290e51.wav
[350 files more]
.rw-rw-r--  42k whoami  1 May 20:30 45d97e58-b709-4d3b-8dd5-55213d4401c2-metadata.txt

pluja commented 1 year ago

Fixed: