myshell-ai / MeloTTS

High-quality multi-lingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese and Korean.
MIT License
4.81k stars 626 forks

report "\ufeff..." errors when running train.sh #148

Open joaoleino opened 5 months ago

joaoleino commented 5 months ago

Hello,

I saved all the files (config.json and metadata.list) as UTF-8 without BOM, but when I run the training script
bash train.sh ./data/example/config.json 1

it always reports this error:

rank0: FileNotFoundError: [Errno 2] No such file or directory: '\ufeffdata/example/audio_for_training/aud2_2_0.wav'

I followed the metadata.list format and created my own metadata.list as below:
data/example/audio_for_training/aud2_2_0.wav
data/example/audio_for_training/aud2_3_0.wav
data/example/audio_for_training/aud2_4_0.wav
....

Details:

3%|██▌ | 1/29 00:00<00:00, 430.67it/s
Traceback (most recent call last):
rank0: File "/home/tom/melotts/MeloTTS/melo/train.py", line 636, in
rank0: File "/home/tom/melotts/MeloTTS/melo/train.py", line 69, in run
rank0: train_dataset = TextAudioSpeakerLoader(hps.data.training_files, hps.data)
rank0: File "/home/tom/melotts/MeloTTS/melo/data_utils.py", line 50, in __init__
rank0: File "/home/tom/melotts/MeloTTS/melo/data_utils.py", line 81, in _filter
rank0: lengths.append(os.path.getsize(audiopath) // (2 * self.hop_length))
rank0: File "/usr/lib/python3.10/genericpath.py", line 50, in getsize
rank0: return os.stat(filename).st_size
rank0: FileNotFoundError: [Errno 2] No such file or directory: '\ufeffdata/example/audio_for_training/aud2_2_0.wav'
^CW0612 07:21:52.457000 140061947457536 torch/distributed/elastic/agent/server/api.py:741] Received Signals.SIGINT death signal, shutting down workers
W0612 07:21:52.458000 140061947457536 torch/distributed/elastic/multiprocessing/api.py:851] Sending process 803 closing signal SIGINT
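The '\ufeff' at the start of the path in the error suggests that the file being read still begins with a UTF-8 byte order mark, whatever the editor reports. A minimal sketch for checking this, using only the Python standard library: the helper name starts_with_utf8_bom and the throwaway demo file are assumptions for illustration, not part of MeloTTS, and the demo line contains only a path, as in the post above.

```python
import codecs
import os
import tempfile


def starts_with_utf8_bom(path):
    """Return True if the file begins with the UTF-8 BOM bytes (EF BB BF)."""
    with open(path, "rb") as f:
        return f.read(3) == codecs.BOM_UTF8


# Demonstration on a throwaway file, not the real metadata.list.
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "metadata.list")
    with open(demo, "wb") as f:
        f.write(codecs.BOM_UTF8 + b"data/example/audio_for_training/aud2_2_0.wav\n")

    has_bom = starts_with_utf8_bom(demo)

    # Decoding as plain "utf-8" keeps the BOM as U+FEFF on the first line.
    with open(demo, encoding="utf-8") as f:
        first_plain = f.readline().rstrip("\n")

    # Decoding as "utf-8-sig" drops a leading BOM if one is present.
    with open(demo, encoding="utf-8-sig") as f:
        first_sig = f.readline().rstrip("\n")

print(has_bom)
print(repr(first_plain))
print(repr(first_sig))
```

Running starts_with_utf8_bom on the real metadata.list (and on config.json) shows whether the saved file actually contains the BOM.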

joaoleino commented 5 months ago

My environment is Ubuntu 22.04 with Python 3.10.

joaoleino commented 5 months ago

Can anyone help me? Thanks!

RedBluePrinter commented 5 months ago

According to compart.com/en/unicode/U+FEFF, U+FEFF is an invisible Unicode character (the byte order mark). It could be that your metadata.list still starts with one, so it is not saved as plain UTF-8. Try re-encoding the file: convert it to UTF-16 and then back to UTF-8 in your IDE or code editor!

RedBluePrinter commented 5 months ago

[image]

RedBluePrinter commented 5 months ago

That is, convert to UTF-16 and then back to UTF-8!
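If re-encoding in an editor is inconvenient, the leading BOM can also be removed with a short script. This is only a sketch under assumptions: strip_utf8_bom is a hypothetical helper, not a MeloTTS function, and the demo runs on a temporary file rather than the real metadata.list.

```python
import codecs
import os
import tempfile


def strip_utf8_bom(path):
    """Rewrite the file without a leading UTF-8 BOM.

    Returns True if a BOM was found and removed, False if the file
    was left unchanged.
    """
    with open(path, "rb") as f:
        raw = f.read()
    if not raw.startswith(codecs.BOM_UTF8):
        return False
    with open(path, "wb") as f:
        f.write(raw[len(codecs.BOM_UTF8):])
    return True


# Demonstration on a throwaway file, not the real metadata.list.
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "metadata.list")
    with open(demo, "wb") as f:
        f.write(codecs.BOM_UTF8 + b"data/example/audio_for_training/aud2_2_0.wav\n")

    removed_first = strip_utf8_bom(demo)   # BOM present on the first call
    removed_second = strip_utf8_bom(demo)  # nothing left to remove

    with open(demo, "rb") as f:
        cleaned = f.read()

print(removed_first, removed_second)
```

On the real files, the call would be strip_utf8_bom("data/example/metadata.list"); keeping a backup copy first is a sensible precaution, since the file is rewritten in place.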

RedBluePrinter commented 5 months ago

Did you complete the preprocessing step, "python preprocess_text.py --metadata data/example/metadata.list"?