Open Tanmaypatil123 opened 6 months ago
I just tried it, and it results in the same issue.
2024-05-12 21:24:11.820 | INFO | data_utils:_filter:64 - Init dataset...
0it [00:00, ?it/s]
[rank0]: Traceback (most recent call last):
[rank0]: File "/Workspace/leon/tts/melotts/MeloTTS/melo/train.py", line 635, in <module>
[rank0]: run()
[rank0]: File "/Workspace/leon/tts/melotts/MeloTTS/melo/train.py", line 69, in run
[rank0]: train_dataset = TextAudioSpeakerLoader(hps.data.training_files, hps.data)
[rank0]: File "/Workspace/leon/tts/melotts/MeloTTS/melo/data_utils.py", line 50, in __init__
[rank0]: self._filter()
[rank0]: File "/Workspace/leon/tts/melotts/MeloTTS/melo/data_utils.py", line 84, in _filter
[rank0]: logger.info(f'min: {min(lengths)}; max: {max(lengths)}' )
[rank0]: ValueError: min() arg is an empty sequence
E0512 21:24:14.982235 139701776379072 torch/distributed/elastic/multiprocessing/api.py:826] failed (exitcode: 1) local_rank: 0 (pid: 2520105) of binary: /.pyenv/versions/3.9.10/bin/python
Traceback (most recent call last):
File "/.pyenv/versions/3.9.10/bin/torchrun", line 8, in <module>
sys.exit(main())
File "/.pyenv/versions/3.9.10/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 347, in wrapper
return f(*args, **kwargs)
File "/.pyenv/versions/3.9.10/lib/python3.9/site-packages/torch/distributed/run.py", line 879, in main
run(args)
File "/.pyenv/versions/3.9.10/lib/python3.9/site-packages/torch/distributed/run.py", line 870, in run
elastic_launch(
File "/.pyenv/versions/3.9.10/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/.pyenv/versions/3.9.10/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 263, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
train.py FAILED
------------------------------------------------------------
Failures:
<NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2024-05-12_21:24:14
host : ...
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 2520105)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
I checked the code and found the issue.
It happens because the train.list file is not well formatted; it is probably empty. In fact, that was my case. This file is created during the pre-processing step via the preprocess_text.py script.
We need to make sure that there is no error during the pre-processing step. My preprocessing actually failed, but I did not see it in the output. It failed because I had put EN-BR instead of EN in the language column. Now the preprocessing works and the *.list files aren't empty, so that issue disappeared.
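A quick sanity check on the generated list file before launching training could look like the sketch below. It is only an illustration: the helper name summarize_list_file is hypothetical, and the pipe separator and the position of the language column (third field) are assumptions about the file layout that should be checked against your own data.

```python
from collections import Counter
from pathlib import Path


def summarize_list_file(path, language_column=2, separator="|"):
    """Count non-empty lines and tally the values found in the language column.

    The separator and column index are assumptions about the list format,
    not something confirmed by the MeloTTS code.
    """
    list_path = Path(path)
    if not list_path.is_file():
        raise FileNotFoundError(f"{list_path} does not exist; did pre-processing run?")

    languages = Counter()
    n_lines = 0
    with list_path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # blank lines do not count as samples
            n_lines += 1
            fields = line.split(separator)
            # Lines with too few fields are tallied under a placeholder key.
            if len(fields) > language_column:
                languages[fields[language_column]] += 1
            else:
                languages["<malformed>"] += 1

    if n_lines == 0:
        # An empty list is what led to "min() arg is an empty sequence" above.
        raise ValueError(f"{list_path} is empty; check the pre-processing output")
    return n_lines, languages
```

Calling summarize_list_file("train.list") (path is a placeholder) and printing the returned counter makes an unexpected language code such as EN-BR easy to spot.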
After that, I hit a second issue: training looped with the message No module named 'matplotlib'. To fix it I just ran:
pip install matplotlib
I wonder if this dependency should be added as a direct dependency though.
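To catch a missing package like this before starting a long run, a small pre-flight check is one option. This is a minimal sketch using only the standard library; the helper name missing_modules is hypothetical, and the list of modules to check is up to you (matplotlib is the only one named in this thread).

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(["matplotlib"])
    if missing:
        print("Missing modules:", ", ".join(missing))
```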
All in all, now the training is running :smiley:
@Tanmaypatil123 hope it helps.
Thanks @louistiti I will try this ...
Hey @Tanmaypatil123 Did it work?
Yes, I was creating the .list file with my own custom script, which was wrong. Try using preprocess_text.py instead.
Hi @Tanmaypatil123, how did your model perform? Do you mind sharing your checkpoint for testing?
I was trying to train MeloTTS on Hindi, but I am facing the following issue: Hindi is a national language in India. I am facing this issue after running .
Can anyone help me understand what I am doing wrong? I also have audio samples longer than 10 seconds.
@Zengyi-Qin