jaywalnut310 / glow-tts

A Generative Flow for Text-to-Speech via Monotonic Alignment Search
MIT License
660 stars 151 forks

[train.py] Per-epoch progress calculation #4

Open Sejik opened 4 years ago

Sejik commented 4 years ago

Thank you for publishing such a good paper on fast TTS.

To start with the subject in the title: at commit 944525a, on line 127 of train.py, the logger.info call that computes training progress does not take the number of GPUs into account.

The relevant part: logger.info('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format( epoch, batch_idx * len(x), len(train_loader.dataset)

Separately, to train on 7 speakers I adjusted n_speakers and gin_channels and fed in the corresponding input, but training is not going well. (It fails to make progress because of persistent gradient overflow.) If you could offer any advice on training, I would appreciate it.

Thank you.

jaywalnut310 commented 4 years ago

Though my English is poor, I'll answer in English for other people.

Yes, line 127 of train.py doesn't consider the number of GPUs, which may give a misleading picture of training progress in multi-GPU training. Thanks for letting me know about the mistake! I'll fix the problem soon.
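For readers following along, here is a minimal sketch of one way the progress message could account for the number of GPUs. It is not the repository's fix. It assumes every process handles an equal shard of the dataset and the same per-GPU batch size; `format_progress`, `num_batches`, and `world_size` are hypothetical names introduced for illustration, and the percentage formula is an assumption because the quoted line is cut off before that argument.

```python
def format_progress(epoch, batch_idx, batch_size, dataset_size,
                    num_batches, loss, world_size=1):
    """Build a progress message that counts samples across all GPUs.

    Assumption: each of the `world_size` processes sees
    dataset_size / world_size samples per epoch and uses the same
    per-GPU `batch_size`.
    """
    # Samples processed so far by all processes together, capped at the
    # dataset size so that padding in the last batches cannot overshoot.
    seen = min(batch_idx * batch_size * world_size, dataset_size)
    # Fraction of this process's batches that are done (assumed formula).
    percent = 100.0 * batch_idx / num_batches
    return 'Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
        epoch, seen, dataset_size, percent, loss)


if __name__ == '__main__':
    # 6400 samples, 4 GPUs, 32 samples per GPU per step -> 50 steps per epoch.
    print(format_progress(1, 10, 32, 6400, 50, 0.5, world_size=4))
```

With `world_size=1` the sample count reduces to the single-GPU value `batch_idx * batch_size`, so the same call would work for both setups.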

As you may have noticed, I did not upload code for the multi-speaker setting, for several reasons. You have to use TextMelSpeakerLoader instead of TextMelLoader in data_utils.py, because TextMelLoader does not return speaker identities. Training without speaker identities would cause the unstable training process on your multi-speaker dataset.

To use TextMelSpeakerLoader, you have to make a '|'-separated filelist that includes speaker info. An example line from such a file: DUMMY/Jay_001.wav|4|Mrs. De Mohrenschildt thought that Oswald,
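The filelist format described above can be sketched as follows. This is only an illustration of the `path|speaker_id|text` layout, not the code of TextMelSpeakerLoader. `parse_filelist_line` and `min_n_speakers` are hypothetical helpers, and the code assumes speaker IDs are non-negative integers used as indices, so that n_speakers must exceed the largest ID.

```python
def parse_filelist_line(line, sep='|'):
    """Split one filelist line into (audio_path, speaker_id, text)."""
    # maxsplit=2 keeps any later separator characters inside the text field.
    path, speaker_id, text = line.rstrip('\n').split(sep, 2)
    return path, int(speaker_id), text


def min_n_speakers(entries):
    """Smallest n_speakers that covers every speaker ID seen.

    Assumption: IDs are zero-based indices into a speaker table.
    """
    return max(speaker_id for _, speaker_id, _ in entries) + 1


if __name__ == '__main__':
    lines = [
        'DUMMY/Jay_001.wav|4|Mrs. De Mohrenschildt thought that Oswald,',
    ]
    entries = [parse_filelist_line(line) for line in lines]
    print(entries[0])
    print(min_n_speakers(entries))
```

Checking the largest speaker ID against n_speakers before training is a cheap way to catch a config that is too small for the filelist.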

Adding n_speakers and gin_channels to the config is the right thing to do.

I'll let you know when I upload the code for the multi-speaker setting.

Sejik commented 4 years ago

Thank you for your kind and prompt reply.