ming024 / FastSpeech2

An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"
MIT License

FastSpeechs training error #13

Open mataym opened 4 years ago

mataym commented 4 years ago

After generating the TextGrid files with MFA and placing them in the preprocessed folder, I ran the scripts prepare_align.py and preprocess.py separately and no error occurred. They created these files: alignment, energy, f0, mel, stat.txt, train.txt, val.txt. Then I ran python train.py to train the model, but got the following error:

5 15] (50,) 00001943
[] (0,) [26 15 12 6 4 4 10 11 5 3 2 3 2 5 7 7 8 8 3 3 7 9 5 4 5 8 15 6 7 5 8 9 3 4 5 10 12 4 6 4 6 9 4 9 5 12] (46,) 00000539
Traceback (most recent call last):
  File "train.py", line 238, in <module>
    main(args)
  File "train.py", line 108, in main
    text, src_len, mel_len, D, f0, energy, max_src_len, max_mel_len)
  File "/home/speechlab/anaconda3/envs/fs2p/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/speechlab/anaconda3/envs/fs2p/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 153, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/speechlab/anaconda3/envs/fs2p/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/speechlab/temp/fs2p/fastspeech2.py", line 36, in forward
    encoder_output, src_mask, mel_mask, d_target, p_target, e_target, max_mel_len)
  File "/home/speechlab/anaconda3/envs/fs2p/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/speechlab/temp/fs2p/modules.py", line 46, in forward
    pitch_embedding = self.pitch_embedding(torch.bucketize(pitch_target, self.pitch_bins))
RuntimeError: isDifferentiableType(variable.scalar_type()) INTERNAL ASSERT FAILED at "/opt/conda/conda-bld/pytorch_1595629403081/work/torch/csrc/autograd/functions/utils.h":59, please report a bug to PyTorch.

In this case I installed the PyTorch 1.6 stable release as described on the official PyTorch site, not the nightly build from:

pip3 install --pre torch==1.6.0.dev20200428 -f https://download.pytorch.org/whl/nightly/cu101/torch_nightly.html

How do I fix this problem? Thanks in advance!

mataym commented 4 years ago

I solved the error above by changing the PyTorch version, but now training fails with another error:

3 5 5 3 6 3 6 3 10 4 8 3 6 4 4 5 4 15] (66,) 00000658
[] (0,) [16 19 9 5 5 11 5 4 16 10 8 2 6 5 7 7 7 4 4 6 9 5 9 5 6 4 6 4 5 5 5 4 6 16 9 5 5 6 7 10 4 4 3 5 6 3 3 4 4 14 8 7 5 3 5 11 5 7 2 10 11 8 8 13 9] (65,) 00000749
Traceback (most recent call last):
  File "train.py", line 238, in <module>
    main(args)
  File "train.py", line 112, in main
    log_duration_output, log_D, f0_output, f0, energy_output, energy, mel_output, mel_postnet_output, mel_target, ~src_mask, ~mel_mask)
  File "/home/speechlab/anaconda3/envs/fs2p/lib/python3.7/site-packages/torch/nn/modules/module.py", line 562, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/speechlab/temp/fs2p/loss.py", line 20, in forward
    log_d_target = log_d_target.masked_select(src_mask)
RuntimeError: The size of tensor a (8) must match the size of tensor b (131) at non-singleton dimension 1

Can anyone help me solve this problem?

Curry-AI commented 4 years ago

I have the same problem.

ming024 commented 4 years ago

@mataym @Mao-JianGuo I am sorry that I am now busy with another project. I will check this problem several days later.

Curry-AI commented 4 years ago

I found a solution. In modules.py, change

        self.pitch_bins = nn.Parameter(torch.exp(torch.linspace(np.log(hp.f0_min), np.log(hp.f0_max), hp.n_bins-1)))
        self.energy_bins = nn.Parameter(torch.linspace(hp.energy_min, hp.energy_max, hp.n_bins-1))

to

        self.pitch_bins = torch.exp(torch.linspace(np.log(hp.f0_min), np.log(hp.f0_max), hp.n_bins-1)).cuda()
        self.energy_bins = torch.linspace(hp.energy_min, hp.energy_max, hp.n_bins-1).cuda()

mataym commented 4 years ago

After I changed the code as you suggested, the error still occurs!

Curry-AI commented 4 years ago

You need to reinstall the latest PyTorch:

conda install pytorch torchvision cudatoolkit=10.2 -c pytorch

or

pip install torch torchvision

mataym commented 4 years ago

conda install pytorch torchvision cudatoolkit=10.2 -c pytorch

My CUDA version is 10.1 (pytorch==1.6.0.dev20200428+cu101). Is it necessary to install CUDA 10.2?

Curry-AI commented 4 years ago

I successfully ran the project with CUDA 10.2.

I'm not sure, but you can try installing with:

conda install pytorch torchvision cudatoolkit=10.1 -c pytorch

mataym commented 4 years ago

After I upgraded to the latest PyTorch, the same error still occurs:

Traceback (most recent call last):
  File "train.py", line 238, in <module>
    main(args)
  File "train.py", line 112, in main
    log_duration_output, log_D, f0_output, f0, energy_output, energy, mel_output, mel_postnet_output, mel_target, ~src_mask, ~mel_mask)
  File "/home/speechlab/anaconda3/envs/fs2p/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/speechlab/temp/fs2p/loss.py", line 20, in forward
    log_d_target = log_d_target.masked_select(src_mask)
RuntimeError: The size of tensor a (7) must match the size of tensor b (112) at non-singleton dimension 1

Any ideas about this problem?

Curry-AI commented 4 years ago

Obviously, this is not the same error. I haven't encountered it, so I can't help you.

kairosdojo commented 4 years ago

Hi!

I'm not 100% sure that @Mao-JianGuo's suggestions fix your original problem (they didn't fix it for me), because I suspect it might have to do with how our datasets were aligned using MFA.

Did you use a custom dataset? Did you use MFA's train and align utility with a custom lexicon corpus / acoustic model?

You see, when I train and align my own custom dataset (with my own g2p-trained dictionary, as there isn't a pretrained one available for my language), I get this warning (MFA version 1.0.1, compiled along with Kaldi from scratch):

FastSpeech2 git:(master) ✗ mfa_train_and_align myCorpus.lab preprocessed/myDataset/TextGrid/ --output_model_path myAcustModel -c -j8
aligner/command_line/train_and_align.py:30: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
Setting up corpus information...
Creating dictionary information...
Setting up training data...

**There were words not found in the dictionary. Would you like to abort to fix them? (Y/N)**

I imagine it has to do with this because those arrays / tuples you see to start with (before the error traceback) are printed during the data loading phase, in https://github.com/ming024/FastSpeech2/blob/master/dataset.py#L56:

for text, D, id_ in zip(texts, Ds, ids):
    if len(text) != len(D):
        print(text, text.shape, D, D.shape, id_)

I'm getting the same error, and the fact that the reported sizes sometimes differ (so it looks like a different error) might have to do with batch shuffling.
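
The length check quoted above can also be run once over the whole dataset before training, so that every bad utterance is listed up front instead of surfacing batch by batch. The following is only a minimal sketch: `find_length_mismatches` and the two dictionaries are hypothetical names, not part of the repository, and it assumes you have already loaded the phoneme sequences and duration arrays keyed by utterance ID.

```python
from typing import Dict, List, Sequence, Tuple


def find_length_mismatches(
    phonemes: Dict[str, Sequence[str]],
    durations: Dict[str, Sequence[int]],
) -> List[Tuple[str, int, int]]:
    """Return (utterance_id, n_phonemes, n_durations) for every bad pair.

    Hypothetical helper: a pair is "bad" when either sequence is empty
    or the two lengths differ.
    """
    bad = []
    for utt_id in sorted(phonemes):
        n_text = len(phonemes[utt_id])
        # A missing duration entry is treated as length 0.
        n_dur = len(durations.get(utt_id, ()))
        if n_text == 0 or n_dur == 0 or n_text != n_dur:
            bad.append((utt_id, n_text, n_dur))
    return bad


if __name__ == "__main__":
    # Invented sample data; the empty list imitates the `[] (0,)`
    # entries printed in the logs earlier in this thread.
    phonemes = {
        "00000539": [],
        "00000540": ["HH", "AH0", "L", "OW1"],
    }
    durations = {
        "00000539": [26, 15, 12],
        "00000540": [3, 5, 4, 9],
    }
    for utt_id, n_text, n_dur in find_length_mismatches(phonemes, durations):
        print(f"{utt_id}: {n_text} phonemes vs {n_dur} durations")
```

Utterances reported this way are the ones to re-align or drop from train.txt and val.txt.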

ming024 commented 4 years ago

I think this problem arises from incorrect alignments produced by MFA. As @kairosdojo said, the duration arrays of some sentences are empty.

kairosdojo commented 4 years ago

Hi, everyone! I was able to solve my case, so I'd like to share what I found, as someone else might find it useful (note that this applies not only to this FastSpeech2 implementation but most likely to other ones too).

  1. There were errors in my MFA alignment due mostly to my own dataset containing writing errors and words not found in the dictionary. Fixing that is a pain but with patience, time and combining quick check scripts and manual edits I was able to do that.

  2. I recommend applying both patches by @Mao-JianGuo described here, as I later encountered the errors they fix.

  3. This one was weirdly the most difficult (yet so obvious!) to discover: the phonetic alphabet. You see, https://github.com/ming024/FastSpeech2/blob/master/text/cmudict.py#L6 specifies the ARPAbet phonetic alphabet used in cmudict for English:

valid_symbols = [
    'AA', 'AA0', 'AA1', 'AA2', 'AE', 'AE0', 'AE1', 'AE2', 'AH', 'AH0', 'AH1', 'AH2',
    'AO', 'AO0', 'AO1', 'AO2', 'AW', 'AW0', 'AW1', 'AW2', 'AY', 'AY0', 'AY1', 'AY2',
    'B', 'CH', 'D', 'DH', 'EH', 'EH0', 'EH1', 'EH2', 'ER', 'ER0', 'ER1', 'ER2', 'EY',
    'EY0', 'EY1', 'EY2', 'F', 'G', 'HH', 'IH', 'IH0', 'IH1', 'IH2', 'IY', 'IY0', 'IY1',
    'IY2', 'JH', 'K', 'L', 'M', 'N', 'NG', 'OW', 'OW0', 'OW1', 'OW2', 'OY', 'OY0',
    'OY1', 'OY2', 'P', 'R', 'S', 'SH', 'T', 'TH', 'UH', 'UH0', 'UH1', 'UH2', 'UW',
    'UW0', 'UW1', 'UW2', 'V', 'W', 'Y', 'Z', 'ZH'
]

Considering this script is taken from Tacotron, it's safe to assume this would also affect other implementations.

If you are training on a different language, you must replace this list with the alphabet used by your phonetic dictionary. In my specific case even that didn't work (my dictionary uses a weird sub-sample of the SAMPA alphabet), so I wrote a script that extracted all the phonetic characters making up the alphabet from my dictionary. In my case, this was the cause of the misinterpreted durations that led to tensors of different sizes! I'd suggest adding a note to the README about the need to change this list as well when training on languages other than English (or when using lexicons that come from different phonetic dictionaries).
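
The extraction script itself was not posted, so what follows is only a guess at its shape. It assumes a plain-text lexicon with one entry per line in the form `WORD PH1 PH2 ...`, separated by whitespace; lexicons that carry an extra probability column would need that column skipped. The function name `phones_from_lexicon` is hypothetical.

```python
from typing import Iterable, List


def phones_from_lexicon(lines: Iterable[str]) -> List[str]:
    """Collect the sorted set of phone symbols used in a lexicon.

    Assumes each line looks like "WORD PH1 PH2 ..." (whitespace-separated).
    """
    phones = set()
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank lines and entries without a pronunciation
        phones.update(fields[1:])  # fields[0] is the word itself
    return sorted(phones)


if __name__ == "__main__":
    # Invented sample entries, only to show the expected input shape.
    sample = [
        "ciao tS a o",
        "casa k a z a",
        "",
    ]
    valid_symbols = phones_from_lexicon(sample)
    print(valid_symbols)
```

With a real lexicon you would pass the open file object instead of the sample list, and use the result in place of the English `valid_symbols` list.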

Thanks again, @ming024, for your work!

chazo1994 commented 4 years ago

The PyTorch 1.6.0 stable release does not support autograd for 'bucketize', so you should switch to the latest version. Even so, I think the output tensor does not require gradients. https://github.com/pytorch/pytorch/issues/45119#issuecomment-696971735
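
To illustrate why no gradient is needed at this step: bucketing maps each pitch value to an integer bin index, and an integer index carries no gradient. The sketch below reproduces the idea in pure Python, with `bisect_left` standing in for `torch.bucketize` (it is meant to mirror that function's default boundary handling) and a hand-rolled log-spaced grid standing in for `torch.exp(torch.linspace(...))`. The function names and the values 50, 400 and 5 are invented for the example; the real ones come from `hp.f0_min`, `hp.f0_max` and `hp.n_bins`.

```python
import math
from bisect import bisect_left
from typing import List


def log_spaced_boundaries(lo: float, hi: float, n_bins: int) -> List[float]:
    """Return n_bins - 1 boundaries, evenly spaced in log space from lo to hi.

    Requires n_bins >= 3 so that there are at least two boundaries.
    """
    n = n_bins - 1
    step = (math.log(hi) - math.log(lo)) / (n - 1)
    return [math.exp(math.log(lo) + i * step) for i in range(n)]


def bucketize(values: List[float], boundaries: List[float]) -> List[int]:
    """Return, for each value, the index of the bin it falls into."""
    return [bisect_left(boundaries, v) for v in values]


if __name__ == "__main__":
    # Invented pitch range and bin count, for illustration only.
    boundaries = log_spaced_boundaries(50.0, 400.0, 5)
    indices = bucketize([30.0, 120.0, 500.0], boundaries)
    # The result is a list of plain integers used to look up embeddings.
    print(boundaries, indices)
```

The indices only select rows of the pitch embedding table, so the boundaries act as constants rather than trainable weights.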

csukuangfj commented 4 years ago

The solution is to change

https://github.com/ming024/FastSpeech2/blob/35efa49ecf79cfbcec058248a42a5eedcbf967d2/modules.py#L29-L30

to

        self.pitch_bins = nn.Parameter(torch.exp(torch.linspace(np.log(hp.f0_min), np.log(hp.f0_max), hp.n_bins-1)), requires_grad=False)
        self.energy_bins = nn.Parameter(torch.linspace(hp.energy_min, hp.energy_max, hp.n_bins-1), requires_grad=False)

That is, add requires_grad=False to the nn.Parameter constructor.

leijue222 commented 4 years ago

@Mao-JianGuo Have you tried it on the Biaobei dataset? I used MFA to process the Biaobei data and tried to run train.py on it, but it failed. My environment is the torch 1.6 stable version with CUDA 10.2 on Ubuntu 18.04.

I followed the changes described here: https://github.com/ming024/FastSpeech2/issues/14#issuecomment-678245971

Then, if I follow @Mao-JianGuo and change
https://github.com/ming024/FastSpeech2/blob/35efa49ecf79cfbcec058248a42a5eedcbf967d2/modules.py#L29-L30 to


 self.pitch_bins = torch.exp(torch.linspace(np.log(hp.f0_min), np.log(hp.f0_max), hp.n_bins - 1)).cuda()
 self.energy_bins = torch.linspace(hp.energy_min, hp.energy_max, hp.n_bins - 1).cuda()

neither the LJSpeech nor the Biaobei dataset can be trained, and I get this error:

RuntimeError: boundaries and input value tensors should have same device type, but we got boundaries tensor device type cuda:0 and input value tensor device type cuda:1

Or, if I follow @csukuangfj and change https://github.com/ming024/FastSpeech2/blob/35efa49ecf79cfbcec058248a42a5eedcbf967d2/modules.py#L29-L30 to


 self.pitch_bins = nn.Parameter(torch.exp(torch.linspace(np.log(hp.f0_min), np.log(hp.f0_max), hp.n_bins-1)), requires_grad=False)
 self.energy_bins = nn.Parameter(torch.linspace(hp.energy_min, hp.energy_max, hp.n_bins-1), requires_grad=False)

LJSpeech can be trained, but the Biaobei dataset produces the following error:

RuntimeError: The size of tensor a (10) must match the size of tensor b (67) at non-singleton dimension 1

How can I solve this problem? @ming024

csukuangfj commented 4 years ago

RuntimeError: boundaries and input value tensors should have same device type, but we got boundaries tensor device type cuda:0 and input value tensor device type cuda:1

export CUDA_VISIBLE_DEVICES=0 might be helpful.

leijue222 commented 4 years ago

Thanks, I tried it, but it didn't work. Your method of setting requires_grad=False works, but the error "The size of tensor a (10) must match the size of tensor b (67) at non-singleton dimension 1" occurs after switching to another dataset.

csukuangfj commented 4 years ago

Use your method to change requires_grad=False is worked

I have tried only the LJSpeech dataset. You can read the code and try to find the bug on your own. I don't think it is that hard.

leijue222 commented 4 years ago

Thanks for your help.

yBeOne commented 3 years ago

@leijue222, have you solved your problem: The size of tensor a (10) must match the size of tensor b (67) at non-singleton dimension 1?

leijue222 commented 3 years ago

Nope, I changed to use tacotron2.

KAIMAN0 commented 3 years ago

If you train on a new language (other than English and Mandarin), look carefully at text/symbols.py and make sure that all phones are included in symbols.

leijue222 commented 3 years ago

You are right. Whether you use Tacotron 2 or FastSpeech 2, every phone of each language must be included in the symbols in text/symbols.py and be distinguishable.

yinchyu commented 3 years ago

Why does this problem occur?

The size of tensor a (10) must match the size of tensor b (67) at non-singleton

x = x + energy_embedding
x = x + pitch_embedding

azman-i commented 3 years ago

@yinchyu did you get the solution?

everschen commented 2 years ago

I hit the same issue. Yes, I use my own lexicon and trained MFA on my own dataset. My total is 88770 sentences (AISHELL3 + 735 of my own), but duration/energy/mel/pitch each have only 88764, so 6 sentences seem to be missing. I guess this is caused by the symbols. I will try to figure it out.

Here is my error, for your information:

python3 train.py -p config/AISHELL3/preprocess.yaml -m config/AISHELL3/model.yaml -t config/AISHELL3/train.yaml
Prepare training ...
Number of FastSpeech2 Parameters: 35215425
Removing weight norm...
Training: 0%| | 30/900000 [00:07<75:47:06, 3.30it/s]
| 7/1379 [00:07<24:51, 1.09s/it]
Traceback (most recent call last):
  File "train.py", line 198, in <module>
    main(args, configs)
  File "train.py", line 82, in main
    output = model(*(batch[2:]))
  File "/home/evers/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/evers/.local/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 159, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/evers/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/evers/FastSpeech2-test/model/fastspeech2.py", line 91, in forward
    d_control,
  File "/home/evers/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/evers/FastSpeech2-test/model/modules.py", line 121, in forward
    x = x + pitch_embedding
RuntimeError: The size of tensor a (25) must match the size of tensor b (26) at non-singleton dimension 1
Training: 0%| | 30/900000 [00:08<68:24:33, 3.65it/s]
Exception ignored in: <function tqdm.__del__ at 0x7f58f6509290>
Traceback (most recent call last):
  File "/home/evers/.local/lib/python3.7/site-packages/tqdm/std.py", line 1086, in __del__
  File "/home/evers/.local/lib/python3.7/site-packages/tqdm/std.py", line 1270, in close
  File "/home/evers/.local/lib/python3.7/site-packages/tqdm/std.py", line 572, in _decr_instances
  File "/home/evers/.local/lib/python3.7/site-packages/tqdm/_monitor.py", line 51, in exit
  File "/usr/lib/python3.7/threading.py", line 522, in set
  File "/usr/lib/python3.7/threading.py", line 365, in notify_all
  File "/usr/lib/python3.7/threading.py", line 348, in notify
TypeError: 'NoneType' object is not callable
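
The count mismatch described in this comment (88770 sentences against 88764 entries per feature) can be narrowed down to specific utterance IDs with a set difference. This is a sketch only: `missing_utterances` is a hypothetical helper, and how you derive the IDs from your metadata file and from the feature file names depends on your own preprocessing layout, which is not shown in the thread.

```python
from typing import Dict, Iterable, List, Set


def missing_utterances(
    expected_ids: Iterable[str],
    features: Dict[str, Set[str]],
) -> Dict[str, List[str]]:
    """Map each feature name to the expected IDs it lacks.

    Feature types with nothing missing are left out of the result.
    """
    expected = set(expected_ids)
    return {
        name: sorted(expected - present)
        for name, present in features.items()
        if expected - present
    }


if __name__ == "__main__":
    # Invented IDs; in practice these come from your corpus listing and
    # from the names of the files written by preprocessing.
    expected = ["utt001", "utt002", "utt003"]
    features = {
        "duration": {"utt001", "utt002"},
        "mel": {"utt001", "utt002", "utt003"},
    }
    for name, ids in missing_utterances(expected, features).items():
        print(f"{name}: missing {ids}")
```

The IDs it reports are the sentences worth inspecting first for unknown words or symbols.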

everschen commented 2 years ago

I have verified the fix in https://github.com/ming024/FastSpeech2/pull/153 and my training is running now.

lunar333 commented 1 year ago

How did you fix this error? I hit the same one.

azman-i commented 1 year ago

Hi @lunar333, can you please check whether the lexicon of your dataset (the letters or phonemes of your language) is the same as the one defined in symbols.py? The tensor sizes differ when the dataset's lexicon and the symbols in symbols.py do not match.
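
One way to carry out this check is to count every phone in the training transcriptions that is absent from the symbol list. The sketch below assumes you have the transcriptions as lists of phone strings and the symbol list as an iterable of strings; `unknown_phones` is a hypothetical name, not a function from the repository.

```python
from collections import Counter
from typing import Dict, Iterable, List


def unknown_phones(
    utterances: Iterable[List[str]],
    symbols: Iterable[str],
) -> Dict[str, int]:
    """Count occurrences of phones that are not in the symbol list."""
    known = set(symbols)
    counts = Counter(
        phone for utt in utterances for phone in utt if phone not in known
    )
    return dict(counts)


if __name__ == "__main__":
    # Invented data: "tS" is used by the dataset but missing from symbols.
    utterances = [["a", "tS"], ["tS", "x"]]
    symbols = ["a", "x"]
    print(unknown_phones(utterances, symbols))
```

An empty result means every phone in the data is covered; any entry that does appear is a candidate cause of the size mismatch discussed in this thread.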