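mataym opened this issue 4 years ago
Regarding the error above, I changed the PyTorch version to fix it, but now training the model raises another error:
3 5 5 3 6 3 6 3 10 4 8 3 6 4 4 5 4 15] (66,) 00000658
[] (0,) [16 19 9 5 5 11 5 4 16 10 8 2 6 5 7 7 7 4 4 6 9 5 9 5
6 4 6 4 5 5 5 4 6 16 9 5 5 6 7 10 4 4 3 5 6 3 3 4
4 14 8 7 5 3 5 11 5 7 2 10 11 8 8 13 9] (65,) 00000749
Traceback (most recent call last):
File "train.py", line 238, in <module>
main(args)
File "train.py", line 112, in main
log_duration_output, log_D, f0_output, f0, energy_output, energy, mel_output, mel_postnet_output, mel_target, ~src_mask, ~mel_mask)
File "/home/speechlab/anaconda3/envs/fs2p/lib/python3.7/site-packages/torch/nn/modules/module.py", line 562, in __call__
result = self.forward(*input, **kwargs)
File "/home/speechlab/temp/fs2p/loss.py", line 20, in forward
log_d_target = log_d_target.masked_select(src_mask)
RuntimeError: The size of tensor a (8) must match the size of tensor b (131) at non-singleton dimension 1
Can anyone help me solve this problem?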
I have the same problem.
@mataym @Mao-JianGuo Sorry, I am busy with another project right now. I will look into this problem in a few days.
I found a solution. In modules.py, change
self.pitch_bins = nn.Parameter(torch.exp(torch.linspace(np.log(hp.f0_min), np.log(hp.f0_max), hp.n_bins-1)))
self.energy_bins = nn.Parameter(torch.linspace(hp.energy_min, hp.energy_max, hp.n_bins-1))
to
self.pitch_bins = torch.exp(torch.linspace(np.log(hp.f0_min), np.log(hp.f0_max), hp.n_bins-1)).cuda()
self.energy_bins = torch.linspace(hp.energy_min, hp.energy_max, hp.n_bins-1).cuda()
After changing the code as you suggested, the error still occurs!
You need to reinstall the latest PyTorch:
conda install pytorch torchvision cudatoolkit=10.2 -c pytorch
or
pip install torch torchvision
My CUDA version is 10.1, with pytorch==1.6.0.dev20200428+cu101. Is it necessary to install CUDA 10.2?
I successfully ran the project with CUDA 10.2.
I'm not sure, but you can try installing the CUDA 10.1 build instead: conda install pytorch torchvision cudatoolkit=10.1 -c pytorch
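The thread mixes a 1.6 nightly build, the 1.6 stable release, and CUDA 10.1/10.2. Before reinstalling anything, it may help to check which build is actually active in the environment. The following is a minimal sketch: the helper name `describe_torch_env` is hypothetical, while the attributes it reads are standard PyTorch ones.

```python
import torch


def describe_torch_env():
    """Collect the PyTorch/CUDA facts relevant to this issue (hypothetical helper)."""
    return {
        # e.g. a stable "1.6.0" versus a nightly "1.6.0.devYYYYMMDD" build
        "torch": str(torch.__version__),
        # CUDA version this PyTorch build was compiled against (None for CPU-only builds)
        "cuda_runtime": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
        "gpu_count": torch.cuda.device_count(),
        # modules.py calls torch.bucketize, which older releases do not provide
        "has_bucketize": hasattr(torch, "bucketize"),
    }


if __name__ == "__main__":
    for key, value in describe_torch_env().items():
        print(f"{key}: {value}")
```

If `cuda_runtime` and the driver's CUDA version disagree, or `has_bucketize` is False, fix that before investigating anything else.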
After upgrading to the latest PyTorch, the same error still occurs:
Traceback (most recent call last):
File "train.py", line 238, in <module>
main(args)
File "train.py", line 112, in main
log_duration_output, log_D, f0_output, f0, energy_output, energy, mel_output, mel_postnet_output, mel_target, ~src_mask, ~mel_mask)
File "/home/speechlab/anaconda3/envs/fs2p/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/speechlab/temp/fs2p/loss.py", line 20, in forward
log_d_target = log_d_target.masked_select(src_mask)
RuntimeError: The size of tensor a (7) must match the size of tensor b (112) at non-singleton dimension 1
Any idea about this problem?
Obviously, it's not the same error. I haven't encountered it, so I can't help you.
Hi!
I'm not 100% sure that @Mao-JianGuo's suggestions fix your original problem (they didn't fix it for me), because I suspect it has to do with how our datasets were aligned with MFA.
Did you use a custom dataset? Did you use MFA's train-and-align utility with a custom lexicon, corpus or acoustic model?
You see, when I train and align my own custom dataset (with my own g2p-trained dictionary, as there isn't a pretrained one available for my language), I get this warning (MFA version 1.0.1, compiled together with Kaldi from scratch):
FastSpeech2 git:(master) ✗ mfa_train_and_align myCorpus.lab preprocessed/myDataset/TextGrid/ --output_model_path myAcustModel -c -j8
aligner/command_line/train_and_align.py:30: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
Setting up corpus information...
Creating dictionary information...
Setting up training data...
There were words not found in the dictionary. Would you like to abort to fix them? (Y/N)
I imagine this is related, because the arrays / tuples printed at the start (before the traceback) come from the data loading phase, in https://github.com/ming024/FastSpeech2/blob/master/dataset.py#L56:
for text, D, id_ in zip(texts, Ds, ids):
    if len(text) != len(D):
        print(text, text.shape, D, D.shape, id_)
I'm getting the same error, and the fact that it sometimes differs (and looks like a different error) might be due to batch shuffling.
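The print in dataset.py only fires when a bad utterance lands in the current batch. That explains why the reported sizes change from run to run. The same check can instead be run over the whole dataset up front. This is a minimal sketch, assuming the phone sequences and duration arrays are already loaded into parallel lists. Loading them is left out because the file layout differs between versions of the repo, and the helper name is hypothetical.

```python
def find_length_mismatches(texts, durations, ids):
    """Return (utterance_id, n_phones, n_durations) for every utterance whose lengths disagree.

    Hypothetical helper mirroring the check in dataset.py: texts[i] is the phone
    sequence and durations[i] the per-phone frame counts for utterance ids[i].
    An empty duration array (a failed MFA alignment) always shows up here.
    """
    if not (len(texts) == len(durations) == len(ids)):
        raise ValueError("texts, durations and ids must have the same number of entries")
    mismatches = []
    for text, dur, utt_id in zip(texts, durations, ids):
        if len(text) != len(dur):
            mismatches.append((utt_id, len(text), len(dur)))
    return mismatches


# Toy data for illustration only
texts = [[12, 7, 33], [5, 9], [41]]
durations = [[4, 6, 3], [], [8]]
ids = ["utt1", "utt2", "utt3"]
print(find_length_mismatches(texts, durations, ids))  # [('utt2', 2, 0)]
```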
I think this problem arises from incorrect alignment by MFA. As @kairosdojo said, the duration arrays of some sentences are empty.
Hi, everyone! I was able to solve my case, so I'd like to share what I found, as someone else might find it useful (note: this applies not just to this FastSpeech2 implementation but most likely to others too).
There were errors in my MFA alignment, mostly because my dataset contained typos and words not found in the dictionary. Fixing that is a pain, but with patience, time, and a mix of quick check scripts and manual edits I was able to do it.
I recommend applying both patches by @Mao-JianGuo described here, as I later ran into the errors they fix.
This one was, weirdly, the most difficult (yet so obvious!) to discover: the phonetic alphabet. In https://github.com/ming024/FastSpeech2/blob/master/text/cmudict.py#L6, the ARPAbet phone set used by CMUdict for English is specified:
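The thread does not include the "quick check scripts" mentioned above. One plausible version is sketched below: it lists the transcript words that are missing from the lexicon, so you can fix them before running MFA. It assumes the transcripts are already loaded into a dict, and that the lexicon uses a plain one-entry-per-line "WORD PHONE PHONE ..." format with no probability columns. Lower-casing and stripping punctuation are assumptions that may not suit every language. Both function names are hypothetical.

```python
import string


def load_lexicon_words(lexicon_lines):
    """Set of (lower-cased) headwords from lines like 'word PH1 PH2 ...'."""
    words = set()
    for line in lexicon_lines:
        parts = line.split()
        if parts:
            words.add(parts[0].lower())
    return words


def find_oov_words(transcripts, lexicon_lines):
    """Map each out-of-vocabulary word to the utterance IDs it appears in.

    transcripts: dict of utterance id -> transcript text (assumed input format)
    """
    known = load_lexicon_words(lexicon_lines)
    oov = {}
    for utt_id, text in transcripts.items():
        for raw in text.lower().split():
            word = raw.strip(string.punctuation)  # drop leading/trailing punctuation
            if word and word not in known:
                oov.setdefault(word, []).append(utt_id)
    return oov


lexicon = ["hello HH AH0 L OW1", "world W ER1 L D"]
print(find_oov_words({"u1": "Hello world.", "u2": "Hello there"}, lexicon))  # {'there': ['u2']}
```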
valid_symbols = [
'AA', 'AA0', 'AA1', 'AA2', 'AE', 'AE0', 'AE1', 'AE2', 'AH', 'AH0', 'AH1', 'AH2',
'AO', 'AO0', 'AO1', 'AO2', 'AW', 'AW0', 'AW1', 'AW2', 'AY', 'AY0', 'AY1', 'AY2',
'B', 'CH', 'D', 'DH', 'EH', 'EH0', 'EH1', 'EH2', 'ER', 'ER0', 'ER1', 'ER2', 'EY',
'EY0', 'EY1', 'EY2', 'F', 'G', 'HH', 'IH', 'IH0', 'IH1', 'IH2', 'IY', 'IY0', 'IY1',
'IY2', 'JH', 'K', 'L', 'M', 'N', 'NG', 'OW', 'OW0', 'OW1', 'OW2', 'OY', 'OY0',
'OY1', 'OY2', 'P', 'R', 'S', 'SH', 'T', 'TH', 'UH', 'UH0', 'UH1', 'UH2', 'UW',
'UW0', 'UW1', 'UW2', 'V', 'W', 'Y', 'Z', 'ZH'
]
Since this script is taken from Tacotron, it's safe to assume it affects other implementations too.
If you are training a different language, you must replace this list with the phone set used by your phonetic dictionary. In my case even that didn't work, because my dictionary uses an unusual subset of the SAMPA alphabet. So I wrote a script that extracted all the phone symbols used in my dictionary. In my case, this was what caused the durations to be misinterpreted, leading to tensors of different sizes! I'd suggest noting in the README that this list must also be changed when training in languages other than English (or when using lexicons based on a different phonetic alphabet).
Thanks again, @ming024, for your work!
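The extraction script described above is not shown in the thread. A minimal version might collect every distinct phone from the lexicon and print it as a list literal that can replace `valid_symbols` in text/cmudict.py. This sketch assumes a plain "WORD PHONE PHONE ..." lexicon without pronunciation-probability columns. Both function names are hypothetical.

```python
def extract_phone_inventory(lexicon_lines):
    """Sorted list of every distinct phone used in a 'WORD PH1 PH2 ...' lexicon."""
    phones = set()
    for line in lexicon_lines:
        parts = line.split()
        phones.update(parts[1:])  # parts[0] is the word itself
    return sorted(phones)


def format_valid_symbols(phones):
    """Render the inventory as a Python list literal, ready to paste over valid_symbols."""
    body = ", ".join(repr(p) for p in phones)
    return f"valid_symbols = [{body}]"


lexicon = ["casa k a s a", "gatto g a t t o"]
print(format_valid_symbols(extract_phone_inventory(lexicon)))
# valid_symbols = ['a', 'g', 'k', 'o', 's', 't']
```

Generating the list from the lexicon itself means every phone MFA can emit is guaranteed to be in the symbol table. Hand-copying a published alphabet does not give you that guarantee.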
PyTorch 1.6.0 stable does not support autograd for 'bucketize', so you should switch to the latest version; even so, I think the output tensor does not require gradients. https://github.com/pytorch/pytorch/issues/45119#issuecomment-696971735
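The solution is to change
self.pitch_bins = nn.Parameter(torch.exp(torch.linspace(np.log(hp.f0_min), np.log(hp.f0_max), hp.n_bins-1)))
self.energy_bins = nn.Parameter(torch.linspace(hp.energy_min, hp.energy_max, hp.n_bins-1))
to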
self.pitch_bins = nn.Parameter(torch.exp(torch.linspace(np.log(hp.f0_min), np.log(hp.f0_max), hp.n_bins-1)), requires_grad=False)
self.energy_bins = nn.Parameter(torch.linspace(hp.energy_min, hp.energy_max, hp.n_bins-1), requires_grad=False)
That is, add requires_grad=False to the nn.Parameter constructor.
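Another way to get the same effect, not proposed in the thread, is to register the bin boundaries as a buffer instead of an nn.Parameter. A buffer never requires gradients and is still saved in state_dict. It also follows the module on .to()/.cuda() and is copied to every replica by nn.DataParallel. In this minimal sketch, the class name and the f0 range in the usage lines are illustrative assumptions, not the repo's actual hparams.

```python
import math

import torch
import torch.nn as nn


class PitchQuantizer(nn.Module):
    """Hypothetical module: maps f0 values to log-spaced bins, then to embeddings."""

    def __init__(self, f0_min, f0_max, n_bins, hidden_dim):
        super().__init__()
        bins = torch.exp(torch.linspace(math.log(f0_min), math.log(f0_max), n_bins - 1))
        # A buffer, not an nn.Parameter: autograd ignores it, but it is still saved
        # in state_dict and moved/replicated together with the module.
        self.register_buffer("pitch_bins", bins)
        # n_bins - 1 boundaries yield bucket indices 0 .. n_bins - 1
        self.pitch_embedding = nn.Embedding(n_bins, hidden_dim)

    def forward(self, pitch):
        return self.pitch_embedding(torch.bucketize(pitch, self.pitch_bins))


quantizer = PitchQuantizer(f0_min=71.0, f0_max=795.8, n_bins=256, hidden_dim=8)
out = quantizer(torch.tensor([[50.0, 120.0, 900.0]]))
print(out.shape)  # torch.Size([1, 3, 8])
```

Values below the first boundary map to index 0 and values above the last map to index n_bins - 1, so the embedding table never gets an out-of-range index.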
@Mao-JianGuo Have you tried it on the Biaobei dataset? I used MFA to process the Biaobei data and tried to run train.py on it, but it failed. My environment is PyTorch 1.6 stable and CUDA 10.2 on Ubuntu 18.04.
I applied the changes described here: https://github.com/ming024/FastSpeech2/issues/14#issuecomment-678245971
Then, if I apply @Mao-JianGuo's change to
https://github.com/ming024/FastSpeech2/blob/35efa49ecf79cfbcec058248a42a5eedcbf967d2/modules.py#L29-L30
changing it to
self.pitch_bins = torch.exp(torch.linspace(np.log(hp.f0_min), np.log(hp.f0_max), hp.n_bins - 1)).cuda()
self.energy_bins = torch.linspace(hp.energy_min, hp.energy_max, hp.n_bins - 1).cuda()
Neither the LJSpeech nor the Biaobei dataset can be trained; I get the error
RuntimeError: boundaries and input value tensors should have same device type, but we got boundaries tensor device type cuda:0 and input value tensor device type cuda:1
Or, if I apply @csukuangfj's change to https://github.com/ming024/FastSpeech2/blob/35efa49ecf79cfbcec058248a42a5eedcbf967d2/modules.py#L29-L30, changing it to
self.pitch_bins = nn.Parameter(torch.exp(torch.linspace(np.log(hp.f0_min), np.log(hp.f0_max), hp.n_bins-1)), requires_grad=False)
self.energy_bins = nn.Parameter(torch.linspace(hp.energy_min, hp.energy_max, hp.n_bins-1), requires_grad=False)
LJSpeech can be trained, but the Biaobei dataset gives the following error:
RuntimeError: The size of tensor a (10) must match the size of tensor b (67) at non-singleton dimension 1
How can I solve this problem? @ming024
Running export CUDA_VISIBLE_DEVICES=0 before training might help.
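The cuda:0 vs cuda:1 message fits a known nn.DataParallel behavior: plain tensor attributes created with .cuda() stay on the default GPU and are not copied to each replica. Meanwhile, the replica running on cuda:1 receives its inputs on cuda:1. Besides hiding the other GPUs with CUDA_VISIBLE_DEVICES, one possible workaround, not tested in the thread, is to move the boundaries to the input's device at call time. In this sketch the helper name is hypothetical.

```python
import torch


def bucketize_on_input_device(values, boundaries):
    """torch.bucketize with the boundaries moved to wherever `values` lives.

    Hypothetical helper: Tensor.to() returns the same tensor when the device
    already matches, so this only copies when a replica runs on another GPU.
    """
    return torch.bucketize(values, boundaries.to(values.device))


bins = torch.linspace(0.0, 1.0, 5)  # boundaries 0.0, 0.25, 0.5, 0.75, 1.0
print(bucketize_on_input_device(torch.tensor([0.1, 0.6, 2.0]), bins))  # tensor([1, 3, 5])
```

In modules.py this would mean calling the helper wherever torch.bucketize(pitch_target, self.pitch_bins) and the energy equivalent appear. Registering the bins as a buffer avoids the per-call copy altogether.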
Thanks, I have tried it and it didn't work.
Using your method of setting requires_grad=False worked, but the error The size of tensor a (10) must match the size of tensor b (67) at non-singleton dimension 1 occurs after switching to another dataset.
I have tried only the LJSpeech dataset. You can read the code and try to find the bug on your own. I don't think it is that hard.
Thanks for your help.
@leijue222, have you solved your problem: The size of tensor a (10) must match the size of tensor b (67) at non-singleton dimension 1?
Nope, I switched to Tacotron 2.
If you train in new languages (other than English and Mandarin), look carefully at text/symbols.py and make sure that all phones are included in the symbols.
You are right: whether with Tacotron 2 or FastSpeech2, each language's phones must be included in text/symbols.py and be distinguishable.
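A quick way to check both conditions, coverage and distinguishability, is to compare the final symbol list against every phone your lexicon or TextGrids use. In Tacotron-derived text processing, symbols missing from the table are typically dropped silently. That makes the phone sequence shorter than the duration array, which is one way to end up with the size mismatch above. Duplicate entries make the symbol-to-ID mapping ambiguous. This is a minimal sketch with a hypothetical helper name. If your symbols.py adds a prefix to phones (some Tacotron-derived code uses "@"), apply the same prefix to `used_phones` first.

```python
from collections import Counter


def check_symbol_table(symbols, used_phones):
    """Report duplicated symbols and phones missing from the symbol table.

    symbols: the final list the model maps to IDs (as assembled in text/symbols.py)
    used_phones: every phone appearing in your lexicon or TextGrids
    """
    counts = Counter(symbols)
    duplicates = sorted(s for s, n in counts.items() if n > 1)
    missing = sorted(set(used_phones) - set(symbols))
    return {"duplicates": duplicates, "missing": missing}


print(check_symbol_table(["_", "a", "b", "a"], {"a", "b", "c"}))
# {'duplicates': ['a'], 'missing': ['c']}
```

Both lists should be empty before you start preprocessing and training.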
Why does this problem occur?
The size of tensor a (10) must match the size of tensor b (67) at non-singleton
x = x + energy_embedding
x = x + pitch_embedding
@yinchyu, did you find a solution?
I hit the same issue. Yes, I use my own lexicon and trained MFA on my own dataset. My total sentence count is 88770 (AISHELL-3 plus 735 of my own), but duration/energy/mel/pitch each contain only 88764, so it seems 6 sentences are missing. I guess this is because of the symbols; I will try to figure it out.
Here is my error for your information:
python3 train.py -p config/AISHELL3/preprocess.yaml -m config/AISHELL3/model.yaml -t config/AISHELL3/train.yaml
Prepare training ...
Number of FastSpeech2 Parameters: 35215425
Removing weight norm...
Training: 0%| | 30/900000 [00:07<75:47:06, 3.30it/s]Traceback (most recent call last): | 7/1379 [00:07<24:51, 1.09s/it]
File "train.py", line 198, in <module>
main(args, configs)
File "train.py", line 82, in main
output = model(*(batch[2:]))
File "/home/evers/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/evers/.local/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 159, in forward
return self.module(*inputs[0], **kwargs[0])
File "/home/evers/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/evers/FastSpeech2-test/model/fastspeech2.py", line 91, in forward
d_control,
File "/home/evers/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/evers/FastSpeech2-test/model/modules.py", line 121, in forward
x = x + pitch_embedding
RuntimeError: The size of tensor a (25) must match the size of tensor b (26) at non-singleton dimension 1
Training: 0%| | 30/900000 [00:08<68:24:33, 3.65it/s]
Exception ignored in: <function tqdm.__del__ at 0x7f58f6509290>
Traceback (most recent call last):
File "/home/evers/.local/lib/python3.7/site-packages/tqdm/std.py", line 1086, in __del__
File "/home/evers/.local/lib/python3.7/site-packages/tqdm/std.py", line 1270, in close
File "/home/evers/.local/lib/python3.7/site-packages/tqdm/std.py", line 572, in _decr_instances
File "/home/evers/.local/lib/python3.7/site-packages/tqdm/_monitor.py", line 51, in __exit__
File "/usr/lib/python3.7/threading.py", line 522, in set
File "/usr/lib/python3.7/threading.py", line 365, in notify_all
File "/usr/lib/python3.7/threading.py", line 348, in notify
TypeError: 'NoneType' object is not callable
I have verified https://github.com/ming024/FastSpeech2/pull/153, and my training is now running.
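A gap like 88770 vs 88764 means some utterances were dropped during preprocessing. One possible cause is that MFA produced no alignment for them. Knowing exactly which IDs are missing makes it much easier to inspect their transcripts and phones. This is a minimal sketch with hypothetical helper names. How IDs are derived from file names is an assumption you will need to adapt to your preprocessing layout.

```python
from pathlib import Path


def ids_from_dir(directory, pattern="*.npy"):
    """Set of file stems in a feature directory.

    Assumption: you may need to post-process the stems (e.g. strip a speaker or
    feature-type prefix) so they match the IDs in your metadata.
    """
    return {path.stem for path in Path(directory).glob(pattern)}


def find_missing_utterances(all_ids, feature_ids):
    """For each feature type, list expected utterance IDs that have no file.

    all_ids: every utterance ID you expect (e.g. from your transcripts/metadata)
    feature_ids: mapping like {"duration": ids, "pitch": ids, ...}
    """
    expected = set(all_ids)
    return {name: sorted(expected - set(ids)) for name, ids in feature_ids.items()}


print(find_missing_utterances(["a", "b", "c"], {"duration": ["a", "c"], "pitch": ["a", "b", "c"]}))
# {'duration': ['b'], 'pitch': []}
```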
How did you fix this error? I ran into the same one.
Hi @lunar333, can you please check whether the lexicon (the letters or phonemes of your language) of your dataset matches the one listed in symbols.py? The tensor sizes diverge when the dataset's lexicon differs from the symbols listed in symbols.py.
In this case, after generating the TextGrid files with MFA and placing them in the preprocessed folder, I ran the scripts preprocess.py, prepare_align.py and preprocess.py separately with no errors. They created these files: alignment, energy, f0, mel, stat.txt, train.txt and val.txt. Then I ran train.py to train the model, but got the following error:
5 15] (50,) 00001943
[] (0,) [26 15 12 6 4 4 10 11 5 3 2 3 2 5 7 7 8 8 3 3 7 9 5 4 5 8 15 6 7 5 8 9 3 4 5 10 12 4 6 4 6 9 4 9 5 12] (46,) 00000539
Traceback (most recent call last):
File "train.py", line 238, in <module>
main(args)
File "train.py", line 108, in main
text, src_len, mel_len, D, f0, energy, max_src_len, max_mel_len)
File "/home/speechlab/anaconda3/envs/fs2p/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/speechlab/anaconda3/envs/fs2p/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 153, in forward
return self.module(*inputs[0], **kwargs[0])
File "/home/speechlab/anaconda3/envs/fs2p/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/speechlab/temp/fs2p/fastspeech2.py", line 36, in forward
encoder_output, src_mask, mel_mask, d_target, p_target, e_target, max_mel_len)
File "/home/speechlab/anaconda3/envs/fs2p/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/speechlab/temp/fs2p/modules.py", line 46, in forward
pitch_embedding = self.pitch_embedding(torch.bucketize(pitch_target, self.pitch_bins))
RuntimeError: isDifferentiableType(variable.scalar_type()) INTERNAL ASSERT FAILED at "/opt/conda/conda-bld/pytorch_1595629403081/work/torch/csrc/autograd/functions/utils.h":59, please report a bug to PyTorch.
In this case, I installed the PyTorch 1.6 stable release following the official PyTorch site, not the nightly build (pip3 install --pre torch==1.6.0.dev20200428 -f https://download.pytorch.org/whl/nightly/cu101/torch_nightly.html).
How do I fix this problem? Thanks in advance!