Open RaghothamRao opened 4 years ago
An update: I also tried Ubuntu 16.04.6 LTS (GNU/Linux 4.4.0-1095-aws x86_64v) with pytorch 1.1.0, torchvision 0.3.0 and cudatoolkit 10. No luck yet training from the LJSpeech pretrained model for speaker adaptation.
Traceback (most recent call last):
File "train.py", line 984, in
Hi, I just wanted to give some background before raising this issue. Background:
A few pytorch/cudatoolkit combinations I tried, and the resulting errors:
pytorch 1.4 & cuda 9.2 (using code on git commit)
File "train.py", line 983, in
train_seq2seq=train_seq2seq, train_postnet=train_postnet)
File "train.py", line 589, in train
in tqdm(enumerate(data_loader)):
File "C:\ProgramData\Anaconda3\envs\DeepVoice3\lib\site-packages\tqdm\std.py", line 1107, in __iter__
for obj in iterable:
File "C:\ProgramData\Anaconda3\envs\DeepVoice3\lib\site-packages\torch\utils\data\dataloader.py", line 345, in __next__
data = self._next_data()
File "C:\ProgramData\Anaconda3\envs\DeepVoice3\lib\site-packages\torch\utils\data\dataloader.py", line 856, in _next_data
return self._process_data(data)
File "C:\ProgramData\Anaconda3\envs\DeepVoice3\lib\site-packages\torch\utils\data\dataloader.py", line 881, in _process_data
data.reraise()
File "C:\ProgramData\Anaconda3\envs\DeepVoice3\lib\site-packages\torch\_utils.py", line 394, in reraise
raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in pin memory thread for device 0.
Original Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\envs\DeepVoice3\lib\site-packages\torch\utils\data\_utils\pin_memory.py", line 31, in _pin_memory_loop
data = pin_memory(data)
File "C:\ProgramData\Anaconda3\envs\DeepVoice3\lib\site-packages\torch\utils\data\_utils\pin_memory.py", line 55, in pin_memory
return [pin_memory(sample) for sample in data]
File "C:\ProgramData\Anaconda3\envs\DeepVoice3\lib\site-packages\torch\utils\data\_utils\pin_memory.py", line 55, in
return [pin_memory(sample) for sample in data]
File "C:\ProgramData\Anaconda3\envs\DeepVoice3\lib\site-packages\torch\utils\data\_utils\pin_memory.py", line 55, in pin_memory
return [pin_memory(sample) for sample in data]
File "C:\ProgramData\Anaconda3\envs\DeepVoice3\lib\site-packages\torch\utils\data\_utils\pin_memory.py", line 55, in
return [pin_memory(sample) for sample in data]
File "C:\ProgramData\Anaconda3\envs\DeepVoice3\lib\site-packages\torch\utils\data\_utils\pin_memory.py", line 47, in pin_memory
return data.pin_memory()
RuntimeError: unsupported operation: more than one element of the written-to tensor refers to a single memory location. Please clone() the tensor before performing the operation.
pytorch 1.4 & cuda 9.2 (using code on master branch)
File "train.py", line 1017, in
train_seq2seq=train_seq2seq, train_postnet=train_postnet)
File "train.py", line 723, in train
priority_w=hparams.priority_freq_weight)
File "train.py", line 557, in spec_loss
l1_loss = w * masked_l1(y_hat, y, mask=mask) + (1 - w) * l1(y_hat, y)
File "C:\ProgramData\Anaconda3\envs\DV3pip\lib\site-packages\torch\nn\modules\module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "train.py", line 290, in forward
loss = self.criterion(input * mask_, target * mask_)
RuntimeError: The size of tensor a (513) must match the size of tensor b (1025) at non-singleton dimension 2
pytorch 1.1.0, torchvision 0.3.0 & cudatoolkit 9 (with master as well as the particular git commit)
RuntimeError: The size of tensor a (513) must match the size of tensor b (1025) at non-singleton dimension 2
pytorch==1.2.0, torchvision==0.4.0, cudatoolkit=10.0 (with code on git commit)
RuntimeError: reduce failed to synchronize: device-side assert triggered
pytorch==1.2.0 torchvision==0.4.0 cudatoolkit=10.0 -c pytorch (with master)
RuntimeError: Expected object of scalar type Float but got scalar type Long for argument #2 'other'
conda install pytorch==1.0.0 torchvision==0.2.1 cuda80 -c pytorch (on git commit)
RuntimeError: The size of tensor a (513) must match the size of tensor b (1025) at non-singleton dimension 2
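A note on the recurring size mismatch: 513 and 1025 are the one-sided spectrum bin counts for FFT sizes of 1024 and 2048, since bins = fft_size // 2 + 1. The sketch below only does that arithmetic. Treating the error as a mismatch between the FFT size used for the preprocessed features and the one the model expects is an assumption, not something confirmed here, and the helper names are made up for illustration.

```python
def bins_for_fft_size(fft_size: int) -> int:
    """Number of frequency bins in the one-sided spectrum of a real signal."""
    return fft_size // 2 + 1


def fft_size_for_bins(num_bins: int) -> int:
    """Invert bins = fft_size // 2 + 1, assuming an even FFT size."""
    if num_bins < 2:
        raise ValueError("a one-sided spectrum needs at least 2 bins")
    return (num_bins - 1) * 2


if __name__ == "__main__":
    # The two sizes reported in the RuntimeError above.
    for bins in (513, 1025):
        print(f"{bins} bins -> fft_size {fft_size_for_bins(bins)}")
```

If that assumption holds, the error would not depend on the pytorch or cudatoolkit version, which would fit with it appearing under pytorch 1.0.0, 1.1.0 and 1.4.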
Could someone kindly advise which pytorch and cudatoolkit combination this code works with when training from the LJSpeech pre-trained model?