Closed · prows12 closed this issue 4 years ago
Hmm... seeing the loss come out as NaN, did you run it with run_transformer.sh?
The Transformer is currently not training properly.
seq2seq throws the same error. Which PyTorch version did you use?
I used 1.4.0, 1.5.0, and 1.6.0, and there were no errors. My guess is that the labels path is misconfigured or something along those lines.
Sorry for the late reply. I finally solved it. The cause: preprocessing generates a new aihub_label.csv file, and num_classes is computed from it to build the embedding. As you said, that part was not linked correctly on my side. There is a place that wires it up, either in data or in model_builder.
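For anyone hitting the same thing, the mismatch looks roughly like the sketch below: num_classes has to be derived from the same label file the dataset used to encode the targets. This is a minimal sketch, not the exact kospeech code; the file name comes from this thread, and the "id" column name is an assumption about the csv schema.

```python
import pandas as pd
import torch.nn as nn

# Hypothetical sketch, not the exact kospeech code. The file name comes from
# this thread; the column name "id" is an assumption about the csv schema.
labels = pd.read_csv("aihub_label.csv", encoding="utf-8")

# num_classes must be computed from the SAME label file the dataset used to
# encode the targets; a stale or wrong path leaves the embedding too small.
num_classes = len(labels)
embedding = nn.Embedding(num_embeddings=num_classes, embedding_dim=512)

# Sanity check: every target id must fit inside the embedding table,
# otherwise nn.Embedding raises "IndexError: index out of range in self".
assert labels["id"].max() < num_classes, "label ids exceed embedding size"
```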
```
[2020-08-27 20:20:14,206 utils.py:21 - info()] timestep: 10/70530, loss: nan, cer: 3.32, elapsed: 41.31s 0.69m 0.01h, lr: 0.00030
[2020-08-27 20:20:49,702 utils.py:21 - info()] timestep: 20/70530, loss: nan, cer: 2.88, elapsed: 35.50s 1.28m 0.02h, lr: 0.00030
[2020-08-27 20:21:18,650 utils.py:21 - info()] timestep: 30/70530, loss: nan, cer: 2.81, elapsed: 28.95s 1.76m 0.03h, lr: 0.00030
[2020-08-27 20:22:01,191 utils.py:21 - info()] timestep: 40/70530, loss: nan, cer: 2.96, elapsed: 42.54s 2.47m 0.04h, lr: 0.00030
[2020-08-27 20:22:39,461 utils.py:21 - info()] timestep: 50/70530, loss: nan, cer: 2.98, elapsed: 38.27s 3.11m 0.05h, lr: 0.00030
[2020-08-27 20:23:21,102 utils.py:21 - info()] timestep: 60/70530, loss: nan, cer: 3.09, elapsed: 41.64s 3.80m 0.06h, lr: 0.00030
[2020-08-27 20:23:53,312 utils.py:21 - info()] timestep: 70/70530, loss: nan, cer: 3.10, elapsed: 32.21s 4.34m 0.07h, lr: 0.00030
[2020-08-27 20:24:25,110 utils.py:21 - info()] timestep: 80/70530, loss: nan, cer: 3.08, elapsed: 31.80s 4.87m 0.08h, lr: 0.00030
[2020-08-27 20:25:10,588 utils.py:21 - info()] timestep: 90/70530, loss: nan, cer: 3.16, elapsed: 45.48s 5.63m 0.09h, lr: 0.00030
[2020-08-27 20:25:44,441 utils.py:21 - info()] timestep: 100/70530, loss: nan, cer: 3.13, elapsed: 33.85s 6.19m 0.10h, lr: 0.00030

Traceback (most recent call last):
  File "./main.py", line 111, in <module>
    main()
  File "./main.py", line 107, in main
    train(opt)
  File "./main.py", line 86, in train
    num_epochs=opt.num_epochs, teacher_forcing_ratio=opt.teacher_forcing_ratio, resume=opt.resume)
  File "../kospeech/trainer/supervised_trainer.py", line 146, in train
    train_queue, teacher_forcing_ratio)
  File "../kospeech/trainer/supervised_trainer.py", line 231, in __train_epoches
    logit = model(inputs, input_lengths, targets, return_attns=False)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/parallel/data_parallel.py", line 143, in forward
    return self.module(*inputs, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "../kospeech/models/acoustic/transformer/transformer.py", line 160, in forward
    output, decoder_self_attns, memory_attns = self.decoder(targets, input_lengths, memory)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "../kospeech/models/acoustic/transformer/transformer.py", line 283, in forward
    output = self.input_dropout(self.embedding(inputs) + self.positional_encoding(inputs.size(1)))
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "../kospeech/models/acoustic/transformer/embeddings.py", line 43, in forward
    return self.embedding(inputs) * self.sqrt_dim
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/sparse.py", line 114, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py", line 1724, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
IndexError: index out of range in self
```
Attachment: aihub_labels.zip
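For anyone debugging the same trace: "IndexError: index out of range in self" is what nn.Embedding raises whenever an input id is greater than or equal to num_embeddings, which matches the num_classes mismatch described above. A minimal repro:

```python
import torch
import torch.nn as nn

# Vocabulary of 10 entries: valid input ids are 0..9.
embedding = nn.Embedding(num_embeddings=10, embedding_dim=4)

ok = torch.tensor([0, 5, 9])
print(embedding(ok).shape)  # torch.Size([3, 4])

bad = torch.tensor([0, 5, 10])  # 10 >= num_embeddings
embedding(bad)  # raises: IndexError: index out of range in self
```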