RuntimeError: Input tensor at index 1 has invalid shape [4, 926, 126, 2001], but expected [4, 1668, 126, 2001]

dh7ahn commented 3 years ago

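I'm trying to train an RNN transducer model on the KsponSpeech data, but the error below occurs right after the first epoch starts. What could be wrong?

I'm using a character vocab, and this problem does not occur when training the LAS or transformer models.

The run log is attached below.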

[2021-02-26 11:47:05,284][kospeech.utils][INFO] -
audio:
audio_extension: pcm sample_rate: 16000 frame_length: 20 frame_shift: 10 normalize: true del_silence: true feature_extract_by: kaldi time_mask_num: 4 freq_mask_num: 2 spec_augment: true input_reverse: false transform_method: fbank n_mels: 80 freq_mask_para: 18
audio_extension: pcm transform_method: fbank sample_rate: 16000 frame_length: 20 frame_shift: 10 n_mels: 80 normalize: true del_silence: true feature_extract_by: kaldi freq_mask_para: 18 time_mask_num: 4 freq_mask_num: 2 spec_augment: true input_reverse: false
model:
architecture: rnnt teacher_forcing_ratio: 1.0 teacher_forcing_step: 0.01 min_teacher_forcing_ratio: 0.9 dropout: 0.3 bidirectional: true joint_ctc_attention: false max_len: 400 num_encoder_layers: 4 num_decoder_layers: 1 encoder_hidden_state_dim: 320 decoder_hidden_state_dim: 512 output_dim: 512 rnn_type: lstm encoder_dropout_p: 0.2 decoder_dropout_p: 0.2
architecture: rnnt num_encoder_layers: 4 num_decoder_layers: 1 encoder_hidden_state_dim: 320 decoder_hidden_state_dim: 512 output_dim: 512 rnn_type: lstm bidirectional: true encoder_dropout_p: 0.2 decoder_dropout_p: 0.2
train:
dataset: kspon dataset_path: /data/KsponSpeech transcripts_path: ../../../data/transcripts.txt output_unit: character batch_size: 32 save_result_every: 1000 checkpoint_every: 5000 print_every: 10 mode: train num_workers: 4 use_cuda: true init_lr_scale: 0.01 final_lr_scale: 0.05 max_grad_norm: 400 weight_decay: 1.0e-05 seed: 777 resume: false optimizer: adam init_lr: 1.0e-06 final_lr: 1.0e-06 peak_lr: 0.0001 warmup_steps: 400 num_epochs: 20 reduction: mean label_smoothing: 0.1 lr_scheduler: tri_stage_lr_scheduler

[2021-02-26 11:47:05,419][kospeech.utils][INFO] - Operating System : Linux 3.10.0-957.10.1.el7.x86_64
[2021-02-26 11:47:05,420][kospeech.utils][INFO] - Processor : x86_64
[2021-02-26 11:47:05,449][kospeech.utils][INFO] - device : Tesla P40
[2021-02-26 11:47:05,450][kospeech.utils][INFO] - device : Tesla P40
[2021-02-26 11:47:05,450][kospeech.utils][INFO] - device : Tesla P40
[2021-02-26 11:47:05,450][kospeech.utils][INFO] - device : Tesla P40
[2021-02-26 11:47:05,450][kospeech.utils][INFO] - device : Tesla P40
[2021-02-26 11:47:05,451][kospeech.utils][INFO] - device : Tesla P40
[2021-02-26 11:47:05,451][kospeech.utils][INFO] - device : Tesla P40
[2021-02-26 11:47:05,451][kospeech.utils][INFO] - device : Tesla P40
[2021-02-26 11:47:05,452][kospeech.utils][INFO] - CUDA is available : True
[2021-02-26 11:47:05,452][kospeech.utils][INFO] - CUDA version : 10.1
[2021-02-26 11:47:05,452][kospeech.utils][INFO] - PyTorch version : 1.7.1+cu101
[2021-02-26 11:47:05,459][kospeech.utils][INFO] - split dataset start !!
[2021-02-26 11:47:11,514][kospeech.utils][INFO] - Applying Spec Augmentation...
[2021-02-26 11:47:13,673][kospeech.utils][INFO] - Applying Spec Augmentation...
[2021-02-26 11:47:15,848][kospeech.utils][INFO] - Applying Spec Augmentation...
[2021-02-26 11:47:17,873][kospeech.utils][INFO] - Applying Spec Augmentation...
[2021-02-26 11:47:19,916][kospeech.utils][INFO] - split dataset complete !!
DataParallel(
  (module): RNNTransducer(
    (encoder): EncoderRNNT(
      (rnn): LSTM(80, 320, num_layers=4, batch_first=True, dropout=0.2, bidirectional=True)
      (out_proj): Linear(
        (linear): Linear(in_features=640, out_features=512, bias=True)
      )
    )
    (decoder): DecoderRNNT(
      (embedding): Embedding(2001, 512)
      (rnn): LSTM(512, 512, batch_first=True, dropout=0.2)
      (out_proj): Linear(
        (linear): Linear(in_features=512, out_features=512, bias=True)
      )
    )
    (fc): Linear(
      (linear): Linear(in_features=1024, out_features=2001, bias=False)
    )
  )
)
[2021-02-26 11:47:20,263][kospeech.utils][INFO] - start
[2021-02-26 11:47:20,263][kospeech.utils][INFO] - Epoch 0 start
Traceback (most recent call last):
  File "/home/a77/workspace/ext/KoSpeech/bin/main.py", line 174, in main
    last_model_checkpoint = train(config)
  File "/home/a77/workspace/ext/KoSpeech/bin/main.py", line 134, in train
    model = trainer.train(
  File "/data/KoSpeech/kospeech/trainer/supervised_trainer.py", line 160, in train
    model, train_loss, train_cer = self._train_epoches(
  File "/data/KoSpeech/kospeech/trainer/supervised_trainer.py", line 255, in _train_epoches
    output, loss, ctc_loss, cross_entropy_loss = self._model_forward(
  File "/data/KoSpeech/kospeech/trainer/supervised_trainer.py", line 431, in _model_forward
    outputs = model(inputs, input_lengths, targets, target_lengths)
  File "/home/a77/workspace/env/miniconda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/a77/workspace/env/miniconda/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 162, in forward
    return self.gather(outputs, self.output_device)
  File "/home/a77/workspace/env/miniconda/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 174, in gather
    return gather(outputs, output_device, dim=self.dim)
  File "/home/a77/workspace/env/miniconda/lib/python3.8/site-packages/torch/nn/parallel/scatter_gather.py", line 68, in gather
    res = gather_map(outputs)
  File "/home/a77/workspace/env/miniconda/lib/python3.8/site-packages/torch/nn/parallel/scatter_gather.py", line 55, in gather_map
    return Gather.apply(target_device, dim, outputs)
  File "/home/a77/workspace/env/miniconda/lib/python3.8/site-packages/torch/nn/parallel/_functions.py", line 71, in forward
    return comm.gather(inputs, ctx.dim, ctx.target_device)
  File "/home/a77/workspace/env/miniconda/lib/python3.8/site-packages/torch/nn/parallel/comm.py", line 230, in gather
    return torch._C._gather(tensors, dim, destination)
RuntimeError: Input tensor at index 1 has invalid shape [4, 926, 126, 2001], but expected [4, 1668, 126, 2001]

sooftware commented 3 years ago

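It looks like you were running multi-GPU training. I have a lot on my plate right now, so please excuse me for not going through the log in detail.
Some KoSpeech models raise errors during multi-GPU training. If you leave a comment saying which model you are using, I'll try to fix it when I have some spare time.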
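
For readers following the traceback: its last frames are the gather step of `DataParallel`, which concatenates the per-GPU outputs along the batch dimension and therefore needs every other dimension to match across GPUs. The plain-Python sketch below only models that rule so the reported shapes can be checked by hand. It is not PyTorch's implementation, and `check_gather_shapes` is a hypothetical helper name.

```python
def check_gather_shapes(shapes, dim=0):
    """Model of the gather rule: shapes may differ only along `dim`.

    `shapes` holds one output shape per replica (GPU). Returns the shape
    of the concatenated result, or raises an error worded like the one
    in the traceback. Hypothetical helper, not part of PyTorch.
    """
    expected = shapes[0]
    for index, shape in enumerate(shapes[1:], start=1):
        mismatch = len(shape) != len(expected) or any(
            a != b
            for axis, (a, b) in enumerate(zip(shape, expected))
            if axis != dim
        )
        if mismatch:
            raise RuntimeError(
                f"Input tensor at index {index} has invalid shape "
                f"{list(shape)}, but expected {list(expected)}"
            )
    # Sizes along `dim` add up; every other dimension is shared.
    total = sum(shape[dim] for shape in shapes)
    return tuple(total if axis == dim else size
                 for axis, size in enumerate(expected))


# Shapes taken from the error message: replica 0 vs. replica 1.
replica_shapes = [(4, 1668, 126, 2001), (4, 926, 126, 2001)]
try:
    check_gather_shapes(replica_shapes)
except RuntimeError as err:
    print(err)
```

With `batch_size: 32` split over the eight P40s in the log, each replica holds 4 samples, which matches the leading 4 in both shapes. The replicas disagree only in the second dimension (1668 vs. 926).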

dh7ahn commented 3 years ago

8x P40, CUDA 10.1, PyTorch 1.7.1, RNN transducer (model=rnn train=rnnt_train)

sooftware commented 3 years ago

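Sorry for the very late reply. It seems that multi-GPU training is not currently supported for the RNN Transducer. It is probably a simple error, but I have been busy with personal matters lately and haven't been able to fix it. How about trying a different model in the meantime?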
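
One plausible cause, stated here as an assumption since the thread does not confirm it: if the encoder packs its input with `pack_padded_sequence`, then `pad_packed_sequence` pads each replica's output only up to the longest utterance on that GPU, so the time dimension differs between GPUs. The PyTorch documentation describes passing `total_length` to `pad_packed_sequence` for this situation. The sketch below shows that pattern on a toy encoder. `PaddedLSTMEncoder` and its sizes are hypothetical and are not KoSpeech's `EncoderRNNT`.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence


class PaddedLSTMEncoder(nn.Module):
    """Toy encoder (hypothetical) whose output length is fixed by the input."""

    def __init__(self, input_dim=80, hidden_dim=16):
        super().__init__()
        self.rnn = nn.LSTM(input_dim, hidden_dim,
                           batch_first=True, bidirectional=True)

    def forward(self, inputs, input_lengths):
        # Padded length of the incoming batch; under DataParallel every
        # replica receives a slice of the same padded tensor, so this
        # value is identical on all GPUs.
        total_length = inputs.size(1)
        packed = pack_padded_sequence(inputs, input_lengths.cpu(),
                                      batch_first=True, enforce_sorted=False)
        outputs, _ = self.rnn(packed)
        # Without total_length the result would be padded only to
        # max(input_lengths) of this replica.
        outputs, _ = pad_packed_sequence(outputs, batch_first=True,
                                         total_length=total_length)
        return outputs


encoder = PaddedLSTMEncoder()
features = torch.randn(4, 50, 80)          # padded to 50 frames
lengths = torch.tensor([30, 20, 10, 5])    # longest real utterance is 30
encoded = encoder(features, lengths)
```

Here `encoded` keeps the padded length of 50 frames even though the longest utterance in the chunk has 30. Whether the same change is enough for the RNN Transducer in KoSpeech is untested. Exposing only one GPU to the process sidesteps the gather step altogether.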

sooftware / kospeech

RuntimeError: Input tensor at index 1 has invalid shape [4, 926, 126, 2001], but expected [4, 1668, 126, 2001] #117