sooftware / kospeech

Open-Source Toolkit for End-to-End Korean Automatic Speech Recognition leveraging PyTorch and Hydra.
https://sooftware.github.io/kospeech/
Apache License 2.0

train: training does not progress #58

Closed Ahnsh95 closed 3 years ago

Ahnsh95 commented 4 years ago

Hello. I ran the code you provided to generate transcripts.txt and then tried to start training, but nothing happens after "Epoch 0 start".

KoSpeech: Open-Source Toolkit for End-to-End Korean Speech Recognition ==
[2020-11-12 15:26:26,417 utils.py:26 - info()] --mode: train
[2020-11-12 15:26:26,417 utils.py:26 - info()] --transform_method: fbank
[2020-11-12 15:26:26,417 utils.py:26 - info()] --sample_rate: 16000
[2020-11-12 15:26:26,417 utils.py:26 - info()] --frame_length: 20
[2020-11-12 15:26:26,417 utils.py:26 - info()] --frame_shift: 10
[2020-11-12 15:26:26,417 utils.py:26 - info()] --n_mels: 80
[2020-11-12 15:26:26,417 utils.py:26 - info()] --normalize: True
[2020-11-12 15:26:26,417 utils.py:26 - info()] --del_silence: True
[2020-11-12 15:26:26,417 utils.py:26 - info()] --input_reverse: True
[2020-11-12 15:26:26,417 utils.py:26 - info()] --feature_extract_by: kaldi
[2020-11-12 15:26:26,417 utils.py:26 - info()] --freq_mask_para: 18
[2020-11-12 15:26:26,417 utils.py:26 - info()] --time_mask_num: 4
[2020-11-12 15:26:26,417 utils.py:26 - info()] --freq_mask_num: 2
[2020-11-12 15:26:26,417 utils.py:26 - info()] --architecture: las
[2020-11-12 15:26:26,417 utils.py:26 - info()] --use_bidirectional: True
[2020-11-12 15:26:26,417 utils.py:26 - info()] --mask_conv: False
[2020-11-12 15:26:26,417 utils.py:26 - info()] --hidden_dim: 512
[2020-11-12 15:26:26,417 utils.py:26 - info()] --dropout: 0.15
[2020-11-12 15:26:26,418 utils.py:26 - info()] --attn_mechanism: multi-head
[2020-11-12 15:26:26,418 utils.py:26 - info()] --num_heads: 4
[2020-11-12 15:26:26,418 utils.py:26 - info()] --label_smoothing: 0.1
[2020-11-12 15:26:26,418 utils.py:26 - info()] --num_encoder_layers: 3
[2020-11-12 15:26:26,418 utils.py:26 - info()] --num_decoder_layers: 2
[2020-11-12 15:26:26,418 utils.py:26 - info()] --extractor: vgg
[2020-11-12 15:26:26,418 utils.py:26 - info()] --activation: hardtanh
[2020-11-12 15:26:26,418 utils.py:26 - info()] --rnn_type: lstm
[2020-11-12 15:26:26,418 utils.py:26 - info()] --teacher_forcing_ratio: 1.0
[2020-11-12 15:26:26,418 utils.py:26 - info()] --spec_augment: True
[2020-11-12 15:26:26,418 utils.py:26 - info()] --use_cuda: True
[2020-11-12 15:26:26,418 utils.py:26 - info()] --batch_size: 32
[2020-11-12 15:26:26,418 utils.py:26 - info()] --num_workers: 4
[2020-11-12 15:26:26,418 utils.py:26 - info()] --num_epochs: 20
[2020-11-12 15:26:26,418 utils.py:26 - info()] --init_lr: 1e-06
[2020-11-12 15:26:26,418 utils.py:26 - info()] --warmup_steps: 2000
[2020-11-12 15:26:26,418 utils.py:26 - info()] --max_len: 250
[2020-11-12 15:26:26,418 utils.py:26 - info()] --max_grad_norm: 400
[2020-11-12 15:26:26,418 utils.py:26 - info()] --teacher_forcing_step: 0.01
[2020-11-12 15:26:26,418 utils.py:26 - info()] --min_teacher_forcing_ratio: 0.9
[2020-11-12 15:26:26,418 utils.py:26 - info()] --seed: 7
[2020-11-12 15:26:26,418 utils.py:26 - info()] --save_result_every: 1000
[2020-11-12 15:26:26,418 utils.py:26 - info()] --checkpoint_every: 5000
[2020-11-12 15:26:26,418 utils.py:26 - info()] --print_every: 10
[2020-11-12 15:26:26,418 utils.py:26 - info()] --resume: False
[2020-11-12 15:26:26,563 utils.py:26 - info()] Operating System : Windows 10
[2020-11-12 15:26:26,563 utils.py:26 - info()] Processor : AMD64 Family 23 Model 8 Stepping 2, AuthenticAMD
[2020-11-12 15:26:26,653 utils.py:26 - info()] device : GeForce GTX 1060 3GB
[2020-11-12 15:26:26,653 utils.py:26 - info()] CUDA is available : True
[2020-11-12 15:26:26,653 utils.py:26 - info()] CUDA version : 11.0
[2020-11-12 15:26:26,653 utils.py:26 - info()] PyTorch version : 1.7.0+cu110
[2020-11-12 15:26:26,656 utils.py:26 - info()] split dataset start !!
[2020-11-12 15:26:30,667 utils.py:26 - info()] Applying Spec Augmentation...
[2020-11-12 15:26:31,598 utils.py:26 - info()] Applying Spec Augmentation...
[2020-11-12 15:26:32,681 utils.py:26 - info()] Applying Spec Augmentation...
[2020-11-12 15:26:33,596 utils.py:26 - info()] Applying Spec Augmentation...
[2020-11-12 15:26:34,589 utils.py:26 - info()] split dataset complete !!
[2020-11-12 15:26:47,193 utils.py:26 - info()] start
[2020-11-12 15:26:47,193 utils.py:26 - info()] Epoch 0 start

No matter how long I wait, it doesn't go any further. What could be the problem?

sooftware commented 4 years ago

Please check that CUDA is installed correctly and whether GPU usage goes up during training.
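
For reference, a quick way to verify both points from Python with stock PyTorch calls (nothing KoSpeech-specific) might look like this:

import torch

print(torch.__version__)               # e.g. 1.7.0+cu110
print(torch.cuda.is_available())       # should print True
print(torch.cuda.get_device_name(0))   # e.g. GeForce GTX 1060 3GB

# Run this while training appears stuck; the number should be non-zero and
# grow if the model has actually been moved to the GPU and is doing work.
print(torch.cuda.memory_allocated(0) / 1024 ** 2, "MiB allocated on GPU 0")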

Ahnsh95 commented 4 years ago

CUDA is installed correctly, and GPU usage does go up.

sooftware commented 4 years ago

Could you try running with a smaller batch size?

Ahnsh95 commented 4 years ago

Reducing the batch size made training progress, thank you. However, with a batch size of 8 or 4 I get RuntimeError: CUDA out of memory, and it only runs with a batch size of 1. Is there a way to make it run with a larger batch size?

sooftware commented 4 years ago

It's a GPU memory shortage, so apart from replacing the GPU, I think the only option is to reduce the hyperparameters.
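
The thread only covers shrinking the batch or the model, but gradient accumulation is another common way to get a larger effective batch size on a 3 GB card. It is not part of KoSpeech's stock training loop; the sketch below uses a dummy model and dummy data purely to illustrate the pattern:

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Dummy stand-ins for the real KoSpeech model, criterion and dataset.
model = nn.Linear(80, 10)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loader = DataLoader(
    TensorDataset(torch.randn(64, 80), torch.randint(0, 10, (64,))),
    batch_size=1,              # the micro-batch that actually fits in GPU memory
)

accumulation_steps = 8         # effective batch size = 1 * 8

model.train()
optimizer.zero_grad()
for step, (inputs, targets) in enumerate(loader):
    loss = criterion(model(inputs), targets)
    (loss / accumulation_steps).backward()   # gradients accumulate across micro-batches
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()                     # one weight update per effective batch
        optimizer.zero_grad()

The trade-off is slower wall-clock training (more forward/backward passes per update), but peak memory stays at the batch-size-1 level.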

Ahnsh95 commented 4 years ago

I see... One more question. When I run run_pretrain, I get ValueError: step must be greater than zero. It happens both with the model I trained and with the model you uploaded. What should I do?

sooftware commented 4 years ago

Could you attach the full error output?

Ahnsh95 commented 4 years ago

Here is the error output.

D:\anaconda\lib\site-packages\librosa\filters.py:235: UserWarning: Empty filters detected in mel frequency basis. Some channels will produce empty responses. Try increasing your sampling rate (and fmax) or reducing n_mels.
  warnings.warn('Empty filters detected in mel frequency basis. '
Traceback (most recent call last):
  File "C:/Users/ash/Desktop/KoSpeech-master/bin/run_pretrain.py", line 32, in <module>
    feature_vector = parse_audio(opt.audio_path, del_silence=True)
  File "C:/Users/ash/Desktop/KoSpeech-master/bin/run_pretrain.py", line 20, in parse_audio
    mfcc = mfcc[:, ::-1]
ValueError: step must be greater than zero

sooftware commented 4 years ago

Hmm, it looks like the problem is on the feature-extraction side, independent of the model.
Could you check that the audio file path is set correctly, and show me the result of print(mfcc) right before that mfcc = mfcc[:, ::-1] line?

Ahnsh95 commented 4 years ago

Here is the output of print(mfcc).

D:\anaconda\lib\site-packages\librosa\filters.py:235: UserWarning: Empty filters detected in mel frequency basis. Some channels will produce empty responses. Try increasing your sampling rate (and fmax) or reducing n_mels.
  warnings.warn('Empty filters detected in mel frequency basis. '
tensor([[-7.1235e+02,  7.5427e+01,  3.1398e+01,  ...,  1.2172e+01,  1.1217e+01,  1.2314e+01],
        [-7.0468e+02,  6.7169e+01,  8.4292e+00,  ...,  8.8065e+00,  1.7831e+01,  2.3363e+01],
        [-6.8160e+02,  5.9965e+01, -4.1716e-01,  ...,  2.5530e+01,  2.3315e+01,  1.6977e+01],
        ...,
        [-6.9820e+02,  9.9005e+01,  1.8220e+01,  ...,  8.8486e+00,  5.9254e+00, -6.3562e-01],
        [-7.1192e+02,  9.3892e+01,  1.4745e+01,  ...,  1.0872e+01,  1.7573e+01,  1.3130e+01],
        [-7.1993e+02,  9.3772e+01,  1.9694e+01,  ...,  1.2810e+01,  1.1312e+01,  5.3540e+00]])
Traceback (most recent call last):
  File "C:/Users/ash/Desktop/KoSpeech-master/bin/run_pretrain.py", line 33, in <module>
    feature_vector = parse_audio(opt.audio_path, del_silence=True)
  File "C:/Users/ash/Desktop/KoSpeech-master/bin/run_pretrain.py", line 21, in parse_audio
    mfcc = mfcc[:, ::-1]
ValueError: step must be greater than zero

Process finished with exit code 1

sooftware commented 4 years ago

Hmm... the features look like they are being extracted fine, which is strange.
If you upload the audio file separately, I'll take a look myself.

Ahnsh95 commented 4 years ago

The audio files I'm using are from the KsponSpeech dataset. Should I still upload one?

sooftware commented 4 years ago

Sorry, I'm only seeing this now.
It looks like there is an MFCC-related bug on our side as well.
I'll reply again as soon as we've checked it.
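
For readers who land on this issue later: the print(mfcc) output above is a torch.Tensor, and PyTorch tensors do not accept negative-step slicing such as mfcc[:, ::-1], which is exactly what raises ValueError: step must be greater than zero. This is only a guess at the root cause of the MFCC bug mentioned above, but a minimal sketch of a workaround (assuming mfcc really is a torch.Tensor, as printed) would be:

import torch

mfcc = torch.randn(100, 40)            # dummy feature matrix standing in for the real MFCCs

# mfcc = mfcc[:, ::-1]                 # ValueError: step must be greater than zero on a torch.Tensor
mfcc = torch.flip(mfcc, dims=[1])      # same reversal along the second axis, supported by PyTorch

# Alternative, if a NumPy array is expected downstream:
# mfcc = mfcc.numpy()[:, ::-1].copy()  # .copy() removes the negative stride before further use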