Closed: ghost closed this issue 3 years ago
Hi! This project focuses on Korean ASR. I previously wrote code to support LibriSpeech as well, but it has fallen out of date with the recent large number of code updates. Sorry for the inconvenience.
Hello,
Thanks for the super fast reply.
No worries about LibriSpeech; if I manage to make it work I will just open a pull request.
Best regards,
Dan
LibriSpeechVocabulary does not have a self.blank_id instance param
Description
When running on LibriSpeech with conformer-small and subword units, the bug manifests as a `RuntimeError: blank must be in label range` error when the criterion is used.
[To reproduce the bug]
```bash
python ./bin/main.py \
  audio=melspectrogram \
  model=conformer-small \
  train=conformer_small_train \
  audio.audio_extension=flac \
  train.dataset_path=/home/jupyter/LibriSpeech/ \
  train.transcripts_path=/home/dan/KoSpeech/data/train.txt \
  audio_extension=flac \
  audio.feature_extract_by=torchaudio \
  train.dataset=libri
```
[Full Hydra config]
```yaml
audio:
  audio_extension: flac
  sample_rate: 16000
  frame_length: 20
  frame_shift: 10
  normalize: true
  del_silence: true
  feature_extract_by: torchaudio
  time_mask_num: 4
  freq_mask_num: 2
  spec_augment: true
  input_reverse: false
  transform_method: mel
  n_mels: 80
  freq_mask_para: 18
model:
  architecture: conformer
  teacher_forcing_ratio: 1.0
  teacher_forcing_step: 0.01
  min_teacher_forcing_ratio: 0.9
  dropout: 0.3
  bidirectional: false
  joint_ctc_attention: false
  max_len: 400
  feed_forward_expansion_factor: 4
  conv_expansion_factor: 2
  input_dropout_p: 0.1
  feed_forward_dropout_p: 0.1
  attention_dropout_p: 0.1
  conv_dropout_p: 0.1
  decoder_dropout_p: 0.1
  conv_kernel_size: 31
  half_step_residual: true
  num_decoder_layers: 1
  decoder_rnn_type: lstm
  decoder: None
  encoder_dim: 144
  decoder_dim: 320
  num_encoder_layers: 16
  num_attention_heads: 4
train:
  dataset: libri
  dataset_path: /home/dan/LibriSpeech/
  transcripts_path: /home/dan/KoSpeech/data/train.txt
  output_unit: character
  batch_size: 32
  save_result_every: 1000
  checkpoint_every: 5000
  print_every: 10
  mode: train
  num_workers: 4
  use_cuda: true
  init_lr_scale: 0.01
  final_lr_scale: 0.001
  max_grad_norm: 400
  weight_decay: 1.0e-06
  seed: 777
  resume: false
  optimizer: adam
  reduction: mean
  lr_scheduler: transformer_lr_scheduler
  optimizer_betas:
    - 0.9
    - 0.98
  optimizer_eps: 1.0e-09
  warmup_steps: 10000
  decay_steps: 80000
  peak_lr: 0.0001
  final_lr: 1.0e-07
  num_epochs: 20
```
Causes
The `blank` argument of `ctc_loss` is `None` instead of an `int`; it comes from `vocab.blank_id`, which `LibriSpeechVocabulary` never sets.
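To make the constraint concrete, here is a minimal standalone PyTorch sketch (toy shapes; `BrokenVocab` is only a stand-in for the real vocabulary class): `nn.CTCLoss` happily stores whatever `blank` it is given, and only the first evaluation blows up.

```python
import torch
import torch.nn as nn

# Stand-in for the real class: LibriSpeechVocabulary never assigns
# blank_id, so code reading it ends up with None.
class BrokenVocab:
    blank_id = None

vocab = BrokenVocab()

# Construction succeeds because nn.CTCLoss merely stores the value ...
criterion = nn.CTCLoss(blank=vocab.blank_id)

# ... but evaluating it fails: `blank` must be an int inside the
# label range [0, num_classes).
T, N, C, S = 50, 4, 10, 20  # time steps, batch, classes, target length
log_probs = torch.randn(T, N, C).log_softmax(2)
targets = torch.randint(1, C, (N, S), dtype=torch.long)
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), S, dtype=torch.long)

criterion(log_probs, targets, input_lengths, target_lengths)  # raises here
```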
Possible Fixes
I am motivated to make a pull request if you decide on a fix strategy; don't hesitate to notify me if you want me to handle it. A rough sketch of what I have in mind is below.
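For instance (purely a sketch: the base class here is a minimal stand-in and the token ids are placeholders, since I don't know how the subword vocab is meant to be laid out; the real fix should mirror whatever the Korean vocabulary does):

```python
class Vocabulary:
    """Minimal stand-in for kospeech's base vocabulary class."""

    def __init__(self):
        self.sos_id = None
        self.eos_id = None
        self.pad_id = None
        self.blank_id = None


class LibriSpeechVocabulary(Vocabulary):
    def __init__(self, vocab_path: str, model_path: str):
        super().__init__()
        # ... existing subword-model loading would stay unchanged ...
        self.vocab_path = vocab_path
        self.model_path = model_path
        self.pad_id = 0  # placeholder ids, not necessarily the real ones
        self.sos_id = 1
        self.eos_id = 2
        # The missing piece: give the CTC blank symbol an id and expose
        # it, so nn.CTCLoss(blank=vocab.blank_id) receives a valid int.
        self.blank_id = 3
```

With that attribute in place, the criterion gets a proper integer blank index and the call no longer fails.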
Best regards,
Dan Ringwald dan.ringwald12@gmail.com