flp1990 closed this issue 11 months ago.
Sorry, we do not have the bandwidth to do this at the moment.
Ok, thanks.
I think for accum_grad, if you only have one GPU you can multiply it by 8 (the number of GPUs assumed in the original config).
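For example, a minimal sketch of that adjustment, assuming the reference recipe was tuned for 8 GPUs with `accum_grad: 1` (copy the exact value from the original config):

```yaml
# Hypothetical single-GPU adjustment: accumulate gradients over 8 steps so the
# effective batch size (batch_size x accum_grad x num_gpus) stays close to the
# 8-GPU recipe, i.e. 12 x 8 x 1 = 96 vs. 12 x 1 x 8 = 96.
grad_clip: 5
accum_grad: 8      # was 1; multiplied by the 8 GPUs of the reference setup
max_epoch: 70
log_interval: 100
```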
You can also try removing the positional embedding.
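A sketch of that change in the encoder config, assuming a WeNet version whose encoder accepts a `no_pos` positional-encoding option together with plain `selfattn` attention (verify the option names against `wenet/transformer/encoder.py` in your checkout before using them):

```yaml
# Hypothetical tweak: drop the relative positional embedding.
encoder_conf:
    pos_enc_layer_type: 'no_pos'          # was 'rel_pos'; 'no_pos' is assumed to be supported
    selfattention_layer_type: 'selfattn'  # rel_selfattn needs rel_pos, so fall back to plain self-attention
```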
Hi, I used WeNet to run a conformer-transformer experiment on librispeech 100h with the following settings on a single RTX TITAN 24G. The results after 70 epochs are:
"dev_clean_attention_rescoring: English -> 14.17 % N=54402 C=47534 S=6046 D=822 I=843 dev_other_attention_rescoring:English -> 30.81 % N=50948 C=36937 S=11930 D=2081 I=1686 test_clean_attention_rescoring:English -> 14.21 % N=52576 C=45963 S=5806 D=807 I=857 dev_other_attention_rescoring:English -> 35.11 % N=52343 C=38074 S=11132 D=3137 I=4107"
This result is much worse than what I previously got on the librispeech 100h dataset with a model of the same parameters in ESPnet. (https://github.com/espnet/espnet/tree/master/egs2/librispeech_100/asr1)
Could you add a single-GPU experiment result on librispeech 100h? If you have time to run this experiment, thank you very much.
```yaml
# network architecture
# encoder related
encoder: conformer
encoder_conf:
    output_size: 256    # dimension of attention
    attention_heads: 4
    linear_units: 2048  # the number of units of position-wise feed forward
    num_blocks: 12      # the number of encoder blocks
    dropout_rate: 0.1
    positional_dropout_rate: 0.1
    attention_dropout_rate: 0.0
    input_layer: conv2d # encoder input type, you can choose conv2d, conv2d6 and conv2d8
    normalize_before: true
    cnn_module_kernel: 15
    use_cnn_module: True
    activation_type: 'swish'
    pos_enc_layer_type: 'rel_pos'
    selfattention_layer_type: 'rel_selfattn'

# decoder related
decoder: transformer
decoder_conf:
    attention_heads: 4
    linear_units: 2048
    num_blocks: 6
    dropout_rate: 0.1
    positional_dropout_rate: 0.1
    self_attention_dropout_rate: 0.0
    src_attention_dropout_rate: 0.0

# hybrid CTC/attention
model_conf:
    ctc_weight: 0.3
    lsm_weight: 0.1     # label smoothing option
    length_normalized_loss: false

# dataset related
dataset_conf:
    filter_conf:
        max_length: 2000
        min_length: 50
        token_max_length: 400
        token_min_length: 1
        min_output_input_ratio: 0.0005
        max_output_input_ratio: 0.1
    resample_conf:
        resample_rate: 16000
    speed_perturb: true
    fbank_conf:
        num_mel_bins: 80
        frame_shift: 10
        frame_length: 25
        dither: 0.0
    spec_aug: true
    spec_aug_conf:
        num_t_mask: 2
        num_f_mask: 2
        max_t: 50
        max_f: 10
    shuffle: true
    shuffle_conf:
        shuffle_size: 1500
    sort: true
    sort_conf:
        sort_size: 500  # sort_size should be less than shuffle_size
    batch_conf:
        batch_type: 'static' # static or dynamic
        batch_size: 12

grad_clip: 5
accum_grad: 1
max_epoch: 70
log_interval: 100

optim: adam
optim_conf:
    lr: 0.004
scheduler: warmuplr     # pytorch v1.1.0+ required
scheduler_conf:
    warmup_steps: 25000
```