[transformer] Make MoE runnable

wenet-e2e / wenet

Production First and Production Ready End-to-End Speech Recognition Toolkit

https://wenet-e2e.github.io/wenet/

Apache License 2.0

3.87k stars 1.04k forks

[transformer] Make MoE runnable #2474

Closed xingchensong closed 2 months ago

xingchensong commented 2 months ago

How to use:

xingchensong commented 2 months ago

The offline test test_grad_ckpt.py hangs, so I've removed it for now.

Mddct commented 2 months ago

> The offline test test_grad_ckpt.py hangs, so I've removed it for now.

The hang is caused by this: https://github.com/wenet-e2e/wenet/pull/2473/files 😭 Reverting it fixes the problem.

xingchensong commented 2 months ago

>> The offline test test_grad_ckpt.py hangs, so I've removed it for now.

> The hang is caused by this: https://github.com/wenet-e2e/wenet/pull/2473/files 😭 Reverting it fixes the problem.

OK, you take care of it.

Mddct commented 2 months ago

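>>> The offline test test_grad_ckpt.py hangs, so I've removed it for now.

>> The hang is caused by this: https://github.com/wenet-e2e/wenet/pull/2473/files 😭 Reverting it fixes the problem.

> OK, you take care of it.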

https://github.com/wenet-e2e/wenet/pull/2477

Mddct commented 2 months ago

decoder_conf:
  attention_heads: 4
  dropout_rate: 0.1
  linear_units: 512
  mlp_type: moe
  n_expert: 4
  n_expert_activated: 2
  num_blocks: 6
  positional_dropout_rate: 0.1
  self_attention_dropout_rate: 0.0
  src_attention_dropout_rate: 0.0
dtype: fp32
encoder: conformer
encoder_conf:
  activation_type: swish
  attention_dropout_rate: 0.0
  attention_heads: 4
  cnn_module_kernel: 15
  dropout_rate: 0.1
  input_layer: conv2d
  linear_units: 512
  mlp_type: moe
  n_expert: 4
  n_expert_activated: 2
  normalize_before: true
  num_blocks: 12
  output_size: 256
  pos_enc_layer_type: rel_pos
  positional_dropout_rate: 0.1
  selfattention_layer_type: rel_selfattn
  use_cnn_module: true
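
For readers unfamiliar with the two MoE fields in the config above, here is a rough sketch of what `n_expert: 4` and `n_expert_activated: 2` mean for a single token: a router scores all four experts, the two highest-scoring ones are run, and their outputs are mixed with softmax weights. This is a pure-Python illustration, not the code from this PR. The helper names (`top_k_gate`, `moe_forward`) and the toy experts are hypothetical, and taking the softmax over only the selected logits is an assumption about the gating scheme.

```python
import math

def top_k_gate(router_logits, k):
    """Pick the k largest router logits and softmax over just those.

    Returns a list of (expert_index, weight) pairs whose weights sum to 1.
    """
    # Indices of the k highest-scoring experts for this token.
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    # Softmax restricted to the selected logits (subtract the max for stability).
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

def moe_forward(token, router_logits, experts, k):
    """Weighted sum of the outputs of the k selected experts for one token."""
    out = [0.0] * len(token)
    for idx, weight in top_k_gate(router_logits, k):
        expert_out = experts[idx](token)
        out = [o + weight * e for o, e in zip(out, expert_out)]
    return out

N_EXPERT = 4            # n_expert in the config above
N_EXPERT_ACTIVATED = 2  # n_expert_activated in the config above

# Hypothetical stand-ins for the expert feed-forward networks:
# expert i simply scales its input by (i + 1).
experts = [lambda x, s=i + 1: [s * v for v in x] for i in range(N_EXPERT)]

# Made-up router scores for one token; experts 1 and 3 tie for the top score.
logits = [0.1, 2.0, -1.0, 2.0]
gates = top_k_gate(logits, N_EXPERT_ACTIVATED)
output = moe_forward([1.0, -1.0], logits, experts, N_EXPERT_ACTIVATED)
```

Only two of the four experts run per token, so the per-token compute stays close to that of a single feed-forward layer of the same `linear_units` while the parameter count grows with `n_expert`.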
