wenet-e2e / wenet

Production First and Production Ready End-to-End Speech Recognition Toolkit
https://wenet-e2e.github.io/wenet/
Apache License 2.0
4.13k stars, 1.07k forks

train u2++ conformer : AssertionError assert offset + size <= self.max_len #2629

Open turinaf opened 1 month ago

turinaf commented 1 month ago

When training the u2++ conformer on a custom dataset, I encounter the error below; training stops after running for a while. Training with train_conformer.yaml worked fine.

/wenet/wenet/transformer/embedding.py", line 102, in position_encoding
      assert offset + size <= self.max_len
  AssertionError

To Reproduce Steps to reproduce the behavior:

  1. create a custom dataset following the librispeech example (using wav files instead of flac)
  2. change the config file to train_u2++_conformer.yaml in run.sh; the only thing I changed in the yaml file is the batch size
  3. run stage 4
  4. See error

Expected behavior Finish training normally

Mddct commented 1 month ago

The training data is too long; it exceeds the maximum length of the embedding.
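
To illustrate why an over-long input trips this assertion, here is a minimal sketch of a positional encoding with a fixed-size table. `ToyPositionalEncoding` is a hypothetical stand-in written for this example, not WeNet's actual class; only the assertion itself is taken from the traceback above.

```python
import math


class ToyPositionalEncoding:
    """Toy sinusoidal positional encoding with a table of max_len rows.

    Hypothetical illustration only; not the WeNet implementation.
    """

    def __init__(self, d_model: int, max_len: int):
        self.d_model = d_model
        self.max_len = max_len
        # Precompute one encoding row per position, for max_len positions.
        self.table = [
            [
                math.sin(pos / 10000 ** (i / d_model)) if i % 2 == 0
                else math.cos(pos / 10000 ** ((i - 1) / d_model))
                for i in range(d_model)
            ]
            for pos in range(max_len)
        ]

    def position_encoding(self, offset: int, size: int):
        # Same check as in the traceback: the requested slice must fit
        # inside the precomputed table.
        assert offset + size <= self.max_len
        return self.table[offset:offset + size]


pe = ToyPositionalEncoding(d_model=4, max_len=8)
rows = pe.position_encoding(0, 8)   # fits: exactly max_len positions
# pe.position_encoding(0, 9) would raise AssertionError: 9 > max_len
```

Any sequence with more positions than the table holds fails the check, so the options are shorter inputs or a larger table.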


turinaf commented 1 month ago

What is the suggested solution? The same training data works with train_conformer.yaml, so is this only an issue for U2++? Thanks @Mddct

Mddct commented 1 month ago

You need to limit each training utterance to under 30 s, or increase max len.
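
One way to apply the first suggestion before training is to measure each wav file and set aside anything longer than 30 s. The sketch below uses only the Python standard library and assumes uncompressed PCM wav files, which is what the `wave` module can read. The helper names `wav_duration_seconds` and `split_by_duration` are hypothetical and not part of WeNet.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM wav file in seconds."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def split_by_duration(paths, max_seconds: float = 30.0):
    """Split wav paths into (kept, dropped) by a duration threshold.

    Files with duration <= max_seconds are kept; the rest are dropped.
    """
    kept, dropped = [], []
    for path in paths:
        if wav_duration_seconds(path) <= max_seconds:
            kept.append(path)
        else:
            dropped.append(path)
    return kept, dropped
```

The `dropped` list can then be excluded from the training list, or those recordings can be segmented into shorter utterances instead of being discarded.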
