train u2++ conformer : AssertionError assert offset + size <= self.max_len

turinaf commented 1 month ago

When training the U2++ conformer on a custom dataset, I encounter the error below. Training runs for a while and then stops. The same setup with train_conformer.yaml worked fine.

/wenet/wenet/transformer/embedding.py", line 102, in position_encoding
      assert offset + size <= self.max_len
  AssertionError

To Reproduce

Steps to reproduce the behavior:

1. Create a custom dataset following the librispeech example (using wav files instead of flac).
2. Change the config file to train_u2++_conformer.yaml in run.sh. The only thing I changed in the yaml file is the batch size.
3. Run stage 4.
4. See the error.

Expected behavior

Training finishes normally.

Mddct commented 1 month ago

The training data is too long; it exceeds the maximum length of the positional embedding.
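To illustrate why a long utterance trips this assertion, here is a minimal sketch of a fixed-size sinusoidal position table with the same check as in the traceback. `ToyPositionalEncoding` and its parameters are hypothetical names for illustration only; this is not the WeNet implementation, just the general pattern of a precomputed table that cannot serve positions beyond `max_len`.

```python
import math


class ToyPositionalEncoding:
    """Hypothetical stand-in for a fixed-size sinusoidal position table."""

    def __init__(self, d_model: int, max_len: int):
        assert d_model % 2 == 0, "this sketch assumes an even d_model"
        self.max_len = max_len
        # Precompute one row of sin/cos values per position, up to max_len.
        self.table = []
        for pos in range(max_len):
            row = []
            for i in range(0, d_model, 2):
                angle = pos / (10000 ** (i / d_model))
                row.extend([math.sin(angle), math.cos(angle)])
            self.table.append(row)

    def position_encoding(self, offset: int, size: int):
        # Same condition as the assertion in the traceback: the requested
        # range [offset, offset + size) must fit inside the precomputed table.
        assert offset + size <= self.max_len
        return self.table[offset:offset + size]
```

With `max_len=10`, a request for 10 positions starting at offset 0 succeeds, while a request for 6 positions starting at offset 5 raises `AssertionError`, which is the situation an overly long utterance produces during training.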

turinaf commented 1 month ago

What is the suggested solution? The same training data works with train_conformer.yaml, so is this specific to U2++? Thanks @Mddct

Mddct commented 1 month ago

You need to limit each training utterance to 30 s or less, or increase max len.
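For the first option, one possible approach is to drop long utterances before training. The sketch below is only an illustration: it assumes each entry is a dict with a `wav` path (modeled on a JSON-lines data list), and the helper names `wav_duration_seconds` and `filter_entries` are hypothetical, not part of WeNet. It reads durations with the Python standard-library `wave` module, so it only handles uncompressed PCM wav files.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM wav file in seconds."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()


def filter_entries(entries, max_seconds=30.0, duration_fn=wav_duration_seconds):
    """Keep only entries whose audio is at most max_seconds long.

    Assumption: each entry is a dict with a "wav" key holding the file path.
    duration_fn is injectable so durations can come from a cache instead
    of re-reading every file.
    """
    return [e for e in entries if duration_fn(e["wav"]) <= max_seconds]
```

A typical use would be to load the data list, pass the parsed entries through `filter_entries`, and write the kept entries back out before running the training stage.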

wenet-e2e / wenet

train u2++ conformer : AssertionError assert offset + size <= self.max_len #2629