Open turinaf opened 1 month ago
The training data is too long; it exceeds the maximum length of the embedding.
On Tue, Sep 10, 2024 at 11:16, Turi Abu @.***> wrote:
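To illustrate why an over-long utterance trips the assertion in the traceback, here is a minimal sketch. `ToyPositionalEncoding` and `fits` are hypothetical stand-ins rather than WeNet's actual code, and the `max_len` value of 5000 is an assumption chosen only for the example.

```python
class ToyPositionalEncoding:
    """Simplified stand-in for a fixed-size positional-encoding table (not WeNet's class)."""

    def __init__(self, max_len: int = 5000):  # 5000 is an assumed value
        self.max_len = max_len
        # One entry per position; a real table would hold sin/cos vectors.
        self.table = list(range(max_len))

    def position_encoding(self, offset: int, size: int):
        # Same guard as in the traceback: the requested window must fit in the table.
        assert offset + size <= self.max_len
        return self.table[offset:offset + size]


def fits(pe: ToyPositionalEncoding, offset: int, size: int) -> bool:
    """Return True if the requested window fits, False if the assertion fires."""
    try:
        pe.position_encoding(offset, size)
        return True
    except AssertionError:
        return False


pe = ToyPositionalEncoding(max_len=5000)
print(fits(pe, 0, 750))   # a short sequence fits in the table
print(fits(pe, 0, 6000))  # a sequence longer than max_len does not
```

A sequence fails the check as soon as its length (plus any offset) exceeds the table size, which is why either shortening the inputs or enlarging the table removes the error.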
When training the u2++ conformer on a custom dataset, I encounter the error below; training stops after running for a while. The train_conformer.yaml config worked fine.
/wenet/wenet/transformer/embedding.py", line 102, in position_encoding
    assert offset + size <= self.max_len
AssertionError
To Reproduce
Steps to reproduce the behavior:
- Create a custom dataset following the librispeech example (using wav files instead of flac).
- Change the config file to train_u2++_conformer.yaml in run.sh. The only thing I changed in the yaml file is the batch size.
- Run stage 4.
- See the error.
Expected behavior
Training finishes normally.
View it on GitHub: https://github.com/wenet-e2e/wenet/issues/2629
What is the suggested solution? Same training data works for train_conformer.yaml
So it's only for U2++?
Thanks
@Mddct
Each training utterance needs to be limited to 30 s or less, or you can increase max_len.
On Tue, Sep 10, 2024 at 13:33, Turi Abu @.***> wrote:
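The first option (dropping utterances longer than 30 s) could be sketched as below. This assumes the data list is a JSON-lines file in which each entry has a `wav` field pointing to a PCM WAV file; the function names and the `duration_fn` parameter are hypothetical helpers for illustration, not part of WeNet.

```python
import json
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def filter_data_list(lines, max_seconds=30.0, duration_fn=wav_duration_seconds):
    """Keep only JSON-lines entries whose audio is at most max_seconds long.

    Assumes each non-empty line is a JSON object with a "wav" path field.
    """
    kept = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        entry = json.loads(line)
        if duration_fn(entry["wav"]) <= max_seconds:
            kept.append(line)
    return kept
```

One way to use it is to read the existing list with `open(...).readlines()`, pass the lines to `filter_data_list`, and write the result to a new file that the training stage then points to. Filtering once up front keeps the training config itself unchanged.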
What is the suggested solution? @Mddct https://github.com/Mddct