wenet-e2e / wenet

Production First and Production Ready End-to-End Speech Recognition Toolkit
https://wenet-e2e.github.io/wenet/
Apache License 2.0
4.08k stars, 1.07k forks

[transformer] keep high precision in softmax #2508

Closed Mddct closed 5 months ago

Mddct commented 5 months ago

LLMs keep high precision when computing softmax, which stabilizes model training in bf16.

For example: https://github.com/google/gemma_pytorch/blob/main/gemma/model.py#L288
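
A minimal sketch of the idea, assuming PyTorch: upcast the attention scores to float32 for the softmax, then cast the result back to the input dtype. The helper name `softmax_fp32` and the tensor shapes are hypothetical and chosen for illustration; this is not necessarily the exact code in this PR.

```python
import torch
import torch.nn.functional as F


def softmax_fp32(scores: torch.Tensor, dim: int = -1) -> torch.Tensor:
    """Hypothetical helper: softmax computed in float32, returned in the input dtype."""
    # .float() upcasts to float32 so the exp/sum/divide run in higher precision;
    # .type_as(scores) casts the probabilities back (e.g. to bfloat16).
    return F.softmax(scores.float(), dim=dim).type_as(scores)


# Illustrative attention scores with shape (batch, heads, query_len, key_len).
torch.manual_seed(0)
scores = torch.randn(2, 4, 8, 8).to(torch.bfloat16)
probs = softmax_fp32(scores)
```

Only the softmax itself runs in float32, so the rest of the attention computation keeps the memory and speed benefits of bf16.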