Closed: Mddct closed this 5 months ago.
LLMs compute softmax in high precision so that training stays stable when the model runs in bf16.
For example: https://github.com/google/gemma_pytorch/blob/main/gemma/model.py#L288
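To see why the precision matters, here is a minimal pure-Python sketch that emulates bf16 by rounding floats to their top 16 bits. It compares a deliberately pessimistic softmax where every intermediate is rounded to bf16 against one that upcasts the inputs, computes in high precision, and rounds only the output. The helper names (`to_bf16`, `softmax_all_bf16`, `softmax_upcast`) are hypothetical, and Python's float64 stands in for the float32 compute path. In PyTorch the upcast pattern is usually written as `F.softmax(scores.float(), dim=-1).type_as(scores)`.

```python
import math
import struct


def to_bf16(x: float) -> float:
    """Round x to the nearest bfloat16 value (round-half-to-even).

    bfloat16 is the top 16 bits of an IEEE-754 float32 (1 sign, 8 exponent,
    7 mantissa bits). x is first rounded to float32; NaN/inf are not handled.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Bias so that dropping the low 16 bits rounds to nearest, ties to even.
    bits += 0x7FFF + ((bits >> 16) & 1)
    bits = (bits >> 16) << 16
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def softmax_all_bf16(logits):
    """Pessimistic emulation: every intermediate result is rounded to bf16."""
    xs = [to_bf16(v) for v in logits]
    m = max(xs)  # subtract the max for numerical stability
    exps = [to_bf16(math.exp(to_bf16(v - m))) for v in xs]
    total = 0.0
    for e in exps:
        total = to_bf16(total + e)  # bf16 accumulator drops small addends
    return [to_bf16(e / total) for e in exps]


def softmax_upcast(logits):
    """Upcast bf16 inputs, compute in high precision, round only the output."""
    xs = [to_bf16(v) for v in logits]
    m = max(xs)
    exps = [math.exp(v - m) for v in xs]  # float64 stands in for float32
    total = math.fsum(exps)
    return [to_bf16(e / total) for e in exps]


logits = [0.0] * 1024  # uniform attention over 1024 positions
print(sum(softmax_all_bf16(logits)))  # 4.0
print(sum(softmax_upcast(logits)))    # 1.0
```

With 1024 equal logits, the bf16 accumulator stops growing at 256 (257 is not representable in bf16 and rounds back to 256), so every probability comes out as 1/256 and the distribution sums to 4 instead of 1. Real kernels may accumulate reductions in float32 internally, so this emulation overstates the error; it only illustrates that bf16's 8 significant bits are too coarse for the reductions inside softmax.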