Closed zzhhjjj closed 5 months ago
I found this out while writing tests: the results are not bit-stable between runs. If we want stable results during inference, we should use GemmaRMSNorm instead. https://github.com/huggingface/nanotron/blob/450fb67e5250482006b5d001c3e9b238ae3fbd1a/src/nanotron/models/llama.py#L610-L618
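For context, the stability difference likely comes from where the reduction is computed: Gemma-style RMSNorm upcasts the activations to float32 before the mean-square reduction and only casts back afterwards. Below is a minimal NumPy sketch of that pattern; the function name and shapes are illustrative, and this is an assumption about why GemmaRMSNorm behaves more stably, not a copy of either implementation.

```python
import numpy as np

def gemma_style_rms_norm(x, weight, eps=1e-6):
    # Hypothetical sketch: upcast to float32 before the reduction, mirroring
    # the Gemma-style norm, then cast back to the input dtype at the end.
    xf = x.astype(np.float32)
    variance = np.mean(xf * xf, axis=-1, keepdims=True)
    normed = xf / np.sqrt(variance + eps)
    # Gemma applies the scale as (1 + weight), with weight initialized to zero.
    out = normed * (1.0 + weight.astype(np.float32))
    return out.astype(x.dtype)

x = np.random.randn(2, 8).astype(np.float16)
w = np.zeros(8, dtype=np.float16)
y = gemma_style_rms_norm(x, w)
```

Because every intermediate is plain float32 arithmetic with a fixed reduction order, two forward passes on the same input give identical bits, which is what the tests need.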