huggingface / nanotron

Minimalistic large language model 3D-parallelism training
Apache License 2.0

TritonRMSNorm generates randomized results during inference #138

Closed by zzhhjjj 5 months ago

zzhhjjj commented 5 months ago

I found this while writing tests: TritonRMSNorm gives results that vary from run to run during inference. If we want stable results during inference, we should use GemmaRMSNorm instead. https://github.com/huggingface/nanotron/blob/450fb67e5250482006b5d001c3e9b238ae3fbd1a/src/nanotron/models/llama.py#L610-L618
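For anyone who wants to check this kind of behaviour in their own tests, here is a minimal sketch of a plain-PyTorch RMSNorm plus a repeat-the-forward-pass check. `ReferenceRMSNorm` and `is_deterministic` are hypothetical names made up for this example. This is not nanotron's `GemmaRMSNorm` or `TritonRMSNorm` code, just a standard RMSNorm under the assumption that the layer scales by a learned weight after normalizing.

```python
import torch
from torch import nn


class ReferenceRMSNorm(nn.Module):
    """Plain-PyTorch RMSNorm (hypothetical name, not a nanotron class)."""

    def __init__(self, hidden_size: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize in float32, then cast back to the input dtype.
        x_fp32 = x.float()
        variance = x_fp32.pow(2).mean(dim=-1, keepdim=True)
        normed = x_fp32 * torch.rsqrt(variance + self.eps)
        return (normed * self.weight.float()).to(x.dtype)


def is_deterministic(module: nn.Module, x: torch.Tensor, runs: int = 5) -> bool:
    """Return True if repeated forward passes on the same input match exactly."""
    module.eval()
    with torch.no_grad():
        reference = module(x)
        # torch.equal requires identical shapes and bitwise-equal values.
        return all(torch.equal(module(x), reference) for _ in range(runs - 1))


torch.manual_seed(0)
norm = ReferenceRMSNorm(hidden_size=64)
hidden_states = torch.randn(2, 8, 64)
print(is_deterministic(norm, hidden_states))
```

The same `is_deterministic` helper could be pointed at any other norm module (on the device it supports) to see whether its outputs change between identical calls.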