Closed: Andrei-Aksionov closed this issue 4 months ago.
Thanks @crolequi for the response. The paper just states that the embedding weights are multiplied by sqrt(d_model), but it doesn't explain why exactly.
That's a great question! It's been asked a few times, and there are some possible explanations (but no definitive reason): https://datascience.stackexchange.com/questions/87906/transformer-model-why-are-word-embeddings-scaled-before-adding-positional-encod.
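One intuition raised in that thread (it targets the original Transformer with additive sinusoidal positional encodings, so it may not transfer one-to-one to Gemma) is a scale-matching argument: embedding weights initialized with a small std end up tiny compared to the positional encodings, and multiplying by sqrt(d_model) brings them back to a comparable magnitude. A minimal sketch, assuming Xavier-style initialization and d_model = 512 (both illustrative, not taken from either codebase):

```python
import torch

d_model = 512  # hypothetical width, as in the original Transformer paper

# Hypothesis: embeddings initialized with std ~ 1/sqrt(d_model) are much
# smaller than sinusoidal positional encodings (values in [-1, 1]), so
# scaling by sqrt(d_model) keeps the token signal from being drowned out
# when the two are summed.
emb = torch.randn(d_model) / d_model**0.5                      # "small" embedding vector
pos = torch.sin(torch.arange(d_model, dtype=torch.float32))    # stand-in positional encoding

print(emb.norm())                   # ~1
print(pos.norm())                   # ~16, i.e. sqrt(d_model / 2)
print((emb * d_model**0.5).norm())  # ~22.6, comparable to the positional encoding
```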
Thanks @suryabhupa for the link. It helped a lot.
Hello there 👋
Thanks for the repo! I have one question, though: why do we need to scale up (normalize) the token embeddings here? https://github.com/google/gemma_pytorch/blob/01062c9ef4cf89ac0c985b25a734164ede017d0b/gemma/model.py#L431-L432
Unfortunately, I cannot find an answer anywhere.
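For context, the linked lines do roughly the following (a paraphrased sketch, not a copy of the repo; the class, vocab size, and hidden size here are made up for illustration):

```python
import torch
import torch.nn as nn


class ToyGemmaEmbedding(nn.Module):
    """Hypothetical, minimal stand-in for the embedding step in gemma/model.py."""

    def __init__(self, vocab_size: int = 32000, hidden_size: int = 2048):
        super().__init__()
        self.hidden_size = hidden_size
        self.embedder = nn.Embedding(vocab_size, hidden_size)

    def forward(self, input_token_ids: torch.Tensor) -> torch.Tensor:
        hidden_states = self.embedder(input_token_ids)
        # The step in question: scale token embeddings by sqrt(hidden_size)
        # before they enter the decoder layers.
        return hidden_states * (self.hidden_size**0.5)


tokens = torch.tensor([[1, 5, 42]])
out = ToyGemmaEmbedding()(tokens)  # shape: (1, 3, 2048)
```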