lucidrains / nGPT-pytorch

Quick implementation of nGPT, learning entirely on the hypersphere, from NvidiaAI
MIT License
201 stars 10 forks

shouldn't embedding be normalized along embed dimension? #9

Open jfpuget opened 6 hours ago

jfpuget commented 6 hours ago

In https://github.com/lucidrains/nGPT-pytorch/blob/main/nGPT_pytorch/nGPT.py#L350 token_embed is normalized along the last dimension when it should be normalized along the first dimension.

lucidrains commented 6 hours ago

hey Jean-Francois, it is actually defaulted to first dimension here
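
For readers following along, here is a small dependency-free sketch of the difference between the two normalization axes discussed in this thread. It is not the repository's code: the `l2_normalize` helper and the toy `token_embed` table are hypothetical, and the sketch assumes a table laid out as (num_tokens, dim), which may not match how the weight is stored in nGPT.py. Under that assumption, the PyTorch counterparts would be `torch.nn.functional.normalize(weight, dim=-1)` and `torch.nn.functional.normalize(weight, dim=0)`.

```python
import math


def l2_normalize(matrix, dim):
    """L2-normalize a 2D list of floats along `dim`.

    dim = -1 (or 1): every row is scaled to unit norm.
    dim = 0: every column is scaled to unit norm.
    Hypothetical helper for illustration only.
    """
    rows, cols = len(matrix), len(matrix[0])
    if dim in (-1, 1):
        # One norm per row, i.e. per token vector in a (num_tokens, dim) table.
        out = []
        for row in matrix:
            norm = math.sqrt(sum(x * x for x in row))
            out.append([x / norm for x in row])
        return out
    if dim == 0:
        # One norm per column, i.e. per feature across all tokens.
        norms = [
            math.sqrt(sum(matrix[r][c] ** 2 for r in range(rows)))
            for c in range(cols)
        ]
        return [[matrix[r][c] / norms[c] for c in range(cols)] for r in range(rows)]
    raise ValueError("dim must be 0, 1, or -1")


# Toy table (assumed layout): 3 tokens, embedding dimension 2.
token_embed = [[3.0, 4.0], [6.0, 8.0], [0.0, 5.0]]

per_token = l2_normalize(token_embed, dim=-1)   # rows have unit norm
per_feature = l2_normalize(token_embed, dim=0)  # columns have unit norm

row_norms = [math.sqrt(sum(x * x for x in row)) for row in per_token]
col_norms = [
    math.sqrt(sum(per_feature[r][c] ** 2 for r in range(3))) for c in range(2)
]
```

With this layout, only the `dim=-1` variant places each token's vector on the unit hypersphere; the `dim=0` variant normalizes each feature across the vocabulary instead. If the weight is stored transposed, the roles of the two axes swap, which is why the default axis matters.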