As introduced in the linked PyTorch PR, normalizing each vector first and then taking the inner product can prevent overflow for vectors with large norms, because the product of the norms is never formed. It also helps keep the computed cosine similarity within the range [-1, +1].
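
The difference between the two orderings can be sketched in plain Python. This is only an illustration under stated assumptions, not the code from either repository: the function names `cosine_naive` and `cosine_normalize_first` are hypothetical, the "naive" variant is one common way of writing the formula, and double-precision floats stand in for the tensor dtypes.

```python
import math


def cosine_naive(x, y):
    # Hypothetical baseline: divide the dot product by the square root of
    # the product of the squared norms. That product can overflow to inf.
    dot = sum(a * b for a, b in zip(x, y))
    sq_norm_x = sum(a * a for a in x)
    sq_norm_y = sum(b * b for b in y)
    return dot / math.sqrt(sq_norm_x * sq_norm_y)


def cosine_normalize_first(x, y):
    # Normalize each vector to unit length, then take the inner product.
    # No product of norms is formed. Zero vectors are not handled here.
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    x_hat = [a / norm_x for a in x]
    y_hat = [b / norm_y for b in y]
    return sum(a * b for a, b in zip(x_hat, y_hat))


# Large-norm vectors whose exact cosine similarity is 24/25 = 0.96.
x = [3e100, 4e100]
y = [4e100, 3e100]

naive = cosine_naive(x, y)
stable = cosine_normalize_first(x, y)
print(naive, stable)
```

With these inputs the product of the squared norms overflows to infinity, so the naive variant returns 0.0, while the normalize-first variant stays close to 0.96. A real implementation would also need an epsilon guard for zero-norm vectors.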
Related: https://github.com/pytorch/pytorch/pull/31378
Current implementation: https://github.com/rusty1s/pytorch_cluster/blob/dbcafbe6a60aaa631b39050e3aa228f6d3fd1592/csrc/cuda/knn_cuda.cu#L56