Description
I see a potential misalignment between the paper and the implemented version of TransE. Can you help clarify?

According to the paper's pseudo-code, the loss is always computed on the normalized head and tail embeddings. In the implementation, however, the entity embeddings are re-normalized only at the end of each epoch, while a gradient update is applied after every minibatch. Each update pushes the embeddings off the unit sphere, so only the first minibatch of an epoch computes the loss on normalized embeddings; every later minibatch computes it on unnormalized ones. The sketch below illustrates the pattern I mean.
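For concreteness, here is a minimal, self-contained sketch of the loop structure I am describing. All names, sizes, and the toy triples are made up for illustration; this is not torchkge's actual code.

```python
import torch
import torch.nn.functional as F

# Toy setup: 5 entities, 2 relations, embedding dim 4 (all made up).
ent = torch.nn.Embedding(5, 4)
rel = torch.nn.Embedding(2, 4)
opt = torch.optim.SGD(list(ent.parameters()) + list(rel.parameters()), lr=0.1)

def dist(h, r, t):
    # Plain lookup plus translation distance: no normalization in here.
    return torch.norm(ent(h) + rel(r) - ent(t), p=2, dim=1)

with torch.no_grad():  # initial normalization at setup time
    ent.weight.data = F.normalize(ent.weight.data, p=2, dim=1)

for epoch in range(2):
    for step in range(3):  # gradient update after every minibatch
        opt.zero_grad()
        h, r, t, t_neg = (torch.tensor([0]), torch.tensor([0]),
                          torch.tensor([1]), torch.tensor([2]))
        # Margin ranking loss on the embeddings as they currently are.
        loss = F.relu(1.0 + dist(h, r, t) - dist(h, r, t_neg)).mean()
        loss.backward()
        opt.step()
        # After this step the entity rows are no longer unit-norm, so
        # the next minibatch's loss uses unnormalized embeddings.
    with torch.no_grad():
        # Re-normalization happens only once, at the end of the epoch.
        ent.weight.data = F.normalize(ent.weight.data, p=2, dim=1)
```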
Edit: Never mind. The head and tail vectors are also normalized inside the scoring function, which resolves the concern: https://github.com/torchkge-team/torchkge/blob/d56e9d81101a61d6f2330098d784e87f7a71ce96/torchkge/models/translation.py#L69
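For context, a minimal sketch of what normalization inside the scoring function looks like. The function name `transe_score` and its signature are illustrative, not the actual torchkge API:

```python
import torch
import torch.nn.functional as F

def transe_score(h, r, t, p=2):
    # h, r, t: (batch, dim) head, relation, and tail embeddings.
    # Normalizing h and t here means the score, and therefore the loss,
    # always sees unit-norm entity vectors, no matter how long ago the
    # stored embeddings were last re-normalized.
    h = F.normalize(h, p=2, dim=1)
    t = F.normalize(t, p=2, dim=1)
    return -torch.norm(h + r - t, p=p, dim=1)
```

With normalization placed here, the per-minibatch drift of the stored embeddings has no effect on the computed loss, which is why the epoch-end re-normalization schedule is not a problem.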