spotify-research / cosernn

Code for the paper "Contextual and Sequential User Embeddings for Large-Scale Music Recommendation".
Apache License 2.0

how are the predicted and the observed user embeddings unit-norm? #5

Closed by animesh-wynk 3 months ago

animesh-wynk commented 5 months ago

Loved the paper! :D

I have a few doubts though. In Section 5.2.1, titled "Loss function & Optimization", the paper says that the model is trained to maximize the cosine similarity between the predicted embedding 𝒖𝑡 and the observed one 𝒔𝑡, because both embeddings are unit-norm. But how exactly are these two embeddings unit-norm?

The observed user embedding is the average of all the song embeddings present in the user session. The song embeddings are unit-norm, but when we take their average, the result need not be unit-norm.

The same logic applies to the predicted user embedding.
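
To illustrate the point, here is a minimal NumPy sketch with toy vectors (not the paper's actual embeddings): the plain average of unit-norm vectors is generally shorter than a unit vector.

```python
import numpy as np

rng = np.random.default_rng(0)

# Three toy "song embeddings", each normalized to unit length.
songs = rng.normal(size=(3, 4))
songs /= np.linalg.norm(songs, axis=1, keepdims=True)

# Their plain average is not unit-norm in general.
avg = songs.mean(axis=0)
print(np.linalg.norm(avg))  # typically < 1
```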

@lucasmaystre I would really appreciate it if you could clarify this. Thanks :)

lucasmaystre commented 5 months ago

Hi @animesh-wynk thanks for your interest!

> Observed user embedding is the average of all the song embeddings present in the user session.

If you look at Eq. 1 and the text below it, the user embedding is proportional to the average, but it is actually normalized to be unit-norm.

Re-reading the paper, it does seem that \bm{u}_t as defined just above Section 5.2 is no longer unit-norm. However, looking at the code, it seems that we do indeed normalize before taking the dot product. So all in all we do maximize the cosine similarity.
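
Roughly, what this amounts to is the following sketch (illustrative only, not the repo's actual code): normalize both vectors before the dot product, so the loss always measures the cosine of the angle between them, regardless of their lengths.

```python
import numpy as np

def unit(x):
    # Rescale a vector to unit length (assumes a non-zero input).
    return x / np.linalg.norm(x)

def observed_embedding(song_embeddings):
    # As in Eq. 1: proportional to the average of the (unit-norm)
    # song embeddings, then rescaled to be unit-norm itself.
    return unit(np.mean(song_embeddings, axis=0))

def cosine_similarity(u_t, s_t):
    # Normalizing both sides first makes the dot product equal to
    # the cosine similarity that the training objective maximizes.
    return np.dot(unit(u_t), unit(s_t))
```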

animesh-wynk commented 5 months ago

Thanks a lot for the clarification! :)