tensorflow / recommenders

TensorFlow Recommenders is a library for building recommender system models using TensorFlow.
Apache License 2.0

[Question] Is Retrieval with CategoricalCrossentropy really minimizing the affinity between the query and negative candidates? #560

Open canonrock16 opened 1 year ago

canonrock16 commented 1 year ago

In the tfrs.tasks.Retrieval documentation, the retrieval task is described as follows: "The main argument are pairs of query and candidate embeddings: the first row of query_embeddings denotes a query for which the candidate from the first row of candidate embeddings was selected by the user. The task will try to maximize the affinity of these query, candidate pairs while minimizing the affinity between the query and candidates belonging to other queries in the batch." The default loss function of tfrs.tasks.Retrieval is tf.keras.losses.CategoricalCrossentropy. But in CategoricalCrossentropy, the loss contribution of a candidate with label 0 is 0.

So, is the retrieval task really minimizing the affinity between the query and the negative candidates?

maciejkula commented 1 year ago

Yes, it is minimizing it. The cross-entropy loss works through two channels:

  1. The affinity score between the query and the positive item is the numerator of the softmax, and is maximized.
  2. The affinity score between the query and the negative item is in the denominator, and will be minimized.
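To make this concrete, here is a minimal, self-contained sketch of the in-batch softmax cross-entropy (not the tfrs internals; batch size, embedding dimension, and values are illustrative):

```python
import tensorflow as tf

# Toy batch: row i of each tensor is a (query, positive candidate) pair.
query_embeddings = tf.random.normal([4, 8])
candidate_embeddings = tf.random.normal([4, 8])

# Affinity of every query against every candidate in the batch: [4, 4].
scores = tf.matmul(query_embeddings, candidate_embeddings, transpose_b=True)

# Positives sit on the diagonal; all off-diagonal entries are in-batch negatives.
labels = tf.eye(4)

# Per row: loss_i = -score[i, i] + logsumexp(score[i, :]).
# Lowering the loss both raises the positive score (the numerator) and lowers
# the negative scores (the denominator), so gradients do flow to the negatives.
loss = tf.keras.losses.CategoricalCrossentropy(from_logits=True)(labels, scores)
print(float(loss))
```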
OmarMAmin commented 1 year ago

@maciejkula Can you elaborate a bit more here? Are we back-propagating through the negative items as well? If a negative item frequently appears alongside a user (i.e. in-batch negatives are biased towards popular items), do those items end up far from the user representation after many epochs? Or am I missing something?

Thanks

patrickorlando commented 1 year ago

@OmarMAmin, this is true in the case where a logQ correction is not applied. It turns out that the bias introduced by in-batch negative sampling can be accounted for by subtracting the natural logarithm of each candidate's sampling probability from the output logits. This is implemented in the Retrieval task through the call parameter candidate_sampling_probability.
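As a rough sketch of how the correction works (the `candidate_probs` tensor is an illustrative placeholder; in practice it would come from each candidate's empirical frequency in the training data):

```python
import tensorflow as tf
import tensorflow_recommenders as tfrs

query_embeddings = tf.random.normal([4, 8])
candidate_embeddings = tf.random.normal([4, 8])
# Hypothetical per-candidate sampling probabilities (popular items have larger values).
candidate_probs = tf.constant([0.4, 0.3, 0.2, 0.1])

# The correction by hand: subtract log(p) from each candidate's column of logits
# so frequently sampled (popular) items are not over-penalized as negatives.
scores = tf.matmul(query_embeddings, candidate_embeddings, transpose_b=True)
corrected_scores = scores - tf.math.log(candidate_probs)

# Or let the Retrieval task apply it by passing the probabilities at call time.
task = tfrs.tasks.Retrieval()
loss = task(
    query_embeddings,
    candidate_embeddings,
    candidate_sampling_probability=candidate_probs,
)
```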

You can read the theoretical details of sampled softmax loss in this paper.

There are also some useful discussions in this thread. https://github.com/tensorflow/recommenders/issues/257

OmarMAmin commented 1 year ago

@patrickorlando thanks for the info. I tried it, and it now performs better with the sampling correction :))

Thanks for your invaluable contributions. Without these discussions I would have switched to another library. I'll summarize the learnings here in another issue so others can benefit from them.

patrickorlando commented 1 year ago

I'm glad I could help @OmarMAmin 😁

That sounds like a great idea.