Open ydennisy opened 1 year ago
In a retrieval model the distribution of the dot product scores is shaped entirely by the query-candidate softmax, and their magnitude and variance have no real meaning on their own: the scores are only meaningful after softmax normalization. Even then, softmaxes are famously uncalibrated.
If you care about a particular distribution, I would recommend adding an auxiliary loss to your model that penalizes it for deviating from the distribution you want.
@maciejkula that is very interesting! Thanks for your input; as always, extremely helpful!
Would you mind guiding me a little further on what sort of loss I could use? Have you seen such a loss before, or do you think I most likely need to write a custom loss function?
Perhaps @ydennisy, you could calculate the mean and variance of your batch of scores and compute a loss from their deviation from your desired mean and variance.
You could add this into the compute_loss() function of your model, and you'll need to find some scale factor when adding it to the retrieval loss.
The scores are calculated via matrix multiplication in the retrieval task, so you might want to create your own modified task to avoid doing the calculation twice.
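The penalty described above can be sketched in isolation. This is a minimal NumPy illustration, not TFRS code: the function name, the target values, and the weight are all illustrative choices. Inside a TFRS model you would compute the same quantity with TensorFlow ops on the in-batch score matrix and add it (scaled) to the retrieval loss in compute_loss().

```python
import numpy as np

def distribution_penalty(scores, target_mean=0.0, target_var=1.0):
    """Auxiliary loss: squared deviation of the batch score
    distribution's mean and variance from the desired targets.
    `scores` is the in-batch query-candidate score matrix."""
    mean_penalty = (scores.mean() - target_mean) ** 2
    var_penalty = (scores.var() - target_var) ** 2
    return mean_penalty + var_penalty

# total_loss = retrieval_loss + penalty_weight * distribution_penalty(scores)
```

The penalty_weight scale factor would need tuning, as noted above, so that the auxiliary term neither dominates nor vanishes next to the retrieval loss.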
@maciejkula Is the softmax normalization already applied to the retrieval score output, or do I need to apply this normalization afterwards?
Hi!
When training a TF recommender, the final cosine distances between all user-item pairs form a very peaky, narrow distribution around 0. See the following chart:
I can apply a transform on the scores outside of the model later, but I was wondering: is there a way to guide the model towards a flatter distribution? My intuition was that increasing the softmax temperature to 2 would provide such an effect, but it made no difference. Any help appreciated! cc @maciejkula
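For the post-hoc route mentioned above, a temperature-scaled softmax is one way to flatten the score distribution outside the model. A small sketch, where the function name and the temperature value of 2 are just illustrative:

```python
import numpy as np

def temperature_softmax(scores, temperature=2.0):
    """Softmax with temperature applied to raw retrieval scores.
    Temperatures above 1 flatten the resulting distribution;
    below 1 sharpen it."""
    z = np.asarray(scores, dtype=float) / temperature
    z = z - z.max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()
```

Note this only reshapes the output distribution after training; as discussed above, changing the temperature during training lets the model rescale its embeddings to compensate, which may be why it made no visible difference.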