Open ydennisy opened 1 year ago
In a retrieval model the distribution of the dot product scores is shaped entirely by the query-candidate softmax, and their magnitude and variance have no real meaning on their own: the scores are only meaningful after softmax normalization. Even then, softmaxes are famously uncalibrated.
If you care about a particular distribution, I would recommend adding an auxiliary loss to your model that penalizes it for deviating from the distribution you want.
@maciejkula that is very interesting! Thanks for your input; as always, extremely helpful!
Would you mind guiding me a little further on what sort of loss I could use? Have you seen such a loss before, or do you think I most likely need to write a custom loss function?
Perhaps @ydennisy, you could calculate the mean and variance of your batch of scores and compute a loss from their deviation from your desired mean and variance.
You could add this into the compute_loss() function of your model, and you'll need to find some scale factor when adding it to the retrieval loss.
The scores are calculated via matrix multiplication in the retrieval task, so you might want to create your own modified task to avoid doing the calculation twice.
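The penalty described above can be sketched in isolation. This is a minimal NumPy illustration, not TFRS code: the function name, the target values, and the weight are all illustrative choices. Inside a TFRS model you would compute the same quantity with TensorFlow ops on the in-batch score matrix and add it (scaled) to the retrieval loss in compute_loss().

```python
import numpy as np

def distribution_penalty(scores, target_mean=0.0, target_var=1.0):
    """Auxiliary loss: squared deviation of the batch score
    distribution's mean and variance from the desired targets.
    `scores` is the in-batch query-candidate score matrix."""
    mean_penalty = (scores.mean() - target_mean) ** 2
    var_penalty = (scores.var() - target_var) ** 2
    return mean_penalty + var_penalty

# total_loss = retrieval_loss + penalty_weight * distribution_penalty(scores)
```

The penalty_weight scale factor would need tuning, as noted above, so that the auxiliary term neither dominates nor vanishes next to the retrieval loss.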
@maciejkula Is the softmax normalization already applied to the retrieval score output, or do I need to apply this normalization afterwards?
Hi!
When training a TF recommender, the final cosine distances between all user-item pairs form a very peaky, narrow distribution around 0. See the following chart:
I can apply a transform on the scores outside of the model later, but I was wondering: is there a way to guide the model towards a flatter distribution? My intuition was that increasing the softmax temperature to 2 would provide such an effect, but it made no difference. Any help appreciated! cc @maciejkula
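For the post-hoc route mentioned above, a temperature-scaled softmax is one way to flatten the score distribution outside the model. A small sketch, where the function name and the temperature value of 2 are just illustrative:

```python
import numpy as np

def temperature_softmax(scores, temperature=2.0):
    """Softmax with temperature applied to raw retrieval scores.
    Temperatures above 1 flatten the resulting distribution;
    below 1 sharpen it."""
    z = np.asarray(scores, dtype=float) / temperature
    z = z - z.max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()
```

Note this only reshapes the output distribution after training; as discussed above, changing the temperature during training lets the model rescale its embeddings to compensate, which may be why it made no visible difference.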