Open gombru opened 3 years ago
The negatives are usually things that the user saw but did not interact with. Do you have data on that?
If you don't, then the retrieval model is likely the right choice for you. Could you add some detail about why you had to move away from it?
Thanks for helping.
No, I don't have negative examples. My idea was to use random assets (the ones in the batch) as negatives, as the retrieval model objective implicitly does, with either a ranking loss or the default MSE loss.
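For reference, the in-batch negatives idea mentioned above can be sketched as a softmax cross-entropy over the batch: each user is scored against every asset in the batch, its own asset is the positive, and the other assets act as negatives. This is a minimal pure-Python sketch under those assumptions, not the library's implementation; `in_batch_softmax_loss` and its arguments are hypothetical names, and the embeddings are plain lists of floats.

```python
import math


def in_batch_softmax_loss(user_embs, asset_embs):
    """Mean cross-entropy where row i's positive is asset i.

    Every other asset in the batch acts as a negative for user i.
    Hypothetical helper for illustration only.
    """
    losses = []
    for i, user in enumerate(user_embs):
        # Dot-product score of this user against every asset in the batch.
        logits = [sum(x * y for x, y in zip(user, asset)) for asset in asset_embs]
        # Numerically stable log-sum-exp over the batch.
        top = max(logits)
        log_z = top + math.log(sum(math.exp(l - top) for l in logits))
        # Negative log-probability of the true (diagonal) asset.
        losses.append(log_z - logits[i])
    return sum(losses) / len(losses)


users = [[1.0, 0.0], [0.0, 1.0]]
aligned = in_batch_softmax_loss(users, [[1.0, 0.0], [0.0, 1.0]])
shuffled = in_batch_softmax_loss(users, [[0.0, 1.0], [1.0, 0.0]])
print(aligned < shuffled)
```

The loss is lower when each user embedding lines up with its own asset than when the assets are swapped, which is the behaviour the in-batch objective rewards.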
I still haven't decided to move to a ranking model, but these are the reasons why I'm considering it:
In my application I need to filter the candidate set for each query, i.e. keep only assets with a given flag. That could also be done with a retrieval model by: a) retrieving a huge number of recommendations and post-processing them, or b) generating the candidate embeddings for the custom set for each query. But both would hurt the performance of the retrieval model's fast inference methods (e.g. ScaNN), and the dataset is huge.
Ranking models have more capacity to learn interactions between user and asset features.
I have already built simple functionality to generate a set of candidates to be scored by a ranking model.
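To make option a) above concrete, here is a rough sketch of over-retrieving and then post-filtering by flag. The brute-force `retrieve` function is only a stand-in for an approximate index such as ScaNN, and all names (`retrieve`, `filtered_top_k`, `overfetch`, the toy `index` and `flags`) are hypothetical.

```python
import heapq


def retrieve(query, index, n):
    """Brute-force top-n by dot product (stand-in for an approximate index)."""
    scored = (
        (sum(q * e for q, e in zip(query, emb)), asset_id)
        for asset_id, emb in index.items()
    )
    return [asset_id for _, asset_id in heapq.nlargest(n, scored)]


def filtered_top_k(query, index, flags, required_flag, k, overfetch):
    """Retrieve k * overfetch candidates, then keep those carrying the flag."""
    hits = retrieve(query, index, k * overfetch)
    kept = [a for a in hits if required_flag in flags[a]]
    return kept[:k]


index = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.5, 0.5], "d": [0.0, 1.0]}
flags = {"a": {"video"}, "b": {"image"}, "c": {"image"}, "d": {"image"}}
query = [1.0, 0.0]

enough = filtered_top_k(query, index, flags, "image", k=2, overfetch=2)
too_few = filtered_top_k(query, index, flags, "image", k=2, overfetch=1)
print(enough, too_few)
```

With too small an `overfetch` factor, the filter can leave fewer than `k` results, which is why this approach tends to require retrieving far more candidates than are actually needed.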
What are your thoughts on this? Thanks!
Any update on this @maciejkula? Would be very helpful! Thanks!
I have a dataset consisting of users' positive interactions with assets. Previously I was using a Retrieval model, whose objective treats a given user-asset pair as the positive label and the same user paired with the rest of the assets in the batch as negatives. But I have to move to a Ranking model due to non-static dataset requirements.
However, the ranking model objective does not do that. Instead, it requires a target label for each pair. Given my dataset (and following the movies example) I should fill that tensor with 1s. But then, I wouldn't have negative examples, and the model would end up predicting 1s for all datapoints.
Is there a simple way to use the rest of the elements within the batch as negatives in a Ranking model? What is the best setup to train a Ranking model with this kind of data?
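One way to picture what is being asked: expand each batch of positive pairs into every user-asset combination in the batch, labelling the observed pairs 1 and all others 0, so that a pointwise ranking loss sees both classes. This is a minimal sketch under that assumption, not an existing library feature; `in_batch_pairs` is a hypothetical helper, and pairs that are true positives outside the batch would still be mislabelled as negatives.

```python
def in_batch_pairs(batch):
    """Expand positive (user, asset) pairs into labelled pairs.

    Every user in the batch is paired with every asset in the batch.
    Pairs observed in the batch get label 1.0, all others 0.0.
    """
    positives = set(batch)
    # Deduplicate while preserving first-seen order.
    users = list(dict.fromkeys(user for user, _ in batch))
    assets = list(dict.fromkeys(asset for _, asset in batch))
    return [
        (user, asset, 1.0 if (user, asset) in positives else 0.0)
        for user in users
        for asset in assets
    ]


batch = [("u1", "a1"), ("u2", "a2"), ("u3", "a1")]
pairs = in_batch_pairs(batch)
for row in pairs:
    print(row)
```

Checking against the set of in-batch positives avoids labelling `("u3", "a1")` as a negative just because `a1` first appeared with another user.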
Thanks