tensorflow / ranking

Learning to Rank in TensorFlow
Apache License 2.0

Using TF-Ranking for implicit feedback recommender system #110

Closed mbw314 closed 5 years ago

mbw314 commented 5 years ago

I'm interested in using TF-Ranking to create a recommender system based on collaborative filtering. There are a few million users, a few hundred thousand items, and a few hundred million user/item interactions. I'm also treating the interactions as "implicit feedback" -- any interaction in the dataset is "positive" and anything else is "negative". I have used LightFM successfully for this task in the past (but with smaller datasets), so as a starting point I'm trying to replicate its functionality.

I have looked at the TF-Ranking tutorial for sparse features, and it seems highly relevant to my needs.

My primary remaining concern is whether implicit feedback recommender systems are possible with the loss functions currently implemented in TF-Ranking. It seems that negative interactions are very important in learning optimal rankings, but I see that in-batch negative sampling is not supported, and I'm not familiar enough with the existing loss functions to tell if they are suitable. For example, I know that adding negative examples to the dataset is one option, but I'm not sure what is appropriate loss-wise.
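One way to picture the "adding negative examples to the dataset" option is to pair each user's observed interactions with items sampled from the rest of the catalog. A minimal sketch (the function and names here are illustrative, not TF-Ranking API):

```python
import random

def build_query(user_positives, all_items, num_negatives, rng=random):
    """Build one query's item list for a user: observed interactions get
    relevance 1, sampled unobserved items get relevance 0."""
    positives = set(user_positives)
    negatives = [item for item in all_items if item not in positives]
    sampled = rng.sample(negatives, num_negatives)
    return [(item, 1) for item in positives] + [(item, 0) for item in sampled]

query = build_query({"a", "c"}, ["a", "b", "c", "d", "e"], num_negatives=2)
```

The resulting list of `(item, relevance)` pairs can then be serialized into whatever query/document structure the training pipeline expects.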

I would appreciate any feedback here. (I'd also like to register my support for in-batch negative sampling to allow for loss functions such as WMRB.)

eggie5 commented 5 years ago

All of the building blocks you need are easily accessible w/ TFR: user/item embeddings and a dot product.

You can implement LightFM's BPR loss in TFR by using the pairwise logistic loss. Negative sampling will not be across the whole corpus as in naive BPR, but from the same query as the positive item. The query is a core data structure of TF-Ranking, so this might not fit your paradigm. However, I find it actually quite intuitive, especially for use cases where you don't want naive negative samples.
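The BPR/pairwise-logistic connection can be checked numerically: BPR's negative log-sigmoid of the score difference is algebraically the same quantity as the pairwise logistic loss on that pair. A small standalone sketch:

```python
import math

def bpr_loss(s_pos, s_neg):
    # BPR: negative log-sigmoid of the score difference
    return -math.log(1.0 / (1.0 + math.exp(-(s_pos - s_neg))))

def pairwise_logistic_loss(s_pos, s_neg):
    # pairwise logistic loss on the same (positive, negative) pair
    return math.log(1.0 + math.exp(-(s_pos - s_neg)))
```

For any pair of scores, the two return the same value up to floating-point error, since -log(sigmoid(d)) = log(1 + exp(-d)).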

It seems from your message that your data isn't from search query logs, so you'll have to retrofit your interactions into queries. For example, here is one query with a conversion on item 3:

user_id: uid,
documents: {
item_id: iid, rel: 0
item_id: iid, rel: 0
item_id: iid, rel: 1
item_id: iid, rel: 0
item_id: iid, rel: 0
item_id: iid, rel: 0
item_id: iid, rel: 0
item_id: iid, rel: 0
}

where relevance is taken from your interaction data. If you use the pairwise logistic loss, it'll take the 3rd item, use all the other items as negative samples, and do BPR.
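Concretely, the pairwise logistic loss forms one logistic term for every (higher-relevance, lower-relevance) pair in the query. A sketch with hypothetical model scores for the eight-item example above:

```python
import math

scores = [0.2, -0.1, 1.3, 0.4, 0.0, -0.5, 0.1, 0.3]  # hypothetical model scores
labels = [0, 0, 1, 0, 0, 0, 0, 0]                     # relevance, as in the query above

# every (higher-label, lower-label) pair contributes a logistic term
pairs = [(i, j)
         for i in range(len(labels))
         for j in range(len(labels))
         if labels[i] > labels[j]]
loss = sum(math.log(1.0 + math.exp(-(scores[i] - scores[j]))) for i, j in pairs)
```

With a single positive, that is seven pairs: the positive item versus each of the seven negatives, which is exactly the BPR setup within a query.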

Your comments on WARP, or rather WMRB, are interesting as well. I think WMRB is motivated by the fact that WARP only works in the fully stochastic setting (no mini-batches), and that it performs a little better than BPR on some baselines. I implemented WMRB a while back in Keras; if I recall correctly, it's essentially a weighted hinge loss. You should be able to implement it as a custom loss, something like this maybe:

from tensorflow.keras import backend as K

def wmrb_loss(X, num_items, num_sampled):
    # X is a batch of (positive, negative) score pairs for users
    xui, xuj = X

    margin = 1.0 - xui + xuj                # one margin per (positive, negative) pair
    margin_rank = K.maximum(0.0, margin)    # hinge: only violated margins contribute
    # scale by (total items / sampled negatives) to estimate the full margin rank
    sampled_margin_rank = (num_items / num_sampled) * margin_rank
    loss = K.log(sampled_margin_rank + 1.0)  # log for a smoother loss surface

    return loss
mbw314 commented 5 years ago

@eggie5, thanks for your feedback! The TF-Ranking notion of a query does not map exactly onto my use case (e.g., my training data is not query logs, as you suspected), but I think I understand it better now and I can probably make it work based on what you described.

For example, I had initially thought to put all of a given user's positive items into a single query for training, padded appropriately, but it sounds like I should create many queries for each user, each with negative items included. Regarding the structure of those queries, your example had one positive item per query. Was that specific to BPR, or are multiple positive items per query also appropriate?

eggie5 commented 5 years ago

The one-positive-per-query example was for BPR specifically, where 1 marks the positive item and 0 marks the negatives. In TFR, you can have multiple positives per query; for example, a 1 can be a click and a 2 can be a conversion. The pairwise loss will then form more pairs with the negatives in the query.
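With graded labels, the pair set grows because every item is compared against every item with a strictly lower label. A small sketch of the pairs induced by a query with a click (1) and a conversion (2):

```python
labels = [0, 1, 2, 0]  # e.g. no interaction, click, conversion, no interaction

# a pairwise loss forms a pair wherever one label strictly exceeds another
pairs = [(i, j)
         for i in range(len(labels))
         for j in range(len(labels))
         if labels[i] > labels[j]]
# the click beats the two negatives; the conversion beats the negatives *and* the click
```

So the conversion is pushed above both the negatives and the click, encoding the graded preference directly in the pair structure.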

bendersky commented 5 years ago

@eggie5 -- thank you for your response. It summarizes well how TF-Ranking can be adapted to a recommendation setting. @mbw314 -- please let us know if you have any questions or findings when using TF-Ranking for your setting.