jfkirk / tensorrec

A TensorFlow recommendation algorithm and framework in Python.
Apache License 2.0

Equal recommendations for same user features with different actions #68

Closed by rragundez 6 years ago

rragundez commented 6 years ago

I think that with the described architecture, two users with the same features but different actions will receive the same recommendations. Is this correct?

I understand that there is a use case for this, but I just want to get things clear in my understanding.

The only possibility I see is creating a model per user using his/her specific actions.

Have you thought about how to overcome this?

jfkirk commented 6 years ago

Hey @rragundez ! Could you clarify for me what you mean by "different actions"? What would some examples of different actions be?

I'm glad you're interested!

rragundez commented 6 years ago

So, suppose I have two users who are both vegetarians, are the same age, like music, and so on: same preferences, same user features. But they have interacted with my content items differently. Since the actions are ingested only at training time, for the loss function, both users are treated identically at inference/prediction time (since their features are the same).
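The concern can be illustrated with a minimal NumPy sketch. It assumes a simplified model in which user and item representations are linear projections of the feature matrices and predictions are dot products; the weight matrices `W_user` and `W_item`, the shapes, and the random values are hypothetical stand-ins for learned parameters, not TensorRec's actual internals.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two users with identical feature rows (e.g. vegetarian, same age bucket, likes music).
user_features = np.array([
    [1.0, 0.0, 1.0],
    [1.0, 0.0, 1.0],
])

# Four hypothetical items with five features each.
item_features = rng.normal(size=(4, 5))

# Stand-ins for learned weights; interactions only shape these during training.
W_user = rng.normal(size=(3, 2))
W_item = rng.normal(size=(5, 2))

# Representations depend only on the feature rows, not on interaction history.
user_repr = user_features @ W_user
item_repr = item_features @ W_item

# Predicted scores: one row per user, one column per item.
scores = user_repr @ item_repr.T

# Identical feature rows produce identical score rows.
same = np.allclose(scores[0], scores[1])
print(same)
```

Under these assumptions, whatever the interactions taught the weights, the two users end up with the same scores because nothing in the input distinguishes them.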

Is this a bit clearer?

jfkirk commented 6 years ago

Much clearer, thank you!

In this case, you could provide personalization by adding an "indicator feature" - a one-hot-encoded feature that is unique to each user. Practically, I do this by appending an identity matrix (n_users, n_users) to my user metadata matrix.
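A minimal sketch of that idea, assuming the user metadata is held as a SciPy sparse matrix; the tiny `user_metadata` matrix and the variable names are made up for illustration.

```python
import numpy as np
import scipy.sparse as sp

# Hypothetical metadata: 3 users x 2 features. Users 0 and 1 are identical.
user_metadata = sp.csr_matrix(np.array([
    [1.0, 0.0],
    [1.0, 0.0],
    [0.0, 1.0],
]))

n_users = user_metadata.shape[0]

# One indicator column per user: a sparse (n_users, n_users) identity matrix.
indicators = sp.identity(n_users, format="csr")

# Append the indicators to the right of the metadata columns.
user_features = sp.hstack([user_metadata, indicators], format="csr")

print(user_features.shape)
print(user_features.toarray())
```

Every row of `user_features` is now unique, so the model can learn a per-user component on top of the shared metadata features. Keeping the identity matrix sparse matters here, since a dense (n_users, n_users) array grows quadratically with the number of users.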

You can read a bit more about using indicator features in a recommender system in section 2.3 of this paper: https://arxiv.org/pdf/1507.08439.pdf

Does that help?

rragundez commented 6 years ago

I thought about that initially and came up with a reason not to do it, but I can't recall it now. Let me get back to you. I'll take a look at the paper, thanks a lot!

Another question: in your experience with this architecture, how many user features, item features, and actions would I need to get good results?

rragundez commented 6 years ago

Wouldn't it be a problem if, for example, I have 100,000 users and, let's say, only 10 user features? It seems to me that this would cause trouble when calculating the embeddings.

jfkirk commented 6 years ago

The only issue with a large number of indicator features is the size of the embeddings in memory. That said, I've routinely built systems with more than 4 million users, using both indicator features and metadata features, without problems.
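As a rough back-of-the-envelope check on the memory point, the sketch below assumes one embedding row per input feature, stored as 32-bit floats. The 64 components and the 10 metadata features are assumed values chosen for illustration, not figures from this thread, and `embedding_bytes` is a hypothetical helper.

```python
def embedding_bytes(n_features, n_components, bytes_per_value=4):
    """Size of a dense (n_features, n_components) embedding table in bytes."""
    return n_features * n_components * bytes_per_value


n_users = 4_000_000          # one indicator feature per user
n_metadata_features = 10     # assumed number of metadata columns
n_components = 64            # assumed embedding dimension

total = embedding_bytes(n_users + n_metadata_features, n_components)
print(f"{total / 1e9:.2f} GB")
```

With these assumed numbers the table comes to roughly 1 GB, almost all of it from the indicator features. The size scales linearly with both the number of users and the embedding dimension.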

rragundez commented 6 years ago

Great, thanks! I'll try it out.