Open momassimo opened 4 years ago
User and item latent vectors are initialized with random numbers, so there is no guarantee that two users who interacted with the same items will have the same latent vectors. Even if they were initialized with the same latent vectors, lightfm performs stochastic gradient descent with a batch size of 1. Various parameters can change between batches, like the effective learning rate and the item latent vectors, so the two users' latent vectors will get updated differently during SGD.
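To illustrate the point, here is a toy per-sample SGD simulation in NumPy. It is not LightFM's actual training loop (the model, loss, and learning rate are simplified assumptions), but it shows the mechanism: two users with identical interactions start from different random initializations, and the shared item vector changes between their updates, so their latent vectors never become equal.

```python
# Toy illustration (NOT LightFM's internals): two users with identical
# interactions end up with different latent vectors under per-sample SGD.
import numpy as np

rng = np.random.default_rng(42)
k = 8        # number of latent components
lr = 0.05    # learning rate (illustrative value)

# Random init: even "identical" users start from different vectors.
user_vecs = rng.normal(scale=0.1, size=(2, k))   # users 0 and 1
item_vec = rng.normal(scale=0.1, size=k)         # the one item both bought

# Both users interacted positively (target score 1.0) with the same item.
for _ in range(100):
    for u in range(2):
        err = 1.0 - user_vecs[u] @ item_vec
        # Per-sample gradient step; the shared item vector moves between
        # the two users' updates, so their trajectories differ.
        user_vecs[u] += lr * err * item_vec
        item_vec += lr * err * user_vecs[u]

diff = np.linalg.norm(user_vecs[0] - user_vecs[1])
print(f"distance between the two 'identical' users: {diff:.4f}")
```

Both users end up close to the item vector, but not at the same point: the components of their initial random vectors that are orthogonal to the item direction are barely corrected by these updates.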
Hi Ethan,
thanks so much for the fast and helpful reply; that answers my question perfectly.
An issue I still have is the following: is there any way to interpret the embeddings? For example, I have the case that the model says two customers are very "near" each other, but when I take a closer look at those customers, they have only one bought product in common, and their user features are not the same either. It would be really nice to give more insight into the question "why are these customers considered similar?"
thanks in advance!
Moritz
Unfortunately, interpreting the embeddings is pretty difficult, and I can't offer much advice beyond what you've already looked at (i.e. products and user features in common).
One thing to make sure of when using user features is that you're comparing two users' similarity by their user representation, and not their user embedding. In case you don't know, the LightFM user representation is the sum of all of the user's user feature embeddings (including the user's user identity feature, if you are using that).
If you're using the user representation and still finding that two users who are nearest neighbors have few features and products in common, then maybe the model is poorly tuned?
Thanks!
As far as I understood the library, `get_user_representations` also gives you the `user_embeddings` (and the `user_biases`). Aren't those the same as `model.user_embeddings`? Or did I misunderstand you?
Ah, I left something out of my explanation. You must provide the user feature matrix as an argument to `get_user_representations()`. While you can also use `model.user_embeddings`, you then have to make sure that you add up all of the user's feature embeddings yourself before calculating similarity with other users. I'm also not sure whether you missed this or not, so I'll walk through an example below just in case it's helpful!
Imagine you use the `Dataset` class to build both your interactions matrix and your user and item features. You set `user_identity_features=True`, and you have two other user features, `device_is_ios` and `device_is_android`, each of which can be `1` or `0`.
If you build your user feature matrix, it will have shape `(num_users, num_user_features)`, where `num_user_features = num_users + 2`. This is because you are building a unique identity feature for each user, plus the 2 extra features. This also means that your `user_embeddings` matrix will have shape `(num_user_features, num_components)`. That is, you get an embedding for each unique user identity feature as well as for the extra `device_is_*` features.
So, when you want to calculate a user's "representation" in order to calculate similarity, you need to add up the user's unique identity embedding and their `device_is_*` embedding.
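To make the walkthrough concrete, here is a plain-NumPy sketch of that sum. The shapes mirror the ones described above, but the numbers are made up and this is not LightFM code; in LightFM itself, passing the user feature matrix to `get_user_representations()` performs the equivalent sum for you.

```python
# Plain-NumPy sketch of the walkthrough above (shapes mirror LightFM's,
# but this is illustrative code, not LightFM itself).
import numpy as np

num_users, num_components = 3, 4
num_user_features = num_users + 2          # identity features + 2 device flags

rng = np.random.default_rng(0)
user_embeddings = rng.normal(size=(num_user_features, num_components))

# Row u of the feature matrix marks user u's identity feature and device.
# Columns: [user_0, user_1, user_2, device_is_ios, device_is_android]
user_features = np.array([
    [1, 0, 0, 1, 0],   # user 0, iOS
    [0, 1, 0, 1, 0],   # user 1, iOS
    [0, 0, 1, 0, 1],   # user 2, Android
], dtype=float)

# A user's representation is the sum of the embeddings of all their
# "hot" features: identity embedding + device embedding.
user_representations = user_features @ user_embeddings

# Same thing for user 0, written out explicitly:
u0 = user_embeddings[0] + user_embeddings[3]
print(np.allclose(user_representations[0], u0))  # True
```

Similarity between users should then be computed on rows of `user_representations`, not on rows of `user_embeddings`.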
Thanks for all the explanations, Ethan!
:raised_hands:
@EthanRosenthal
Thank you, Ethan!
When I am using `item_features`, the model has lower precision compared to pure CF.
In your comment you mention `device_is_*`. Do the features have to be in one-hot format?
For example, my items data:

```
   article_id  section_primary  writer_name
0  1.9134852   culture          אפרת רובינשטיין
1  1.9141164   culture          אורון שמיר
2  1.9179619   culture          דייב איצקוף
```
So I am building the features as:

```python
item_features = dataset.build_item_features(
    [(i.article_id, [i.section_primary, i.writer_name]) for i in items.itertuples()]
)
```
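(For context, the `(item_id, [feature_names])` tuples are effectively turned into a one-hot/multi-hot row per item once the feature matrix is built. Below is a tiny illustrative mock of that idea, not LightFM's actual implementation; the item ids and feature names are made up.)

```python
# Illustrative mock (NOT LightFM's implementation): turn
# (item_id, [feature_names]) tuples into a multi-hot feature matrix.
import numpy as np

def build_feature_matrix(item_feature_pairs):
    """Return (matrix, feature_index) for a list of (item_id, features)."""
    feature_index = {}
    for _, feats in item_feature_pairs:
        for f in feats:
            feature_index.setdefault(f, len(feature_index))
    matrix = np.zeros((len(item_feature_pairs), len(feature_index)))
    for row, (_, feats) in enumerate(item_feature_pairs):
        for f in feats:
            matrix[row, feature_index[f]] = 1.0   # one-hot / multi-hot entry
    return matrix, feature_index

# Hypothetical items: both share a section, each has its own writer.
pairs = [
    ("a1", ["culture", "writer_x"]),
    ("a2", ["culture", "writer_y"]),
]
m, idx = build_feature_matrix(pairs)
print(m)  # each row has a 1 in the columns for that item's features
```

So each distinct `section_primary` or `writer_name` value becomes its own feature column; you don't need to one-hot encode the data yourself before calling `build_item_features`.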
Thanks!
Hi,
I created a model for a retailer with 38k customers, 36k articles, and a sparsity of 0.55%. One part of the analysis is to find similar customers. To get the "neighbours" I took the latent features of the users and computed the dot product between them (similar to the LightFM example). In order to get a better understanding of my model, I created some test customers and calculated their similarity. Two of those test customers are (apart from a different name) exactly the same: both bought the same (single) article once and are in the same industry (a user feature). In my understanding they should have the same vectors and therefore a dot product of 1. But unfortunately the vectors are different and the dot product is -0.057281606.
Does somebody have an explanation how this can happen?
Thanks in advance!
Best, Moritz
Those are the normalized vectors for Testuser 1 and Testuser 2 (screenshots not reproduced here).
That's how I built the dataset (code screenshot not reproduced here).
Some translations: "Artikelnummer" = "item number", "Hauptkundennummer" = "customer number", "Warengruppe" = "product group", "Branchenschlüssel" = "industry code".