lyst / lightfm

A Python implementation of LightFM, a hybrid recommendation algorithm.
Apache License 2.0

How to validate the Dataset.mapping #670

Open marcosvliras opened 1 year ago

marcosvliras commented 1 year ago

I'm using the same example as in https://making.lyst.com/lightfm/docs/examples/dataset.html#building-the-id-mappings

How can I validate the mapping of each item feature?

I call

    user_id_map, u_f_map, item_id_map, i_f_map = dataset.mapping()

and then build the item features with

    item_features = dataset.build_item_features(((x['ISBN'], [x['Book-Author']]) for x in get_book_features()))

This is what I get from item_id_map:

{'034545104X': 0,
 '0155061224': 1,
 '0446520802': 2,
 '052165615X': 3...} 

So '034545104X' is mapped to index 0. Printing the item_features matrix built above gives:

  (0, 0)    0.5
  (0, 343789)   0.5
  (1, 1)    0.5
  (1, 428522)   0.5
  (2, 2)    0.5
  (2, 341954)   0.5
  (3, 3)    0.5 .......
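A possible reading of this output, assuming the Dataset was created with the default item_identity_features=True and build_item_features was called with its default normalize=True: LightFM also registers every item ID as an item feature, so column 0 in row 0 is the identity feature of '034545104X' itself, column 343789 is the author feature, and row normalisation splits the weight into 0.5 and 0.5. The sketch below decodes one row back into feature names. decode_item_row and the toy maps are hypothetical, made up for illustration.

```python
from scipy.sparse import csr_matrix


def decode_item_row(item_features, item_id_map, i_f_map, item_id):
    """Return {feature_name: weight} for one item's row of item_features.

    Hypothetical helper. Assumes item_features is a scipy sparse matrix such as
    the one returned by Dataset.build_item_features, and that item_id_map and
    i_f_map come from Dataset.mapping().
    """
    # Invert the feature map so column indices can be turned back into names.
    inverse_feature_map = {idx: name for name, idx in i_f_map.items()}
    matrix = item_features.tocsr()
    row = item_id_map[item_id]  # internal row index of this item
    start, end = matrix.indptr[row], matrix.indptr[row + 1]
    return {
        inverse_feature_map[int(col)]: float(weight)
        for col, weight in zip(matrix.indices[start:end], matrix.data[start:end])
    }


# Toy stand-in for the issue's objects (made-up authors and column layout):
# two items whose identity features occupy columns 0-1, authors in columns 2-3.
toy_item_id_map = {"034545104X": 0, "0155061224": 1}
toy_i_f_map = {"034545104X": 0, "0155061224": 1, "Author A": 2, "Author B": 3}
toy_matrix = csr_matrix(([0.5] * 4, ([0, 0, 1, 1], [0, 2, 1, 3])), shape=(2, 4))

print(decode_item_row(toy_matrix, toy_item_id_map, toy_i_f_map, "034545104X"))
# → {'034545104X': 0.5, 'Author A': 0.5}
```

With the real objects, decode_item_row(item_features, item_id_map, i_f_map, '034545104X') should return the ISBN itself plus whichever author build_item_features attached to it.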

I then loaded book_features into a pandas DataFrame.

Row 0 therefore belongs to ISBN 034545104X, the item mapped to index 0. Filtering the DataFrame for this ISBN, the book author is 'Flesh Tones: A Novel'.

But when I do this:

    item_feature_inverse_map = {v: k for k, v in i_f_map.items()}
    print(item_feature_inverse_map[343789])

the result is 'M. J. Rose', which is different from 'Flesh Tones: A Novel'.
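Since the features were built from x['Book-Author'], item_feature_inverse_map[343789] is the author attached to row 0, so it has to be compared with the Book-Author column for that ISBN. 'Flesh Tones: A Novel' reads like a book title, so it may be worth double-checking which column the filter displayed. To validate every item rather than a single one, here is a minimal sketch, assuming the records are dicts with 'ISBN' and 'Book-Author' keys as in the docs example (df.to_dict('records') produces this shape from a DataFrame). find_mismatches and the toy data are hypothetical.

```python
from scipy.sparse import csr_matrix


def find_mismatches(records, item_features, item_id_map, i_f_map):
    """List items whose stored features differ from their 'Book-Author' value.

    Hypothetical helper. Assumes identity features are enabled, so each row also
    holds a column for the item ID itself; that column is ignored here.
    Returns (isbn, expected_author, decoded_feature_names) tuples.
    """
    inverse_feature_map = {idx: name for name, idx in i_f_map.items()}
    matrix = item_features.tocsr()
    mismatches = []
    for record in records:
        isbn, author = record["ISBN"], record["Book-Author"]
        row = item_id_map[isbn]
        start, end = matrix.indptr[row], matrix.indptr[row + 1]
        names = {inverse_feature_map[int(col)] for col in matrix.indices[start:end]}
        names.discard(isbn)  # drop the identity feature
        if names != {author}:
            mismatches.append((isbn, author, sorted(names)))
    return mismatches


# Toy data (made up); the second record is deliberately inconsistent.
toy_item_id_map = {"034545104X": 0, "0155061224": 1}
toy_i_f_map = {"034545104X": 0, "0155061224": 1, "Author A": 2, "Author B": 3}
toy_matrix = csr_matrix(([0.5] * 4, ([0, 0, 1, 1], [0, 2, 1, 3])), shape=(2, 4))
toy_records = [
    {"ISBN": "034545104X", "Book-Author": "Author A"},
    {"ISBN": "0155061224", "Book-Author": "Author C"},
]

print(find_mismatches(toy_records, toy_matrix, toy_item_id_map, toy_i_f_map))
# → [('0155061224', 'Author C', ['Author B'])]
```

On the real data this would be find_mismatches(df.to_dict('records'), item_features, item_id_map, i_f_map); an empty list would mean every row carries exactly the author from its own record.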

saba-zones commented 4 months ago

Hi @marcosvliras, were you able to validate these mappings on your dataset? If so, how did you do it? I am also using LightFM Dataset mappings and want to do something similar, so any tips on validating them against the original data/DataFrame would help.
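One way to start, offered as a sketch rather than an established LightFM recipe: before inspecting matrix values, check that the mappings themselves agree with the source data. The hypothetical function below (pure Python) verifies that every ISBN and author in the records appears in the maps and, assuming the Dataset was built with a single fit() call and the default item_identity_features=True, that each item ID has the same index in item_id_map and i_f_map, which is why column 0 of row 0 is the item itself. The toy maps are made up.

```python
def check_mappings(records, item_id_map, i_f_map, feature_key="Book-Author"):
    """Return a list of human-readable problems found in the mappings.

    Hypothetical helper; records are dicts such as df.to_dict("records"),
    and feature_key names the column that was passed to build_item_features.
    """
    problems = []
    for record in records:
        isbn, feature = record["ISBN"], record[feature_key]
        if isbn not in item_id_map:
            problems.append(f"ISBN {isbn!r} missing from item_id_map")
        if feature not in i_f_map:
            problems.append(f"feature {feature!r} missing from item feature map")
    # With identity features each item ID is also a feature; after a single
    # fit() call its feature index is expected to equal its item index.
    for item_id, idx in item_id_map.items():
        if i_f_map.get(item_id) != idx:
            problems.append(f"identity feature for {item_id!r} is not at index {idx}")
    return problems


toy_item_id_map = {"034545104X": 0, "0155061224": 1}
toy_i_f_map = {"034545104X": 0, "0155061224": 1, "Author A": 2}
toy_records = [
    {"ISBN": "034545104X", "Book-Author": "Author A"},
    {"ISBN": "0155061224", "Book-Author": "Author B"},
]

print(check_mappings(toy_records, toy_item_id_map, toy_i_f_map))
# → ["feature 'Author B' missing from item feature map"]
```

Passing this check only shows that the maps cover the data; decoding the matrix rows, as in the earlier questions, is still needed to confirm each item points at the right feature.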