srendle / libfm

Library for factorization machines
GNU General Public License v3.0

feature design and test set values: "Other Movies Rated" #37

Open sharonwoo opened 5 years ago

sharonwoo commented 5 years ago

Dear all,

I'm hoping to get some insight into feature design here and to check that my understanding is correct, as I am new to FMs.

In the original Factorization Machines paper in 2010, the "Other Movies Rated" feature contains normalised values for all the other movies the user has ever rated.

Let's use the user Alice from the example, and assume the example covers the training set. We see she has rated 3 movies: NH, TI, and SW. Since there are 3 movies, each of her nonzero "Other Movies Rated" entries is 1/3 (shown rounded as 0.3), so the columns have values of (0.3, 0.3, 0.3, 0...).
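To make the normalisation concrete, here is a minimal sketch of how such a row could be computed. The function name `other_movies_rated` and the column order `TI, NH, SW, ST` are assumptions for illustration, not part of libFM.

```python
def other_movies_rated(rated, all_movies):
    """Return one value per movie: 1/len(rated) if the user rated it, else 0.

    Hypothetical helper for illustration; the column order is whatever
    `all_movies` lists.
    """
    rated = set(rated)
    weight = 1.0 / len(rated) if rated else 0.0
    return [weight if movie in rated else 0.0 for movie in all_movies]


# Assumed column order for the "Other Movies Rated" block.
movies = ["TI", "NH", "SW", "ST"]

# Alice has rated three movies in the training set.
alice_train = other_movies_rated(["NH", "TI", "SW"], movies)
print([round(v, 2) for v in alice_train])
```

The nonzero entries sum to 1, which is why three rated movies give roughly 0.3 each.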

Say in my test set, Alice has rated ST (Star Trek) with a target of 1. In my "Other Movies Rated" columns in the test set, should I use (0.25, 0.25, 0.25, 0.25 ...), with the fourth value updated for Alice's rating of ST? Or should I use (0.3, 0.3, 0.3, 0...), similar to the training set?

Thanks in advance! Apologies if this question has been asked elsewhere; I haven't been able to find a conclusive answer.

chihming commented 5 years ago

I think the rationale behind that design is to learn the relations between the movies a user has rated and the target movie. So, with the setting (0.25, 0.25, 0.25, 0.25 ...), the task becomes predicting the rating (the exact score) when we already know that Alice has rated ST along with the others. With the setting (0.3, 0.3, 0.3, 0...), the task becomes predicting how Alice would rate ST given her previous rating behavior (ST excluded).
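The two settings can be put side by side in a small sketch. The helper names, the column order `TI, NH, SW, ST`, and the `offset` of the "Other Movies Rated" block are assumptions for illustration; only the sparse `target index:value ...` text layout is meant to mirror the libsvm-style input that libFM reads, and a real row would also carry the user and movie indicator columns.

```python
def normalised_indicator(rated, all_movies):
    """1/len(rated) for each rated movie, 0 elsewhere (hypothetical helper)."""
    rated = set(rated)
    weight = 1.0 / len(rated) if rated else 0.0
    return [weight if movie in rated else 0.0 for movie in all_movies]


def sparse_line(target, values, offset=0):
    """Format the nonzero entries as 'target index:value ...'.

    `offset` is an assumed starting column for this feature block.
    """
    parts = [f"{offset + i}:{v:.4g}" for i, v in enumerate(values) if v != 0.0]
    return " ".join([str(target)] + parts)


movies = ["TI", "NH", "SW", "ST"]      # assumed column order
train_history = ["TI", "NH", "SW"]     # Alice's ratings in the training set
target_movie = "ST"                    # the movie rated in the test set

# Setting 1: the test row only reflects Alice's earlier ratings (ST excluded).
history_only = normalised_indicator(train_history, movies)

# Setting 2: the test row also records that Alice has rated ST.
with_target = normalised_indicator(train_history + [target_movie], movies)

print(sparse_line(1, history_only))
print(sparse_line(1, with_target))
```

Which row to write depends on what is known at prediction time: the second form assumes it is already known that Alice rated ST, while the first predicts from her earlier behavior alone.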