zenogantner / MyMediaLite

recommender system library for the CLR (.NET)
http://mymedialite.net

support to Koren's neighborhood model and integrated model #376

Open mmanzato opened 11 years ago

mmanzato commented 11 years ago

This pull request concerns only Koren's models, so please consider only commit 879654b and any later ones.

zenogantner commented 11 years ago

Hi, what kind of results did you get with KorenImplicitKNN? I get an RMSE of 1.13 for

bin/rating_prediction --training-file=u.data --test-ratio=0.5 --recommender=KorenImplicitKNN --recommender-options="num_factors=5" --data-dir=data/ml-100k --find-iter=1

which is not particularly good.

IntegratedSVDPlusPlusKNN seems to converge nicely.

Did you calibrate the methods against what was reported in the paper?

mmanzato commented 11 years ago

Hello Zeno,

Currently I am on vacation, but I will check as soon as I come back to work.

Regards!



Prof. Marcelo G. Manzato Computer Science Department (SCC) Mathematics and Computer Science Institute (ICMC) University of Sao Paulo (USP) Sao Carlos-SP Brazil

+55 16 3373 6638

mmanzato commented 11 years ago

Hello Zeno!

Sorry for the delay, I was busy with other things here at the University.

I checked the code and found a mistake in the KorenImplicitKNN class. A single line was missing at the beginning of the InitNeighborhoodMode() method:

current_learnrate = LearnRate;

The current_learnrate variable was initialized to 0, so none of the parameter updates had any effect.
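To illustrate why a zero learning rate freezes training, here is a minimal Python sketch of stochastic gradient descent on a toy one-parameter model. It is not MyMediaLite code: the function `sgd_fit` and the toy data are hypothetical, and only the name `current_learnrate` is borrowed from the comment above.

```python
def sgd_fit(xs, ys, learn_rate, num_iter=50):
    """Fit y ~ w * x by stochastic gradient descent and return the weight w."""
    w = 0.0
    # Analogue of the missing line: copy the configured rate into the
    # variable that the update step actually uses.
    current_learnrate = learn_rate
    for _ in range(num_iter):
        for x, y in zip(xs, ys):
            err = y - w * x
            # With current_learnrate == 0 this step adds exactly 0 to w.
            w += current_learnrate * err * x
    return w


# Toy data generated by y = 2 * x (hypothetical, for illustration only).
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

w_zero = sgd_fit(xs, ys, learn_rate=0.0)   # stays at its initial value
w_ok = sgd_fit(xs, ys, learn_rate=0.05)    # moves toward 2.0
```

With a rate of 0 the weight never leaves its initial value, so the error never changes between iterations, which matches the symptom described above.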

I ran it again and it now seems to converge (although I still have not found the best parameter values for it):

rating_prediction --measures=RMSE,MAE --training-file=/home/manzato/Databases/ml-100k/u.data --recommender=KorenImplicitKNN --test-ratio=0.2 --rating-type=byte --recommender-options="num_factors=5 learn_rate=0.01 bias_reg=0.01 reg=0.5 num_iter=40 frequency_regularization=true K=30 decay=0.9" --find-iter=1

loading_time 0.2
memory 1
ratings range: [1, 5]
test ratio 0.2
training data: 943 users, 1659 items, 80000 ratings, sparsity 94.88634
test data: 941 users, 1412 items, 20000 ratings, sparsity 98.49476
KorenImplicitKNN K=30 regularization=0.5 bias_reg=0.01 frequency_regularization=True learn_rate=0.01 bias_learn_rate=0.7 num_iter=0 decay=0.9 RMSE 1.123809 MAE 0.940299 new items: RMSE 1.79243 MAE 1.57211 CBD 0.36482 iteration 0
RMSE 0.9861695 MAE 0.7927129 new items: RMSE 1.14565 MAE 0.92523 CBD 0.25277 iteration 1
RMSE 0.9641793 MAE 0.7687995 new items: RMSE 1.09936 MAE 0.88151 CBD 0.24551 iteration 2
RMSE 0.9552662 MAE 0.7598233 new items: RMSE 1.08562 MAE 0.87425 CBD 0.24386 iteration 3
RMSE 0.950482 MAE 0.7551695 new items: RMSE 1.08032 MAE 0.87287 CBD 0.24332 iteration 4
RMSE 0.9475589 MAE 0.7523091 new items: RMSE 1.0783 MAE 0.87301 CBD 0.24316 iteration 5
RMSE 0.9456433 MAE 0.7504052 new items: RMSE 1.07776 MAE 0.87364 CBD 0.24314 iteration 6
RMSE 0.9443214 MAE 0.7490677 new items: RMSE 1.07789 MAE 0.87443 CBD 0.2432 iteration 7
RMSE 0.9433836 MAE 0.7481087 new items: RMSE 1.07834 MAE 0.87522 CBD 0.24328 iteration 8
RMSE 0.9427018 MAE 0.7474058 new items: RMSE 1.07892 MAE 0.87597 CBD 0.24336 iteration 9
RMSE 0.942196 MAE 0.7468699 new items: RMSE 1.07954 MAE 0.87665 CBD 0.24345 iteration 10
RMSE 0.9418148 MAE 0.7464592 new items: RMSE 1.08015 MAE 0.87725 CBD 0.24353 iteration 11
RMSE 0.9415229 MAE 0.7461361 new items: RMSE 1.08072 MAE 0.87779 CBD 0.2436 iteration 12
RMSE 0.9412969 MAE 0.7458754 new items: RMSE 1.08126 MAE 0.87827 CBD 0.24367 iteration 13
RMSE 0.9411201 MAE 0.7456625 new items: RMSE 1.08175 MAE 0.87869 CBD 0.24373 iteration 14
RMSE 0.9409808 MAE 0.7454898 new items: RMSE 1.08219 MAE 0.87906 CBD 0.24379 iteration 15
RMSE 0.94087 MAE 0.7453492 new items: RMSE 1.08259 MAE 0.87939 CBD 0.24384 iteration 16
RMSE 0.9407812 MAE 0.7452323 new items: RMSE 1.08295 MAE 0.87968 CBD 0.24388 iteration 17
RMSE 0.9407096 MAE 0.7451338 new items: RMSE 1.08328 MAE 0.87994 CBD 0.24392 iteration 18
RMSE 0.9406515 MAE 0.745051 new items: RMSE 1.08357 MAE 0.88016 CBD 0.24395 iteration 19
RMSE 0.9406041 MAE 0.7449796 new items: RMSE 1.08383 MAE 0.88037 CBD 0.24398 iteration 20
RMSE 0.9405651 MAE 0.7449183 new items: RMSE 1.08406 MAE 0.88054 CBD 0.24401 iteration 21
RMSE 0.9405329 MAE 0.7448654 new items: RMSE 1.08427 MAE 0.8807 CBD 0.24404 iteration 22
RMSE 0.9405063 MAE 0.7448199 new items: RMSE 1.08446 MAE 0.88084 CBD 0.24406 iteration 23
RMSE 0.940484 MAE 0.7447811 new items: RMSE 1.08463 MAE 0.88097 CBD 0.24408 iteration 24
RMSE 0.9404655 MAE 0.7447473 new items: RMSE 1.08478 MAE 0.88108 CBD 0.24409 iteration 25
RMSE 0.9404498 MAE 0.7447181 new items: RMSE 1.08492 MAE 0.88118 CBD 0.24411 iteration 26
RMSE 0.9404366 MAE 0.7446926 new items: RMSE 1.08504 MAE 0.88127 CBD 0.24412 iteration 27
RMSE 0.9404254 MAE 0.7446704 new items: RMSE 1.08515 MAE 0.88135 CBD 0.24414 iteration 28
RMSE 0.9404159 MAE 0.744651 new items: RMSE 1.08525 MAE 0.88142 CBD 0.24415 iteration 29
RMSE 0.9404078 MAE 0.7446339 new items: RMSE 1.08533 MAE 0.88149 CBD 0.24416 iteration 30
RMSE 0.9404008 MAE 0.7446187 new items: RMSE 1.08541 MAE 0.88154 CBD 0.24417 iteration 31
RMSE 0.9403948 MAE 0.7446055 new items: RMSE 1.08548 MAE 0.88159 CBD 0.24417 iteration 32
RMSE 0.9403896 MAE 0.744594 new items: RMSE 1.08555 MAE 0.88164 CBD 0.24418 iteration 33
RMSE 0.9403852 MAE 0.7445838 new items: RMSE 1.08561 MAE 0.88168 CBD 0.24419 iteration 34
RMSE 0.9403813 MAE 0.7445748 new items: RMSE 1.08566 MAE 0.88172 CBD 0.24419 iteration 35
RMSE 0.940378 MAE 0.744567 new items: RMSE 1.0857 MAE 0.88175 CBD 0.2442 iteration 36
RMSE 0.940375 MAE 0.7445599 new items: RMSE 1.08574 MAE 0.88178 CBD 0.2442 iteration 37
RMSE 0.9403724 MAE 0.7445537 new items: RMSE 1.08578 MAE 0.88181 CBD 0.24421 iteration 38
RMSE 0.9403701 MAE 0.7445481 new items: RMSE 1.08582 MAE 0.88183 CBD 0.24421 iteration 39
RMSE 0.9403682 MAE 0.7445431 new items: RMSE 1.08585 MAE 0.88185 CBD 0.24422 iteration 40
RMSE 0.9403664 MAE 0.7445387 new items: RMSE 1.08587 MAE 0.88187 CBD 0.24422 iteration 41
RMSE 0.9403649 MAE 0.7445347 new items: RMSE 1.0859 MAE 0.88189 CBD 0.24422 iteration 42
RMSE 0.9403635 MAE 0.7445311 new items: RMSE 1.08592 MAE 0.88191 CBD 0.24422 iteration 43
RMSE 0.9403623 MAE 0.7445278 new items: RMSE 1.08594 MAE 0.88192 CBD 0.24423 iteration 44
RMSE 0.9403611 MAE 0.7445249 new items: RMSE 1.08596 MAE 0.88193 CBD 0.24423 iteration 45
RMSE 0.9403601 MAE 0.7445223 new items: RMSE 1.08597 MAE 0.88195 CBD 0.24423 iteration 46
RMSE 0.9403592 MAE 0.74452 new items: RMSE 1.08599 MAE 0.88196 CBD 0.24423 iteration 47
RMSE 0.9403585 MAE 0.744518 new items: RMSE 1.086 MAE 0.88196 CBD 0.24423 iteration 48
RMSE 0.9403577 MAE 0.7445161 new items: RMSE 1.08601 MAE 0.88197 CBD 0.24423 iteration 49
RMSE 0.9403571 MAE 0.7445144 new items: RMSE 1.08602 MAE 0.88198 CBD 0.24424 iteration 50
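A log in this shape can be checked for convergence mechanically. The following is a rough Python sketch, not part of MyMediaLite: the helper names `parse_log` and `first_flat_iteration` and the tolerance are assumptions, and the regular expression assumes the exact "RMSE ... MAE ... new items: ... iteration N" layout seen in the output above. The sample string reuses the first three entries of that output.

```python
import re

# Matches one evaluation entry; assumes the field order shown in the log above.
LOG_PATTERN = re.compile(
    r"RMSE (\S+) MAE (\S+) new items: RMSE \S+ MAE \S+ CBD \S+ iteration (\d+)"
)


def parse_log(text):
    """Return a list of (iteration, rmse, mae) tuples found in the log text."""
    return [
        (int(it), float(rmse), float(mae))
        for rmse, mae, it in LOG_PATTERN.findall(text)
    ]


def first_flat_iteration(entries, tol=1e-4):
    """Return the first iteration whose RMSE gain over the previous one is below tol."""
    for (_, prev, _), (it, cur, _) in zip(entries, entries[1:]):
        if prev - cur < tol:
            return it
    return None


sample = (
    "RMSE 1.123809 MAE 0.940299 new items: RMSE 1.79243 MAE 1.57211 CBD 0.36482 iteration 0 "
    "RMSE 0.9861695 MAE 0.7927129 new items: RMSE 1.14565 MAE 0.92523 CBD 0.25277 iteration 1 "
    "RMSE 0.9641793 MAE 0.7687995 new items: RMSE 1.09936 MAE 0.88151 CBD 0.24551 iteration 2"
)
entries = parse_log(sample)
```

Running `first_flat_iteration` on the full log would give the point at which further iterations stop paying off for a chosen tolerance, which may help when picking a value for num_iter.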

Would you like me to update the pull request with this fix so that you can consider merging it?

Thank you, Marcelo


zenogantner commented 11 years ago

Hi Marcelo,

thank you for the update.

Could you reproduce the results reported in the paper? I ask because simpler methods achieve an RMSE better than 0.94 on ml-100k.

mmanzato commented 11 years ago

Hi Zeno,

I could try, but I see that Koren used the Netflix quiz set to test his algorithm. I have a copy of the Netflix dataset that I downloaded from an unofficial website, but it does not include the quiz set, and I cannot find it anywhere for download. Do you have it?

Alternatively, I could use only the training set, but in this case, I would need to split it into training and test sets.
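Such a split could look roughly like the Python sketch below. It is only an illustration under stated assumptions: the function `split_ratings`, the fixed seed, and the synthetic `(user, item, rating)` tuples are hypothetical and do not come from the Netflix data or from MyMediaLite.

```python
import random


def split_ratings(ratings, test_ratio=0.2, seed=42):
    """Shuffle the ratings and return (train, test) lists."""
    shuffled = list(ratings)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    num_test = int(round(len(shuffled) * test_ratio))
    return shuffled[num_test:], shuffled[:num_test]


# Synthetic (user, item, rating) tuples standing in for a real training file.
ratings = [(u, i, 1 + (u + i) % 5) for u in range(10) for i in range(10)]
train, test = split_ratings(ratings, test_ratio=0.2)
```

Results on a random split like this would probably not be directly comparable to numbers reported on the quiz set, since the held-out ratings are selected differently.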

What do you think?

Thanks! Marcelo
