irecsys / CARSKit

Java-Based Context-aware Recommendation Library
https://carskit.github.io/
GNU General Public License v3.0

Value of r #16

Closed: zainab2014 closed this issue 5 years ago

zainab2014 commented 5 years ago

Hi,

Could you please help with the following question:

If I use the line evaluation.setup=given-ratio -r 0.6, my understanding is that r represents the ratio of training to testing data, which in turn refers to the matrix density, so if r=0.6 then the matrix density is 60%. Is my understanding correct?

I ran the tool many times, changing r to 0.1, 0.4, 0.6 and then 0.8, because I am looking for the MAE when the matrix density is 10%, 40%, 60% and 80%. But I was surprised that MAE and RMSE increase as the density increases, whereas I expected them to decrease because the training set gets larger.

Not sure what I did wrong; my data file has the following format:

userid,itemid,rating,p1,p2,p3,p4
1,1,1,1,0.339,0,0.866
1,1,0.65,1,0.339,1,0.298
1,1,0.3,0.043,1,1,0.082

where p1,p2,p3,p4 are the four contexts I am using.

Your help is highly appreciated. Thanks in advance.

irecsys commented 5 years ago

Hello, I do not think the percentage refers to density. It is just the ratio of the training set when you split the data into training and testing sets.
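To make the distinction concrete, here is a minimal sketch of what a given-ratio split does, assuming a simple random per-record split. This is not CARSKit's own splitter; the class, record type and method names are hypothetical. It shows that r only controls the fraction of rating records placed in the training set, while matrix density is a separate quantity: distinct observed user-item pairs divided by (number of users × number of items).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

/** Hypothetical illustration of a given-ratio split; not CARSKit's own splitter. */
public class GivenRatioSplit {

    /** One rating record (user, item, rating); context columns omitted for brevity. */
    static final class Rating {
        final int user;
        final int item;
        final double value;

        Rating(int user, int item, double value) {
            this.user = user;
            this.item = item;
            this.value = value;
        }
    }

    /** Shuffles the records and puts round(r * n) of them in training; the rest form the test set. */
    static List<List<Rating>> split(List<Rating> data, double r, long seed) {
        List<Rating> shuffled = new ArrayList<>(data);
        Collections.shuffle(shuffled, new Random(seed));
        int cut = (int) Math.round(r * shuffled.size());
        List<List<Rating>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(shuffled.subList(0, cut)));               // training
        parts.add(new ArrayList<>(shuffled.subList(cut, shuffled.size()))); // testing
        return parts;
    }

    /** Density = distinct observed (user, item) pairs / (numUsers * numItems). */
    static double density(List<Rating> data, int numUsers, int numItems) {
        Set<Long> pairs = new HashSet<>();
        for (Rating x : data) {
            pairs.add((long) x.user * numItems + x.item);
        }
        return (double) pairs.size() / ((double) numUsers * numItems);
    }

    public static void main(String[] args) {
        // Toy data: 4 users x 5 items, but only 8 of the 20 cells carry a rating.
        int[][] cells = {{0, 0}, {0, 1}, {1, 1}, {1, 2}, {2, 3}, {2, 4}, {3, 0}, {3, 4}};
        List<Rating> data = new ArrayList<>();
        for (int[] c : cells) {
            data.add(new Rating(c[0], c[1], 3.0));
        }
        List<List<Rating>> parts = split(data, 0.5, 42L);
        System.out.println("training share:   " + parts.get(0).size() / (double) data.size());
        System.out.println("full density:     " + density(data, 4, 5));
        System.out.println("training density: " + density(parts.get(0), 4, 5));
    }
}
```

In this toy example, r=0.5 puts half of the records into training, yet the full matrix is only 40% dense and the training matrix 20% dense, so r and density are different quantities. In contextual data the same user-item pair can also appear several times (as in the rows posted above), which makes record counts and matrix density diverge even further.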

zainab2014 commented 5 years ago

Thanks for your reply. I checked the debug file many times: when r=0.8, the debug file shows that the number of training records is approximately 80% of the original data, and the remainder is used for testing. I am just a little confused because in my results the MAE grows as the training set grows, which seems wrong.

Is the format for my data correct?

userid,itemid,rating,p1,p2,p3,p4
1,1,1,1,0.339,0,0.866
1,1,0.65,1,0.339,1,0.298
1,1,0.3,0.043,1,1,0.082

and I am using this config file:

dataset.ratings.wins=C:\Afterall.txt
dataset.social.wins=-1
ratings.setup=-threshold 1 -datatransformation 1
recommender=CAMF_MCS
evaluation.setup=given-ratio -r 0.4
item.ranking=off -topN 10
output.setup=-folder CARSKit.Workspace -verbose on, off --to-clipboard --to-file results.txt
guava.cache.spec=maximumSize=200,expireAfterAccess=2m

Thanks for your support.

irecsys commented 5 years ago

OK. Similarity-based CAMF methods, including CAMF_ICS, CAMF_LCS and CAMF_MCS, can only be used for top-N recommendations. This is because I used boundaries as constraints, which means the predicted scores will not be on the original rating scale. Therefore, they can only be used for top-N recommendation.
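The general point (scores that are off the rating scale can still rank items correctly but make MAE/RMSE meaningless) can be illustrated with a small hypothetical sketch. The numbers are invented and this is not how CAMF_MCS computes its scores: it simply applies a monotonic transformation to some predictions, which leaves the item ranking unchanged but changes the MAE drastically.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical demo: ranking survives a change of scale, MAE/RMSE do not. */
public class ScaleVsRanking {

    /** Mean absolute error between predictions and true ratings. */
    static double mae(double[] pred, double[] truth) {
        double sum = 0;
        for (int i = 0; i < pred.length; i++) {
            sum += Math.abs(pred[i] - truth[i]);
        }
        return sum / pred.length;
    }

    /** Root mean squared error between predictions and true ratings. */
    static double rmse(double[] pred, double[] truth) {
        double sum = 0;
        for (int i = 0; i < pred.length; i++) {
            double d = pred[i] - truth[i];
            sum += d * d;
        }
        return Math.sqrt(sum / pred.length);
    }

    /** Item indices sorted by descending score, i.e. the order a top-N list would use. */
    static List<Integer> ranking(double[] scores) {
        List<Integer> idx = new ArrayList<>();
        for (int i = 0; i < scores.length; i++) {
            idx.add(i);
        }
        idx.sort((a, b) -> Double.compare(scores[b], scores[a]));
        return idx;
    }

    public static void main(String[] args) {
        double[] truth = {5, 4, 2, 1};           // made-up true ratings on a 1-5 scale
        double[] onScale = {4.6, 3.9, 2.4, 1.2}; // made-up predictions on the rating scale
        double[] offScale = new double[onScale.length];
        for (int i = 0; i < onScale.length; i++) {
            offScale[i] = 10 * onScale[i] - 7;   // monotonic transform: same order, different scale
        }
        System.out.println("ranking on-scale:  " + ranking(onScale));
        System.out.println("ranking off-scale: " + ranking(offScale));
        System.out.println("MAE on-scale:  " + mae(onScale, truth));
        System.out.println("MAE off-scale: " + mae(offScale, truth));
    }
}
```

Both score vectors produce the same top-N list, so precision, recall or NDCG would be identical, while MAE jumps from about 0.28 to about 20. Comparing MAE across runs of such a model therefore says little about its quality.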

zainab2014 commented 5 years ago

Sorry to bother you, but can you confirm the following:
1. Is the sample of my data in the correct format?
2. Is my config file correct?
3. I do not really understand your last reply regarding CAMF_MCS; it currently works with the data I gave. My only concern is why MAE increases when r increases. It should be the opposite, since a bigger r means a bigger training set. I really need help to understand this. Thank you so much.

irecsys commented 5 years ago
  1. Your table above seems to be incorrect. The context variables must be nominal variables.
  2. Your config file seems to be right, but as I mentioned, CAMF_MCS cannot be used for rating prediction. You can evaluate it on top-N recommendations using metrics like precision, recall and NDCG.
  3. That is because CAMF_MCS is not designed for rating prediction, so you should not evaluate it with MAE, RMSE, etc.

irecsys commented 5 years ago

There is one option in the configuration file: item.ranking=on -topN 10. If you set it to "on" and set N to 10, you will get top-10 recommendation evaluations based on precision, recall, NDCG, etc.
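For readers unfamiliar with these ranking metrics, here is a minimal sketch of precision@N, recall@N and NDCG@N for a single user, assuming binary relevance (an item counts as relevant if the user rated it in the test set). These are common textbook definitions; CARSKit's own implementation may differ in details such as how short lists or users without test items are handled. The item IDs are made up.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical single-user top-N metrics with binary relevance; not CARSKit's code. */
public class TopNMetrics {

    /** Number of relevant items among the first n positions of the ranked list. */
    private static int hits(List<Integer> ranked, Set<Integer> relevant, int n) {
        int count = 0;
        for (int i = 0; i < Math.min(n, ranked.size()); i++) {
            if (relevant.contains(ranked.get(i))) count++;
        }
        return count;
    }

    /** Fraction of the top-n recommended items that are relevant. */
    static double precisionAtN(List<Integer> ranked, Set<Integer> relevant, int n) {
        return (double) hits(ranked, relevant, n) / n;
    }

    /** Fraction of the relevant items that appear in the top-n list. */
    static double recallAtN(List<Integer> ranked, Set<Integer> relevant, int n) {
        return relevant.isEmpty() ? 0.0 : (double) hits(ranked, relevant, n) / relevant.size();
    }

    /** NDCG@n with binary gains: each hit at 0-based position i contributes 1 / log2(i + 2). */
    static double ndcgAtN(List<Integer> ranked, Set<Integer> relevant, int n) {
        double dcg = 0;
        for (int i = 0; i < Math.min(n, ranked.size()); i++) {
            if (relevant.contains(ranked.get(i))) dcg += 1.0 / log2(i + 2);
        }
        double idcg = 0; // best possible DCG: all relevant items ranked first
        for (int i = 0; i < Math.min(n, relevant.size()); i++) {
            idcg += 1.0 / log2(i + 2);
        }
        return idcg == 0 ? 0.0 : dcg / idcg;
    }

    private static double log2(double x) {
        return Math.log(x) / Math.log(2);
    }

    public static void main(String[] args) {
        List<Integer> ranked = List.of(7, 3, 9, 1, 4); // hypothetical top-5 list for one user
        Set<Integer> relevant = Set.of(3, 4, 8);       // items this user rated in the test set
        System.out.println("precision@5 = " + precisionAtN(ranked, relevant, 5));
        System.out.println("recall@5    = " + recallAtN(ranked, relevant, 5));
        System.out.println("NDCG@5      = " + ndcgAtN(ranked, relevant, 5));
    }
}
```

An evaluation tool would typically average these per-user values over all test users.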

zainab2014 commented 5 years ago

Thanks for your time and reply. Actually, I saw that line in the CARSKit manual regarding ranking. The problem in my case is that I am looking for both prediction and recommendation, so is there no way to use CARSKit to do that? Also, I am not sure how I can change the four contexts p1-p4; although they are numbers, the tool runs smoothly and I did not get any error related to that. The MAE problem is not limited to CAMF_MCS: I have been trying this tool for many months, and the other algorithms give me the same issue. I am afraid that my understanding of the "r" ratio is not correct. Kindly look at the attached table below:

For r=0.1 I called it a matrix density of 10%, which may be wrong; I am not sure.

[image: attached results table]

irecsys commented 5 years ago

First of all, the values of the context variables must be categorical. You have numerical values in your data. The algorithms still run, but the results may not make sense at all. It depends on what these variables are.
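If p1-p4 are continuous measurements, one possible workaround (an assumption here, not something prescribed in this thread) is to discretize each value into a few nominal levels before building the data file. The sketch below maps each context value to "low", "mid" or "high" using hypothetical cut points of 0.33 and 0.66; sensible thresholds depend entirely on what the variables mean. The class and method names are illustrative, and the sample rows are the ones posted above.

```java
import java.util.List;

/** Hypothetical helper that turns numeric context values into nominal levels. */
public class ContextBinning {

    /** Maps a numeric context value to a nominal level; the cut points are assumptions. */
    static String toLevel(double v) {
        if (v < 0.33) return "low";
        if (v < 0.66) return "mid";
        return "high";
    }

    /** Rewrites one "userid,itemid,rating,p1,...,pk" line, keeping the first three columns. */
    static String binLine(String line) {
        String[] f = line.split(",");
        StringBuilder out = new StringBuilder(f[0] + "," + f[1] + "," + f[2]);
        for (int i = 3; i < f.length; i++) {
            out.append(',').append(toLevel(Double.parseDouble(f[i].trim())));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> rows = List.of(
                "1,1,1,1,0.339,0,0.866",
                "1,1,0.65,1,0.339,1,0.298",
                "1,1,0.3,0.043,1,1,0.082");
        System.out.println("userid,itemid,rating,p1,p2,p3,p4");
        for (String r : rows) {
            System.out.println(binLine(r));
        }
    }
}
```

If the values are really 0/1 indicator flags rather than measurements, a two-level mapping (for example "yes"/"no") would be the more natural choice.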

Next, CAMF_ICS, CAMF_LCS, CAMF_MCS and some other algorithms (such as BPR and the CSLIM-based methods) are designed only for top-N recommendation, so you may not be able to use them to produce predicted ratings. I can see here that you get lower MAE/RMSE with CAMF_MCS; these results could be due to chance!

Finally, the parameter r simply refers to the ratio of the training set, not the density.

On Thu, Oct 18, 2018 at 5:03 PM zainab2014 notifications@github.com wrote:


[image: image] https://user-images.githubusercontent.com/8822132/47186860-eb3df600-d2ff-11e8-9537-77403af2e762.png
