irecsys / CARSKit

Java-Based Context-aware Recommendation Library
https://carskit.github.io/
GNU General Public License v3.0
124 stars 53 forks

Performance of context aware algorithms #4

Closed neerajBaji closed 8 years ago

neerajBaji commented 8 years ago

Hi,

I am trying out CARSKit with the DePaul Movie dataset and I find that the contextual algorithms consistently perform worse than traditional collaborative filtering algorithms, or even the average-based baselines.

I have not changed any of the algorithm-specific default hyperparameters in settings.conf. The generated results are shown below; I have highlighted the ones that perform better than the rest.

Are these results expected? Is there another dataset on which the contextual algorithms might perform better? Also, if there is a benchmarks page (à la LibRec) that I have missed, please point me towards it.

As you can see, the context-unaware algorithms seem to be performing better. Please let me know if I have missed something here.

RESULTS:

Final Results by SlopeOne, MAE: 0.967844, RMSE: 1.181897, NAME: 0.241961, rMAE: 0.946509, rRMSE: 1.211459, MPE: 0.000000, carskit.alg.baseline.cf.SlopeOne@56c86535, Time: '00:00','00:00'
Final Results by ItemKNN, MAE: 0.868002, RMSE: 1.098535, NAME: 0.217000, rMAE: 0.837544, rRMSE: 1.130362, MPE: 0.000000, 10, PCC, -1, Time: '00:00','00:00'
Final Results by UserKNN, MAE: 0.916442, RMSE: 1.136917, NAME: 0.229111, rMAE: 0.892226, rRMSE: 1.171608, MPE: 0.000000, 10, PCC, -1, Time: '00:00','00:00'
Final Results by PMF, MAE: 2.329682, RMSE: 2.725427, NAME: 0.582421, rMAE: 2.329682, rRMSE: 2.725427, MPE: 0.000000, numFactors: 10, numIter: 120, lrate: 2.0E-10, maxlrate: -1.0, regB: 0.001, regU: 0.001, regI: 0.001, regC: 0.001, isBoldDriver: true, Time: '00:00','00:00'
Final Results by BPMF, MAE: 0.852885, RMSE: 1.086851, NAME: 0.213221, rMAE: 0.828192, rRMSE: 1.123668, MPE: 0.000000, 10, 120, Time: '00:04','00:00'
Final Results by BiasedMF, MAE: 1.231312, RMSE: 1.423191, NAME: 0.307828, rMAE: 1.230463, rRMSE: 1.460495, MPE: 0.000000, numFactors: 10, numIter: 120, lrate: 2.0E-10, maxlrate: -1.0, regB: 0.001, regU: 0.001, regI: 0.001, regC: 0.001, isBoldDriver: true, Time: '00:00','00:00'

Final Results by NMF, MAE: 0.729386, RMSE: 0.994550, NAME: 0.182347, rMAE: 0.696364, rRMSE: 1.036728, MPE: 0.000000, 10, 120, Time: '00:00','00:00'

Final Results by SVD++, MAE: 1.237963, RMSE: 1.430500, NAME: 0.309491, rMAE: 1.248961, rRMSE: 1.479570, MPE: 0.000000, numFactors: 10, numIter: 120, lrate: 2.0E-10, maxlrate: -1.0, regB: 0.001, regU: 0.001, regI: 0.001, regC: 0.001, isBoldDriver: true, Time: '00:01','00:00'
Final Results by UserSplitting-BiasedMF, MAE: 1.230567, RMSE: 1.424304, NAME: 0.307642, rMAE: 1.235632, rRMSE: 1.471055, MPE: 0.000000, numFactors: 10, numIter: 120, lrate: 2.0E-10, maxlrate: -1.0, regB: 0.001, regU: 0.001, regI: 0.001, regC: 0.001, isBoldDriver: true, Time: '00:00','00:00'
Final Results by UserSplitting-ItemKNN, MAE: 0.858941, RMSE: 1.089768, NAME: 0.214735, rMAE: 0.839733, rRMSE: 1.130370, MPE: 0.000000, 10, PCC, -1, Time: '00:00','00:00'
Final Results by UserSplitting-UserKNN, MAE: 0.907304, RMSE: 1.136996, NAME: 0.226826, rMAE: 0.886262, rRMSE: 1.175547, MPE: 0.000000, 10, PCC, -1, Time: '00:00','00:00'
Final Results by UserSplitting-SlopeOne, MAE: 0.940398, RMSE: 1.166736, NAME: 0.235100, rMAE: 0.915888, rRMSE: 1.198899, MPE: 0.000000, carskit.alg.baseline.cf.SlopeOne@771a1d97, Time: '00:00','00:00'
Final Results by UserSplitting-PMF, MAE: 2.329682, RMSE: 2.725427, NAME: 0.582421, rMAE: 2.329682, rRMSE: 2.725427, MPE: 0.000000, numFactors: 10, numIter: 120, lrate: 2.0E-10, maxlrate: -1.0, regB: 0.001, regU: 0.001, regI: 0.001, regC: 0.001, isBoldDriver: true, Time: '00:00','00:00'
Final Results by UserSplitting-BPMF, MAE: 0.869493, RMSE: 1.123035, NAME: 0.217373, rMAE: 0.839136, rRMSE: 1.152975, MPE: 0.000000, 10, 120, Time: '00:05','00:00'
Final Results by UserSplitting-BiasedMF, MAE: 1.229375, RMSE: 1.419656, NAME: 0.307344, rMAE: 1.231460, rRMSE: 1.456512, MPE: 0.000000, numFactors: 10, numIter: 120, lrate: 2.0E-10, maxlrate: -1.0, regB: 0.001, regU: 0.001, regI: 0.001, regC: 0.001, isBoldDriver: true, Time: '00:00','00:00'

Final Results by UserSplitting-NMF, MAE: 0.769139, RMSE: 1.050730, NAME: 0.192285, rMAE: 0.743096, rRMSE: 1.094921, MPE: 0.000000, 10, 120, Time: '00:00','00:00'

Final Results by UserSplitting-SVD++, MAE: 1.233742, RMSE: 1.425546, NAME: 0.308436, rMAE: 1.234045, rRMSE: 1.466769, MPE: 0.000000, numFactors: 10, numIter: 120, lrate: 2.0E-10, maxlrate: -1.0, regB: 0.001, regU: 0.001, regI: 0.001, regC: 0.001, isBoldDriver: true, Time: '00:01','00:00'
Final Results by UserSplitting-UserAvg, MAE: 1.120046, RMSE: 1.330542, NAME: 0.280011, rMAE: 1.097438, rRMSE: 1.357503, MPE: 0.000000, carskit.alg.baseline.avg.UserAverage@78d5cfd6, Time: '00:00','00:00'
Final Results by UserSplitting-ItemAvg, MAE: 1.090122, RMSE: 1.312299, NAME: 0.272530, rMAE: 1.073971, rRMSE: 1.345288, MPE: 0.000000, carskit.alg.baseline.avg.ItemAverage@1d402894, Time: '00:00','00:00'

Final Results by UserSplitting-UserItemAvg, MAE: 0.745786, RMSE: 1.071021, NAME: 0.186446, rMAE: 0.744489, rRMSE: 1.107227, MPE: 0.000000, carskit.alg.baseline.avg.UserItemAverage@5b7fd935, Time: '00:00','00:00'

Final Results by UserItemAvg, MAE: 0.689877, RMSE: 1.004177, NAME: 0.172469, rMAE: 0.668129, rRMSE: 1.036691, MPE: 0.000000, carskit.alg.baseline.avg.UserItemAverage@637719cf, Time: '00:00','00:00'

Final Results by ItemSplitting-UserItemAvg, MAE: 0.706875, RMSE: 1.022935, NAME: 0.176719, rMAE: 0.696765, rRMSE: 1.059279, MPE: 0.000000, carskit.alg.baseline.avg.UserItemAverage@644f96a0, Time: '00:00','00:00'

Final Results by ItemSplitting-ItemKNN, MAE: 0.860276, RMSE: 1.091466, NAME: 0.215069, rMAE: 0.833373, rRMSE: 1.130851, MPE: 0.000000, 10, PCC, -1, Time: '00:00','00:00'
Final Results by ItemSplitting-UserKNN, MAE: 0.914196, RMSE: 1.136213, NAME: 0.228549, rMAE: 0.887852, rRMSE: 1.172677, MPE: 0.000000, 10, PCC, -1, Time: '00:00','00:00'
Final Results by ItemSplitting-SlopeOne, MAE: 0.962192, RMSE: 1.176141, NAME: 0.240548, rMAE: 0.941140, rRMSE: 1.210475, MPE: 0.000000, carskit.alg.baseline.cf.SlopeOne@16d8db20, Time: '00:00','00:00'

Final Results by ItemSplitting-NMF, MAE: 0.764128, RMSE: 1.040546, NAME: 0.191032, rMAE: 0.735947, rRMSE: 1.081225, MPE: 0.000000, 10, 120, Time: '00:00','00:00'

Final Results by ItemSplitting-ItemAvg, MAE: 1.101381, RMSE: 1.305931, NAME: 0.275345, rMAE: 1.083516, rRMSE: 1.337027, MPE: 0.000000, carskit.alg.baseline.avg.ItemAverage@78d5cfd6, Time: '00:00','00:00'
Final Results by UISplitting-UserItemAvg, MAE: 0.764234, RMSE: 1.091865, NAME: 0.191058, rMAE: 0.765368, rRMSE: 1.128514, MPE: 0.000000, carskit.alg.baseline.avg.UserItemAverage@3315a56d, Time: '00:00','00:00'

CONTEXT AWARE:

Final Results by ContextAvg, MAE: 1.210466, RMSE: 1.405670, NAME: 0.302616, rMAE: 1.176579, rRMSE: 1.440965, MPE: 0.000000, carskit.alg.baseline.avg.ContextAverage@13065590, Time: '00:00','00:00'
Final Results by ContextAvg, MAE: 1.210466, RMSE: 1.405670, NAME: 0.302616, rMAE: 1.176579, rRMSE: 1.440965, MPE: 0.000000, carskit.alg.baseline.avg.ContextAverage@61877c15, Time: '00:00','00:00'
Final Results by ItemContextAvg, MAE: 1.088244, RMSE: 1.313791, NAME: 0.272061, rMAE: 1.058464, rRMSE: 1.340398, MPE: 0.000000, carskit.alg.baseline.avg.ItemContextAverage@5c877f84, Time: '00:00','00:00'
Final Results by UserContextAvg, MAE: 1.027653, RMSE: 1.248563, NAME: 0.256913, rMAE: 1.013124, rRMSE: 1.294667, MPE: 0.000000, carskit.alg.baseline.avg.UserContextAverage@2f178e05, Time: '00:00','00:00'
Final Results by CPTF, MAE: 2.329682, RMSE: 2.725427, NAME: 0.582421, rMAE: 2.329682, rRMSE: 2.725427, MPE: 0.000000, numFactors: 10, numIter: 120, lrate: 2.0E-10, maxlrate: -1.0, regB: 0.001, regU: 0.001, regI: 0.001, regC: 0.001, isBoldDriver: true, Time: '00:00','00:00'
Final Results by CAMF_CI, MAE: 1.549310, RMSE: 2.006931, NAME: 0.387328, rMAE: 1.562343, rRMSE: 2.050521, MPE: 0.000000, numFactors: 10, numIter: 120, lrate: 2.0E-10, maxlrate: -1.0, regB: 0.001, regU: 0.001, regI: 0.001, regC: 0.001, isBoldDriver: true, Time: '00:01','00:00'
Final Results by CAMF_CU, MAE: 1.539712, RMSE: 2.004897, NAME: 0.384928, rMAE: 1.550807, rRMSE: 2.045866, MPE: 0.000000, numFactors: 10, numIter: 120, lrate: 2.0E-10, maxlrate: -1.0, regB: 0.001, regU: 0.001, regI: 0.001, regC: 0.001, isBoldDriver: true, Time: '00:01','00:00'
Final Results by CAMF_C, MAE: 1.227395, RMSE: 1.438526, NAME: 0.306849, rMAE: 1.225086, rRMSE: 1.498170, MPE: 0.000000, numFactors: 10, numIter: 120, lrate: 2.0E-10, maxlrate: -1.0, regB: 0.001, regU: 0.001, regI: 0.001, regC: 0.001, isBoldDriver: true, Time: '00:01','00:00'
Final Results by CAMF_CUCI, MAE: 1.231175, RMSE: 1.429833, NAME: 0.307794, rMAE: 1.229866, rRMSE: 1.476612, MPE: 0.000000, numFactors: 10, numIter: 120, lrate: 2.0E-10, maxlrate: -1.0, regB: 0.001, regU: 0.001, regI: 0.001, regC: 0.001, isBoldDriver: true, Time: '00:04','00:00'
Final Results by CAMF_ICS, MAE: 2.329682, RMSE: 2.725427, NAME: 0.582421, rMAE: 2.329682, rRMSE: 2.725427, MPE: 0.000000, numFactors: 10, numIter: 120, lrate: 2.0E-10, maxlrate: -1.0, regB: 0.001, regU: 0.001, regI: 0.001, regC: 0.001, isBoldDriver: true, Time: '00:02','00:00'
Final Results by CAMF_LCS, MAE: 2.300570, RMSE: 2.697210, NAME: 0.575143, rMAE: 2.302239, rRMSE: 2.700686, MPE: 0.000000, numFactors: 10, numIter: 120, lrate: 2.0E-10, maxlrate: -1.0, regB: 0.001, regU: 0.001, regI: 0.001, regC: 0.001, isBoldDriver: true, Time: '00:03','00:00'
Final Results by CAMF_MCS, MAE: 2.329682, RMSE: 2.725427, NAME: 0.582421, rMAE: 2.329682, rRMSE: 2.725427, MPE: 0.000000, numFactors: 10, numIter: 120, lrate: 2.0E-10, maxlrate: -1.0, regB: 0.001, regU: 0.001, regI: 0.001, regC: 0.001, isBoldDriver: true, Time: '00:03','00:00'

irecsys commented 8 years ago

Hello, thanks for your interest in our toolkit.

Actually, these results are not surprising to me. Let me explain them from two perspectives: FACT and EXPERIMENT.

FACT:

1). Whether context can improve recommendation is a domain-specific and data-specific question, so it does not follow that context-aware algorithms always outperform non-contextual ones.
2). Context selection is important in context-aware recsys. Including irrelevant context information in the data introduces noise that hurts the performance of context-aware recsys.
3). Context-aware data sets are usually small and sparse, which makes prediction-error results unreliable. We usually trust results on the top-N recommendation task rather than prediction errors.
4). We view similarity-based context-aware recommenders, such as CAMF_ICS, CAMF_LCS, and CAMF_MCS, as algorithms for the top-N recommendation task. In addition, the splitting approaches are themselves a kind of context-aware algorithm.
5). Some algorithms (e.g., CPTF and FM) in our library do not handle out-of-bound predictions well during learning, which leads to poor prediction-error results but still good top-N results.
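Point 2 (context selection) can be sketched as a simple relevance filter: for each context dimension, compare the mean rating under each of its conditions against the overall mean, and consider dropping dimensions whose conditions barely deviate. This is only an illustrative heuristic, not CARSKit's built-in preprocessing; the data layout and function names below are assumptions.

```python
from collections import defaultdict

def context_relevance(ratings, dim):
    """Mean absolute deviation of per-condition rating means from the
    overall mean, for one context dimension (illustrative heuristic)."""
    overall = sum(r for *_, r in ratings) / len(ratings)
    by_cond = defaultdict(list)
    for user, item, ctx, r in ratings:
        by_cond[ctx[dim]].append(r)
    means = [sum(v) / len(v) for v in by_cond.values()]
    return sum(abs(m - overall) for m in means) / len(means)

# Toy data: (user, item, {dimension: condition}, rating)
ratings = [
    ("u1", "i1", {"time": "weekend", "loc": "home"},   5),
    ("u1", "i2", {"time": "weekday", "loc": "home"},   2),
    ("u2", "i1", {"time": "weekend", "loc": "cinema"}, 4),
    ("u2", "i2", {"time": "weekday", "loc": "cinema"}, 1),
]

# "time" separates high from low ratings; "loc" does not.
assert context_relevance(ratings, "time") > context_relevance(ratings, "loc")
```

A dimension scoring near zero under such a measure is a candidate for removal before running the context-aware algorithms.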

EXPERIMENT:

1). As mentioned above, you should select the most relevant and influential context dimensions in the preprocessing stage.
2). You should tune the parameters, especially for the learning-based algorithms.
3). We do not trust error-based metrics on small data sets. Take the average-based algorithms: they may outperform a complex algorithm when the data is very sparse. A user may rate an item only a few times (e.g., twice), and then the average of those ratings becomes the best predictor; clearly, it is unreliable to trust an average computed from such a limited number of rating profiles. That is a general problem in context-aware recommendation.
4). For top-N recommendation, evaluating context-aware recsys differs from the traditional setting, since we recommend a list of items to a `<user, context>` pair. In a small data set, a user may rate only a few items in a specific context situation, so top-N results are usually much smaller than in traditional recsys.
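The sparsity argument in point 3 can be made concrete with the error metrics reported in the logs (MAE and RMSE) and a leave-one-out average predictor on a handful of ratings; this is a minimal sketch, not CARSKit's evaluator.

```python
import math

def mae(actual, predicted):
    """Mean absolute error over paired ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error over paired ratings."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# An item rated only three times: predict each held-out rating with the
# average of the other two (leave-one-out item average).
actual = [4, 4, 5]
preds = [(sum(actual) - a) / 2 for a in actual]
print(mae(actual, preds), rmse(actual, preds))  # MAE ≈ 0.667, RMSE ≈ 0.707
```

The average scores well here simply because the handful of ratings barely vary, which says nothing about how the recommender would rank unseen items.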

Anyway, in short, your results are not surprising. We usually do not trust error-based metrics in this domain and prefer top-N recommendation metrics. In addition, you had better try a larger data set and perform context selection before using it.
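Switching to top-N evaluation is controlled in the configuration file the user already mentioned. The fragment below is only a sketch: the exact keys and values should be checked against the CARSKit user guide, and the algorithm choice is a placeholder.

```properties
# Illustrative setting.conf fragment (verify keys against the CARSKit guide)
recommender=camf_cu
evaluation.setup=cv -k 5
# Ranking-based evaluation (Precision/Recall/NDCG/MAP/MRR) instead of prediction errors
item.ranking=on -topN 10
```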

neerajBaji commented 8 years ago

Hi,

Thanks a lot for the detailed answer. I was wondering about the performance of UserItemAvg as well. I will redo my experiments with ranking metrics and report my observations.

I have gone through the page you maintain for contextual datasets (thanks for that!). Based on your experience, which dataset do you believe would most clearly showcase the improved performance of context-aware recsys? I understand there would be some data cleaning involved, but if possible I would like to work with a dataset that is more likely to show contextual differentiation.

irecsys commented 8 years ago

This is a list of results I ran on the DePaulMovie data. I did not do fine-grained parameter tuning; I only adjusted the learning rate for the matrix-factorization-based algorithms. The evaluation is based on prediction errors, and we focus on the RMSE metric.

In my evaluation, the best non-contextual recommender is BiasedMF, which obtains an RMSE of 0.964; the best context-aware algorithm is CAMF_CU, which obtains an RMSE of 0.884.

Final Results by GlobalAvg, MAE: 1.229522, RMSE: 1.414676, NAME: 0.307381, rMAE: 1.237230, rRMSE: 1.452487, MPE: 0.000000, carskit.alg.baseline.avg.GlobalAverage@5fdef03a, Time: '00:00','00:00'
Final Results by UserAvg, MAE: 1.109387, RMSE: 1.316039, NAME: 0.277347, rMAE: 1.092272, rRMSE: 1.349704, MPE: 0.000000, carskit.alg.baseline.avg.UserAverage@5fdef03a, Time: '00:00','00:00'
Final Results by ItemAvg, MAE: 1.089705, RMSE: 1.300056, NAME: 0.272426, rMAE: 1.072582, rRMSE: 1.334432, MPE: 0.000000, carskit.alg.baseline.avg.ItemAverage@5fdef03a, Time: '00:00','00:00'
Final Results by UserKNN, MAE: 0.950417, RMSE: 1.170172, NAME: 0.237604, rMAE: 0.924645, rRMSE: 1.201833, MPE: 0.000000, 10, COS, -1, Time: '00:00','00:00'
Final Results by ItemKNN, MAE: 0.967005, RMSE: 1.195632, NAME: 0.241751, rMAE: 0.951093, rRMSE: 1.233333, MPE: 0.000000, 10, COS, -1, Time: '00:00','00:00'
Final Results by BiasedMF, MAE: 0.666487, RMSE: 0.964239, NAME: 0.166622, rMAE: 0.643873, rRMSE: 0.998125, MPE: 0.000000, numFactors: 20, numIter: 100, lrate: 0.002, maxlrate: -1.0, regB: 0.001, regU: 0.001, regI: 0.001, regC: 0.001, isBoldDriver: true, Time: '00:00','00:00'
Final Results by ItemSplitting-BiasedMF, MAE: 0.759239, RMSE: 0.978460, NAME: 0.189810, rMAE: 0.726789, rRMSE: 1.020789, MPE: 0.000000, numFactors: 10, numIter: 100, lrate: 2.0E-4, maxlrate: -1.0, regB: 0.001, regU: 0.001, regI: 0.001, regC: 0.001, isBoldDriver: true, Time: '00:01','00:00'
Final Results by UISplitting-BiasedMF, MAE: 0.767762, RMSE: 0.991922, NAME: 0.191940, rMAE: 0.734541, rRMSE: 1.030315, MPE: 0.000000, numFactors: 10, numIter: 100, lrate: 2.0E-4, maxlrate: -1.0, regB: 0.001, regU: 0.001, regI: 0.001, regC: 0.001, isBoldDriver: true, Time: '00:00','00:00'
Final Results by CAMF_C, MAE: 0.683617, RMSE: 0.929397, NAME: 0.170904, rMAE: 0.644069, rRMSE: 0.970799, MPE: 0.000000, numFactors: 10, numIter: 100, lrate: 0.002, maxlrate: -1.0, regB: 0.001, regU: 0.001, regI: 0.001, regC: 0.001, isBoldDriver: true, Time: '00:01','00:00'
Final Results by CAMF_CU, MAE: 0.637444, RMSE: 0.884012, NAME: 0.159361, rMAE: 0.596739, rRMSE: 0.918705, MPE: 0.000000, numFactors: 10, numIter: 100, lrate: 0.002, maxlrate: -1.0, regB: 0.001, regU: 0.001, regI: 0.001, regC: 0.001, isBoldDriver: true, Time: '00:01','00:00'
Final Results by CAMF_CI, MAE: 0.694390, RMSE: 0.930310, NAME: 0.173598, rMAE: 0.663152, rRMSE: 0.974802, MPE: 0.000000, numFactors: 10, numIter: 100, lrate: 2.0E-4, maxlrate: -1.0, regB: 0.001, regU: 0.001, regI: 0.001, regC: 0.001, isBoldDriver: true, Time: '00:02','00:00'

irecsys commented 8 years ago

Hello, in my experience, you can find significant contextual effects on most of the context-aware data sets listed in my repository if you evaluate them by precision, recall, NDCG, MRR, MAP, etc.
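The ranking metrics mentioned above are computed per recommendation list; a minimal sketch of two of them (not CARSKit's evaluator, with made-up item IDs) for one `<user, context>` list:

```python
import math

def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k recommended items that are relevant."""
    return sum(1 for item in ranked[:k] if item in relevant) / k

def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG: DCG of the list over DCG of an ideal list."""
    dcg = sum(1.0 / math.log2(i + 2)
              for i, item in enumerate(ranked[:k]) if item in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal if ideal > 0 else 0.0

# One (user, context) list: the ranked recommendations, vs. the items
# the user actually liked in that context situation.
ranked = ["i3", "i1", "i7", "i2"]
relevant = {"i1", "i2"}
print(precision_at_k(ranked, relevant, 4))  # 0.5
print(ndcg_at_k(ranked, relevant, 4))
```

In a full evaluation these scores are averaged over all `<user, context>` pairs in the test set, which is why small per-context profiles make the absolute numbers look low.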

neerajBaji commented 8 years ago

Thanks for the numbers.