shogun-toolbox / shogun

Shōgun
http://shogun-toolbox.org
BSD 3-Clause "New" or "Revised" License

Examples of simple recommender system #2068

Open emtiyaz opened 10 years ago

emtiyaz commented 10 years ago

The goal of this task is to create very simple GP-based recommender systems. The task consists of the following steps:

(1) Write an example that reads the data and creates train and test sets, for example on MovieLens data (see #1982).
(2) Use GP regression (from Shogun) to predict each user's ratings. See the sample Matlab code here: https://github.com/emtiyaz/recommendation-system-examples
(3) Use GP classification to do the same.

The goal is to get you familiar with recommendation systems and also with Shogun's GP functions.
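Step (1) could be sketched roughly as follows, in plain C++ with no Shogun dependency. This is only an illustration: it assumes the ratings arrive as whitespace- or tab-separated rows of `user item rating timestamp` (the layout of the MovieLens 100k `u.data` file), and the names `Rating`, `parse_ratings`, `Split` and `split_per_user` are hypothetical helpers, not Shogun classes. The "every k-th rating goes to the test set" rule is a stand-in for whatever split the task actually calls for.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// One rating record: the first three fields of a row (user id, item id,
// rating). A trailing timestamp, if present, is ignored.
struct Rating {
    int user;
    int item;
    double value;
};

// Parse rows of "user item rating [timestamp]". Rows that do not start
// with three numeric fields are skipped.
std::vector<Rating> parse_ratings(std::istream& in) {
    std::vector<Rating> out;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        Rating r{};
        if (fields >> r.user >> r.item >> r.value) out.push_back(r);
    }
    return out;
}

struct Split {
    std::map<int, std::vector<Rating>> train;  // user id -> training ratings
    std::map<int, std::vector<Rating>> test;   // user id -> held-out ratings
};

// Hypothetical deterministic split: within each user, every k-th rating
// (in input order) goes to the test set, the rest to the training set.
Split split_per_user(const std::vector<Rating>& ratings, std::size_t k) {
    assert(k >= 2);
    Split s;
    std::map<int, std::size_t> seen;  // number of ratings seen so far per user
    for (const Rating& r : ratings) {
        const std::size_t idx = seen[r.user]++;
        if (idx % k == k - 1) s.test[r.user].push_back(r);
        else s.train[r.user].push_back(r);
    }
    return s;
}
```

Grouping by user at this stage keeps the later per-user modelling simple, since each user's ratings are already collected in one place.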

Feel free to ask any questions to me or @karlnapf.

emtiyaz commented 10 years ago

@k29 @yorkerlin @pl8787 @ouceduxzk , have a look if you want to try this

pl8787 commented 10 years ago

@emtiyaz Do you mean that we should just use the GP implementation that already exists in Shogun, applying both regression and classification to train on and test the data?

pl8787 commented 10 years ago

@karlnapf @emtiyaz The MovieLens data has string values such as the movie name. Can I use another program, like Python, to pre-process the data? Or is there a useful IO class that I don't know about?

vigsterkr commented 10 years ago

@pl8787 Well, if there's a common format for recommendation data, then it would actually be good to have an IO class that can read that format.

emtiyaz commented 10 years ago

@pl8787 Yes, please use the GP code that already exists in Shogun.

In my experience, every dataset comes in a very different form. However, it does help to have a common IO class that performs such tasks, e.g. similar to importdata in Matlab and readtable in R.

Having said that, since there is not much time right now, it is OK to process the data outside Shogun for now (maybe using my Matlab function).
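As an illustration of this kind of outside pre-processing, the sketch below drops the string fields of an item record and keeps only numeric features. It assumes a pipe-separated row that starts with a numeric id, followed by a few text fields (title, dates, URL) and then 0/1 flags, which is how the MovieLens 100k `u.item` file is usually described. `split`, `ItemFeatures`, `parse_item` and the `n_text` parameter are hypothetical names, not part of Shogun.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Split a line on a single-character delimiter, keeping empty fields.
std::vector<std::string> split(const std::string& line, char delim) {
    std::vector<std::string> fields(1);
    for (char c : line) {
        if (c == delim) fields.emplace_back();
        else fields.back() += c;
    }
    return fields;
}

struct ItemFeatures {
    int id;
    std::vector<double> flags;  // numeric 0/1 features that follow the text fields
};

// Keep the numeric id (field 0) and the trailing 0/1 flags; skip the
// n_text string fields in between. Returns false on malformed rows.
bool parse_item(const std::string& line, std::size_t n_text, ItemFeatures& out) {
    const std::vector<std::string> f = split(line, '|');
    if (f.size() <= 1 + n_text) return false;
    std::istringstream id_in(f[0]);
    if (!(id_in >> out.id)) return false;
    out.flags.clear();
    for (std::size_t i = 1 + n_text; i < f.size(); ++i) {
        if (f[i] == "1") out.flags.push_back(1.0);
        else if (f[i] == "0") out.flags.push_back(0.0);
        else return false;
    }
    return true;
}
```

The resulting numeric vectors can then be written out in whatever matrix format the downstream code reads.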

karlnapf commented 10 years ago

For reference data-sets the best thing would be to put them into a format that we can read from Shogun classes by hand and add them to the data-repository.

karlnapf commented 10 years ago

And users can then pre-process their individual datasets on their own, using their favourite language.

pl8787 commented 10 years ago

@emtiyaz @karlnapf I ran into a problem while implementing this example.

Dataset: MovieLens 100k
Generated train feature matrix: 21 x 8000
Generated test feature matrix: 21 x 2000

The problem appears when I call these functions:

CExactInferenceMethod* inf = new CExactInferenceMethod(kernel,
                          feat_train, mean, lab_train, lik);
...
CGaussianProcessRegression* gpr = new CGaussianProcessRegression(inf);
...
CRegressionLabels* predictions=gpr->apply_regression(feat_test);

the error occurs:

Out of memory error, tried to allocate 12800000000 bytes using malloc.

emtiyaz commented 10 years ago

Hint: There is a for loop in my Matlab code for a reason.

karlnapf commented 10 years ago

That's 12 GB. You could also try to work in float32. What happens if you loop over the test examples without holding them all in memory? I guess that's what you meant, @emtiyaz?
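For scale, the arithmetic behind that number can be checked with a small helper. The reported 12800000000 bytes is 1.6e9 entries at 8 bytes each; one shape with that many entries is 80000 x 20000, which would match a train-by-test kernel matrix for an 80/20 split of 100k ratings. That reading is a guess, since the sizes quoted in the thread are 8000 and 2000. `dense_matrix_bytes` is a hypothetical helper, not a Shogun function.

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed to store a dense rows x cols matrix with the given
// number of bytes per entry (8 for float64, 4 for float32).
std::uint64_t dense_matrix_bytes(std::uint64_t rows, std::uint64_t cols,
                                 std::uint64_t bytes_per_entry) {
    assert(bytes_per_entry > 0);
    return rows * cols * bytes_per_entry;
}
```

Under the same assumed shape, `dense_matrix_bytes(80000, 20000, 4)` gives 6400000000, so float32 halves the footprint but still leaves several gigabytes for a single matrix.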

emtiyaz commented 10 years ago

Well, what I was hinting at is that @pl8787 needs to understand the model correctly. The model is GP regression for each user, independently. Also, if my Matlab code can run within 5 sec without any memory issues, Shogun should be able to do the same job. I hope you can understand the model and code it.
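A rough sketch of what "GP regression for each user, independently" can look like is given below, in plain C++ without Shogun. It is not the Matlab code referenced above and makes several assumptions: a squared-exponential kernel with length scale `ell`, Gaussian noise with variance `noise_var`, and a constant mean equal to the user's average training rating. All names (`rbf`, `cholesky_solve`, `gp_predict`, `UserData`, `predict_per_user`) are hypothetical. The point is that each user only needs a kernel matrix over that user's own ratings, so no single large matrix is ever allocated.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;

// Squared-exponential kernel with length scale ell (an assumed choice).
double rbf(const Vec& a, const Vec& b, double ell) {
    assert(a.size() == b.size());
    double d2 = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) d2 += (a[i] - b[i]) * (a[i] - b[i]);
    return std::exp(-d2 / (2.0 * ell * ell));
}

// Solve A x = b for symmetric positive-definite A via Cholesky (A = L L^T).
// Only the lower triangle of A is read; A and b are taken by value.
Vec cholesky_solve(Mat A, Vec b) {
    const std::size_t n = A.size();
    for (std::size_t j = 0; j < n; ++j) {
        for (std::size_t k = 0; k < j; ++k) A[j][j] -= A[j][k] * A[j][k];
        assert(A[j][j] > 0.0);
        A[j][j] = std::sqrt(A[j][j]);
        for (std::size_t i = j + 1; i < n; ++i) {
            for (std::size_t k = 0; k < j; ++k) A[i][j] -= A[i][k] * A[j][k];
            A[i][j] /= A[j][j];
        }
    }
    for (std::size_t i = 0; i < n; ++i) {  // forward substitution: L z = b
        for (std::size_t k = 0; k < i; ++k) b[i] -= A[i][k] * b[k];
        b[i] /= A[i][i];
    }
    for (std::size_t i = n; i-- > 0;) {    // backward substitution: L^T x = z
        for (std::size_t k = i + 1; k < n; ++k) b[i] -= A[k][i] * b[k];
        b[i] /= A[i][i];
    }
    return b;
}

// GP posterior mean at each test point for ONE user. The kernel matrix is
// n x n, where n is the number of ratings this user has in the training set.
Vec gp_predict(const Mat& Xtr, const Vec& ytr, const Mat& Xte,
               double ell, double noise_var) {
    const std::size_t n = Xtr.size();
    assert(n > 0 && ytr.size() == n && noise_var > 0.0);
    double mu = 0.0;  // constant mean: the user's average training rating
    for (double y : ytr) mu += y;
    mu /= static_cast<double>(n);
    Mat K(n, Vec(n));
    Vec yc(n);
    for (std::size_t i = 0; i < n; ++i) {
        yc[i] = ytr[i] - mu;
        for (std::size_t j = 0; j < n; ++j)
            K[i][j] = rbf(Xtr[i], Xtr[j], ell) + (i == j ? noise_var : 0.0);
    }
    const Vec alpha = cholesky_solve(K, yc);  // (K + noise_var * I)^{-1} (y - mu)
    Vec mean(Xte.size(), mu);
    for (std::size_t t = 0; t < Xte.size(); ++t)
        for (std::size_t i = 0; i < n; ++i)
            mean[t] += rbf(Xte[t], Xtr[i], ell) * alpha[i];
    return mean;
}

struct UserData { Mat Xtr; Vec ytr; Mat Xte; };  // one user's train/test data

// Fit and predict for each user separately; memory is bounded by the
// user with the most training ratings.
std::vector<Vec> predict_per_user(const std::vector<UserData>& users,
                                  double ell, double noise_var) {
    std::vector<Vec> out;
    for (const UserData& u : users)
        out.push_back(gp_predict(u.Xtr, u.ytr, u.Xte, ell, noise_var));
    return out;
}
```

In the Shogun setting the same idea would mean constructing the inference method and regression machine inside the loop, on one user's features and labels at a time, rather than once on the full dataset.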

emtiyaz commented 10 years ago

@pl8787 I hope you can run my Matlab code properly. I realize there were some dependencies missing, but they are there now. Let me know if you have trouble understanding the model, although given your experience with recommender systems, I think you will figure it out yourself!

pl8787 commented 10 years ago

@emtiyaz Thanks a lot, I have realized that; I am still working on it. @karlnapf Yes, I know it's too large for a normal PC.