Splitting Approaches - Githubissues

MatthiasKirsch commented 6 years ago

Hi,

I'm trying to use BPR with UISplitting. Every time I run with my settings.conf, the log tells me that 0 items/users have been split. Is this normal?

My procedure: I created two settings.conf files. The first transforms the test set into binary format; the second runs the actual process, with my training set as the training set and the test set (transformed to binary format) as the test set. I also tried to use my test set directly as a non-binary CSV file together with the training set, but this gives me an error, so I think it should be fine to convert the test set in a first step and then use the output as the new test set.

1. Create binary format from the test set. Code snippet:
```
dataset.ratings.lins=/home/[...]/test_carskit_Bundesland.csv
dataset.social.lins=-1
ratings.setup=-threshold -1 -datatransformation 1 -fullstat -1
[...]
evaluation.setup=test-set -f /home/[...]/train_carskit_Bundesland.csv
```
  After this step I extracted the converted test set (now: ratings_binary.txt) from the created CARSKit.Workspace folder and put it next to my training set. Then I deleted the CARSKit.Workspace folder and the debug.log and results.txt files.
2. Run the normal approach. Code snippet:
```
dataset.ratings.lins=/home/[...]/train_carskit_Bundesland.csv
dataset.social.lins=-1
ratings.setup=-threshold -1 -datatransformation 1 -fullstat -1
recommender= uisplitting -traditional bpr
evaluation.setup=test-set -f /home/[...]/ratings_binary.txt
```
  When I run this, it first converts the training set into binary format, which is fine. After that it starts the UISplit and tells me that 0 items and 0 users have been split. I don't know if this is okay, because the process continues with BPR afterwards and doesn't give me an error. But once it has finished and I evaluate my results, comparing the context-split results with the ones using only BPR, the curves seem very similar. So I thought I might be doing something wrong here.

Can you help me please :) Thank you very much!

This is my output:

/**********************************************************************************************************
 *
 * Dataset: /home/[...]/CARSKit.Workspace/ratings_binary.txt
 * 
 * Statistics of U-I-C Matrix:
 * User amount: 508769
 * Item amount: 93689
 * Rate amount: 4118854
 * Context dimensions: 1 (bundesland)
 * Context conditions: 12 (bundesland: 12)
 * Context situations: 11
 * Data density: 0.0007%
 * Scale distribution: [1.0 x 4118854]
 * Average value of all ratings: 1.000000
 * Standard deviation of all ratings: 0.000000
 * Mode of all rating values: 1.000000
 * Median of all rating values: 1.000000
 *
 **********************************************************************************************************/
With Setup: test-set -f /home/jannis/Fileshares/matti/CROSS_VALIDATION/1/Bundesland/ratings_binary.txt
Dataset: ...ION/1/Bundesland/ratings_binary.txt
DataPath: /home/jannis/Fileshares/matti/CROSS_VALIDATION/1/Bundesland/ratings_binary.txt
Rating data set has been successfully loaded.
0 items have been splitted.
0 users have been splitted.
UI Splitting is done... Algorithm 'bpr' will be applied to the transformed data set.
Density of transformed 2D rating matrix ============================== 0.0075777913744593155
Final Results by UISplitting-BPR, Pre1: 0.012440,Pre2: 0.010356,Pre3: 0.008427,Pre4: 0.007442,Pre5: 0.006712,Pre6: 0.006134,Pre7: 0.005674,Pre8: 0.005283,Pre9: 0.004979,Pre10: 0.004729,Pre11: 0.004500,Pre12: 0.004297,Pre13: 0.004124,Pre14: 0.003963,Pre15: 0.003813,Pre16: 0.003681,Pre17: 0.003572,Pre18: 0.003469,Pre19: 0.003379,Pre20: 0.003295, Rec1: 0.005252,Rec2: 0.008719,Rec3: 0.010527,Rec4: 0.012352,Rec5: 0.013880,Rec6: 0.015140,Rec7: 0.016400,Rec8: 0.017554,Rec9: 0.018630, Rec10: 0.019651, Rec11: 0.020667, Rec12: 0.021555, Rec13: 0.022390, Rec14: 0.023233, Rec15: 0.023917, Rec16: 0.024644, Rec17: 0.025447, Rec18: 0.026170, Rec19: 0.026991, Rec20: 0.027826, AUC: 0.531280, MAP: 0.009775, NDCG: 0.017080, MRR: 0.022582, -1.0,10,0.02,-1.0,1.0E-4,1.0E-4,100, Time: '02:11:02','01:02:48'

MatthiasKirsch commented 6 years ago

I created a small dataset with the following data and tried to use ItemSplitting. It does not work; maybe the code is broken?

[image: grafik] https://user-images.githubusercontent.com/18574614/31808666-7e01b49a-b574-11e7-8813-930743e08de0.png

Here the items 1, 2 and 6 are rated in different contexts, so the ItemSplitting should divide them, but it doesn't. Still "0 items splitted".

irecsys commented 6 years ago

What are your ratings? Binary ones?

MatthiasKirsch commented 6 years ago

My ratings have only the positive value 1 because it is purchase data (no zeros available). This is why I use BPR, because I think this algorithm can handle this kind of data. BPR works fine, but the ItemSplitting in front of it does not.

irecsys commented 6 years ago

Splitting doesn't work if you only have 1 as ratings in your data.
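
To illustrate why, here is a minimal sketch of a splitting criterion based on a two-sample t-test over ratings, which is the kind of criterion mentioned later in this thread. It is not CARSKit's actual code: the function names `welch_t` and `should_split` and the threshold of 2.0 are assumptions made for this example. When every rating is 1, both groups have zero variance and identical means, so the statistic is undefined and no context condition can ever qualify for a split.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two rating samples; None when it is undefined."""
    if len(a) < 2 or len(b) < 2:
        return None
    se2 = variance(a) / len(a) + variance(b) / len(b)
    if se2 == 0:
        return None  # both groups are constant, nothing to test against
    return (mean(a) - mean(b)) / sqrt(se2)


def should_split(ratings_in_ctx, ratings_elsewhere, threshold=2.0):
    """Hypothetical rule: split when |t| reaches an assumed threshold."""
    t = welch_t(ratings_in_ctx, ratings_elsewhere)
    return t is not None and abs(t) >= threshold


# Explicit ratings that clearly differ by context: a candidate for a split.
explicit_split = should_split([5, 4, 5, 4], [2, 1, 2, 1])
# Purchase data where every rating is 1: never a candidate.
binary_split = should_split([1, 1, 1, 1], [1, 1, 1])
print(explicit_split, binary_split)
```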

hjanh commented 6 years ago

Is it possible to change the implementation of ItemSplitting to make it work with positive-only data? For example, by using chi2 or entropy instead of a t-test on rating deviation as the splitting criterion? Can the BPR algorithm then also work properly without negative ratings? If not, what would you recommend as best practice for working with positive-only data? Creating dummy zeros for the very large number of unseen products? I'm concerned that this is impossible with large datasets (e.g. tens of thousands of products and hundreds of thousands of users). Thanks for any advice.
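
A rough sketch of what such a chi2 criterion could look like on positive-only data: instead of comparing rating values, it compares how often an item is purchased inside one context condition versus all other conditions. Everything here is an assumption for illustration (the function name `chi2_2x2`, the counts, and the choice of a 5% significance level); it is not part of CARSKit.

```python
def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        return 0.0  # degenerate table, no evidence of dependence
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


# Hypothetical purchase counts: this item vs. all other items,
# inside one context condition vs. in all other conditions.
item_in_ctx, others_in_ctx = 40, 160
item_elsewhere, others_elsewhere = 20, 780

stat = chi2_2x2(item_in_ctx, others_in_ctx, item_elsewhere, others_elsewhere)
CRITICAL_1DF_5PCT = 3.841  # chi-square critical value, 1 degree of freedom, alpha = 0.05
split_candidate = stat > CRITICAL_1DF_5PCT
print(split_candidate)
```

With a criterion like this, an item whose purchase share differs strongly between conditions would be flagged for splitting even though all of its ratings are 1.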

irecsys commented 6 years ago

You can download the source code and modify it yourself. Or you can assign some zero ratings.

hjanh commented 6 years ago

Thanks for your reply. I don't get the point of "inventing" some zero ratings. Wouldn't this distort the results if I start assigning zero values randomly? And if I assigned a "0" to each possible user-item combination, the rating file would not fit into memory.
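
For scale, the log above reports 508769 users and 93689 items, so a fully materialized matrix would have roughly 4.8 × 10^10 cells. One common compromise, sketched below, is to sample a small number of unobserved items per user rather than filling in every cell. This is an assumption about general practice, not something CARSKit provides; `sample_negatives` and its parameters are hypothetical names, and sampled zeros mean "not observed", not "disliked", so the distortion concern still applies.

```python
import random


def sample_negatives(purchased, n_items, per_positive=1, seed=0):
    """For each user, draw unobserved item ids to act as pseudo-negatives."""
    rng = random.Random(seed)
    negatives = {}
    for user, items in purchased.items():
        # Never ask for more negatives than there are unobserved items.
        wanted = min(per_positive * len(items), n_items - len(items))
        chosen = set()
        while len(chosen) < wanted:
            candidate = rng.randrange(n_items)
            if candidate not in items:
                chosen.add(candidate)
        negatives[user] = chosen
    return negatives


# Size of the full user x item matrix, using the counts from the log above.
full_matrix_cells = 508769 * 93689

# Toy, hypothetical purchase data: user id -> set of purchased item ids.
purchased = {0: {1, 2}, 1: {3}}
negatives = sample_negatives(purchased, n_items=10)
print(full_matrix_cells, negatives)
```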

irecsys commented 6 years ago

In this case, I suggest getting external or other feedback information, such as implicit feedback, so that you can distinguish positive and negative ratings.

To use the splitting-based approaches, you need both positive and negative feedback so that they can perform the 'split'.

zeyboukli commented 6 years ago

Please, I am interested in the splitting approach. In results.txt I find just: Final Results by UserSplitting-BiasedMF, Pre5: 0,086689,Pre10: 0,056011, Rec5: 0,221093, Rec10: 0,259981, AUC: 0,661162, MAP: 0,161856, NDCG: 0,214975, MRR: 0,272366, numFactors: 10, numIter: 100, lrate: 0.02, maxlrate: -1.0, regB: 1.0E-4, regU: 1.0E-4, regI: 1.0E-4, regC: 0.001, isBoldDriver: true, Time: '00:00','00:00'. But I would like to evaluate this approach by displaying the root mean square error (RMSE). Can you help me, please? Thank you.

irecsys commented 5 years ago

In the configuration file there is the option item.ranking=on -topN 10. If you set it to "off", you will get the results for rating prediction instead of the top-N ranking metrics.
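
For reference, this is what RMSE measures, as a small standalone sketch. The function name `rmse` and the sample values are made up for illustration and are independent of how CARSKit computes or reports the metric.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean square error between two equally long rating lists."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical true ratings and predictions for three user-item pairs.
value = rmse([4, 3, 5], [3.5, 3.0, 4.0])
print(value)
```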

irecsys / CARSKit

Splitting Approaches #11