Evaluation using test-set

h4y4h0o commented 6 years ago

Hi,

I recently started using CARSKit to compare some state-of-the-art context-aware recommendation algorithms. I would like to evaluate them by manually supplying the training and testing sets.

CARSKit creates the binary file, but then I get the error "value already present: 0". I checked, and no lines appear in both the train and test files. What could be the problem?

Here is my config file:

dataset.ratings.wins=C:\train.csv
dataset.social.wins=-1
dataset.social.lins=-1
ratings.setup=-threshold 3 -datatransformation -1
recommender=camf_ci
evaluation.setup=test-set -f C:\testFile_0.csv
item.ranking=off -topN 10
output.setup=-folder CARSKit.Workspace -verbose on, off --to-file results.txt
guava.cache.spec=maximumSize=200,expireAfterAccess=2m
########## Model-based Methods ##########
num.factors=10
num.max.iter=100
learn.rate=2e-2 -max -1 -bold-driver
reg.lambda=0.0001 -c 0.001
pgm.setup=-alpha 2 -beta 0.5 -burn-in 300 -sample-lag 10 -interval 100
similarity=pcc
num.shrinkage=-1
num.neighbors=10

The error output:

java.lang.IllegalArgumentException: value already present: 0
    at com.google.common.collect.HashBiMap.put(HashBiMap.java:238)
    at com.google.common.collect.HashBiMap.put(HashBiMap.java:215)
    at carskit.data.processor.DataDAO.readData(DataDAO.java:208)
    at carskit.main.CARSKit.runAlgorithm(CARSKit.java:319)
    at carskit.main.CARSKit.execute(CARSKit.java:121)
    at carskit.main.CARSKit.main(CARSKit.java:93)
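The trace shows the exception coming from a Guava `HashBiMap.put` call inside `DataDAO.readData`. A Guava `BiMap` requires its values to be unique as well as its keys, so `put` rejects a new key whose value is already mapped to a different key. The sketch below mimics that rule with two plain `HashMap`s so it runs without Guava. The class `SimpleBiMap` and the labels used in `main` are hypothetical; the thread does not say which keys CARSKit stores in that map.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical minimal bidirectional map that mimics the uniqueness rule of
 * Guava's HashBiMap: keys AND values must both be unique.
 */
class SimpleBiMap {
    private final Map<String, Integer> forward = new HashMap<>();
    private final Map<Integer, String> inverse = new HashMap<>();

    /** Maps key to value; throws if value already belongs to a different key. */
    public Integer put(String key, Integer value) {
        String owner = inverse.get(value);
        if (owner != null && !owner.equals(key)) {
            // Same message format as in the stack trace above
            throw new IllegalArgumentException("value already present: " + value);
        }
        Integer old = forward.put(key, value);
        if (old != null) {
            inverse.remove(old); // drop the stale reverse entry
        }
        inverse.put(value, key);
        return old;
    }

    public Integer get(String key) {
        return forward.get(key);
    }

    public static void main(String[] args) {
        SimpleBiMap ids = new SimpleBiMap();
        ids.put("Time:Weekend", 0); // hypothetical label
        ids.put("Time:Weekend", 0); // same key, same value: allowed
        try {
            ids.put("Time:Weekday", 0); // different key, existing value: rejected
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // prints "value already present: 0"
        }
    }
}
```

So the error means two different entries were assigned the same index (here 0) while the data was being read, which is consistent with the train and test files not being laid out the same way.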

Thanks in advance for your help.

MatthiasKirsch commented 6 years ago

Hi @h4y4h0o,

I had the same problem when I tried to input my own train and test files. The trick for me was to first run the data transformation on your test.csv and then use the resulting ratings_binary.txt file as the test data.

Try the following: edit your settings.conf so that your test.csv is where your train.csv is at the moment. Run CARSKit; after the program has finished the data transformation, it will throw an error. You can ignore this error, because you only need the ratings_binary.txt file that is created in your output folder.

dataset.ratings.wins=C:\test.csv

Next, copy the ratings_binary.txt file from the output folder and place it somewhere else, for example next to your test.csv file. After doing this, change your settings.conf again:

dataset.ratings.wins=C:\train.csv
[...]
evaluation.setup=test-set -f C:\ratings_binary.txt

So this should do the trick. Let me know if that works for you :-)

irecsys commented 6 years ago

Thanks! The idea behind this is that you must make sure your training and testing data have the same format. Either prepare them yourself, or use the internal transformer to convert the data to the correct format.
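To illustrate what "the same format" can mean for context columns, here is a minimal sketch that converts compact rows into a binary layout using one fixed, ordered list of condition columns shared by the training and the test data. ASSUMPTIONS: the compact header `user,item,rating,Time,Location`, the `Dimension:Condition` column naming, and the class `BinaryContextSketch` are hypothetical; this is not CARSKit's internal transformer.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical compact-to-binary context conversion (NOT CARSKit's transformer).
 * Assumes compact rows "user,item,rating,<ctx values...>" and one 0/1 column
 * per "Dimension:Condition" pair in the binary layout.
 */
class BinaryContextSketch {

    /** Builds the binary header from a fixed, ordered list of condition columns. */
    static String binaryHeader(List<String> conditionColumns) {
        return "user,item,rating," + String.join(",", conditionColumns);
    }

    /** Converts one compact row into a binary row using the given column order. */
    static String toBinaryRow(String[] compactHeader, String compactRow, List<String> conditionColumns) {
        String[] fields = compactRow.split(",");
        List<String> active = new ArrayList<>();
        // Fields 3.. hold context values; pair each with its dimension name
        for (int i = 3; i < fields.length; i++) {
            active.add(compactHeader[i] + ":" + fields[i]);
        }
        StringBuilder sb = new StringBuilder(fields[0] + "," + fields[1] + "," + fields[2]);
        for (String column : conditionColumns) {
            sb.append(',').append(active.contains(column) ? '1' : '0');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] header = "user,item,rating,Time,Location".split(",");
        // Column order fixed once (e.g. from the training data) and reused for the test data
        List<String> columns = List.of("Time:Weekday", "Time:Weekend", "Location:Home", "Location:Cinema");
        System.out.println(binaryHeader(columns));
        System.out.println(toBinaryRow(header, "1,10,4,Weekend,Home", columns)); // prints 1,10,4,0,1,1,0
    }
}
```

Reusing one column list for both files is the design point: if the test data were encoded with a different or differently ordered set of columns, the same position would mean different things in the two files.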


h4y4h0o commented 6 years ago

Thank you for your responses. It works with your trick @MatthiasKirsch :)

h4y4h0o commented 6 years ago

I have another question about the evaluation: I ran an algorithm multiple times on the same training and testing sets, but the evaluation results (MAE, RMSE, etc.) are different each time. I don't understand what is happening.

irecsys commented 6 years ago

Some algorithms are sensitive to initialization; that's the reason why.
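As an illustration of why initialization matters: model-based recommenders such as matrix factorization typically start from randomly initialized latent factors, so two runs on identical data can converge to different models. The sketch below, which is not CARSKit code, shows that a fixed seed reproduces the starting factors exactly, while unseeded generators do not. The class `InitSketch`, the 0.1 scale, and the seed 42 are assumptions; k = 10 simply mirrors `num.factors=10` in the config above. Whether CARSKit exposes a seed setting is not stated in this thread.

```java
import java.util.Arrays;
import java.util.Random;

/** Illustrative sketch (not CARSKit code) of random latent-factor initialization. */
class InitSketch {

    /** Fills a rows x k factor matrix with small Gaussian values drawn from rng. */
    static double[][] initFactors(int rows, int k, Random rng) {
        double[][] factors = new double[rows][k];
        for (double[] row : factors) {
            for (int f = 0; f < k; f++) {
                row[f] = 0.1 * rng.nextGaussian();
            }
        }
        return factors;
    }

    public static void main(String[] args) {
        // Unseeded generators: each run starts training from a different point
        double[][] a = initFactors(3, 10, new Random());
        double[][] b = initFactors(3, 10, new Random());
        System.out.println("unseeded runs identical? " + Arrays.deepEquals(a, b));

        // Same fixed seed: identical starting point
        double[][] c = initFactors(3, 10, new Random(42L));
        double[][] d = initFactors(3, 10, new Random(42L));
        System.out.println("seeded runs identical? " + Arrays.deepEquals(c, d)); // prints true
    }
}
```

A fixed seed only makes the final metrics repeatable if the rest of training is deterministic as well; otherwise, reporting the average over several runs is a common way to handle the variation.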


irecsys commented 5 years ago

OK. This issue has been reported by several users, and I recently ran into it too. I am going to fix it by rebuilding the transformation process. The new library may be released very soon.

irecsys / CARSKit

Evaluation using test-set #12