Open 23tux opened 11 years ago
Looking into it.
Have you already had time to take a look at it?
So, I ran your example and got the same result. "NaN" is returned by Mahout itself, which means that it wasn't able to calculate a preference estimate based on your input (it could be because the sample size is too small, or because all of your values are 5.0, or because the combination of recommender, similarity metric and neighborhood couldn't come up with anything). My hunch is that it's because of the dataset and the similarity metric that were chosen.
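To illustrate the "all of your values are 5.0" case: a Pearson correlation divides by the spread of the ratings, so two users whose ratings never vary have no defined correlation. The following is a plain-Ruby sketch of the textbook formula, not Mahout's code; the helper name pearson and the sample ratings are made up for the example.

```ruby
# Plain-Ruby Pearson correlation over two equally long rating lists.
# Illustration only; this is not Mahout's implementation.
def pearson(xs, ys)
  n = xs.size.to_f
  mean_x = xs.inject(0.0) { |sum, v| sum + v } / n
  mean_y = ys.inject(0.0) { |sum, v| sum + v } / n
  cov = 0.0
  var_x = 0.0
  var_y = 0.0
  xs.zip(ys).each do |x, y|
    dx = x - mean_x
    dy = y - mean_y
    cov += dx * dy
    var_x += dx * dx
    var_y += dy * dy
  end
  # If either list has zero variance the denominator is 0.0,
  # and 0.0 / 0.0 evaluates to NaN in Ruby.
  cov / Math.sqrt(var_x * var_y)
end

constant = pearson([5.0, 5.0, 5.0], [5.0, 5.0, 5.0])
varied = pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
puts constant.nan? # true: identical, constant ratings give no usable similarity
puts varied        # 1.0: ratings that move together are perfectly correlated
```

If every similarity in a neighborhood is undefined like this, there is nothing to weight the neighbors' ratings with, which would be consistent with a NaN estimate.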
I tried creating a Slope One recommender and it was able to generate an estimate for me:
recommender = JrubyMahout::Recommender.new(nil, nil, "SlopeOneRecommender", false)
resulted in: 5.0.
I think the same applies to recommend.
As for the last issue, can you provide a small sample of the data from the MovieLens dataset? A couple of rows should be sufficient. I want to make sure you are formatting it properly before I look into the exception issue.
Thanks for your answer! Sorry for the delay, my master's thesis is keeping me busy. I tried it out with the MovieLens dataset (100,000 ratings). I split it into 50 rows for the test set and the rest for training. I now get some estimates, but with a coverage of only 68%. I also get the exception mentioned above. I think that with a dataset of 100,000 ratings, the Pearson correlation should be able to produce more proper recommendations. Or am I wrong?
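For reference, a holdout split like the one described can be sketched in plain Ruby as follows. The helper name holdout_split, the seed and the stand-in rows are assumptions for illustration only; this is not the exact procedure used to produce the files in this thread.

```ruby
# Hypothetical helper: hold out `test_size` rows, keep the rest for training.
# A fixed seed makes the split reproducible between runs.
def holdout_split(rows, test_size, seed)
  shuffled = rows.shuffle(random: Random.new(seed))
  [shuffled.first(test_size), shuffled.drop(test_size)]
end

# Stand-in rows of the form [user, item, rating]; real data would come from the ratings file.
rows = (1..200).map { |i| [i, i + 1000, 3] }
test_set, training_set = holdout_split(rows, 50, 42)
puts "test: #{test_set.size}, training: #{training_set.size}"
```

The test rows must not remain in the training file, otherwise the recommender is asked to estimate ratings it has already seen.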
I have the following code, into which I pasted the test set. The training set (without the rows of the test set) can be downloaded here: http://sketchit.de/movielens_without_testset.csv
require 'rubygems'
require 'ruby-debug'
require 'jruby_mahout'
require 'csv'
csv = "196 242 3 881250949
186 302 3 891717742
22 377 1 878887116
244 51 2 880606923
166 346 1 886397596
298 474 4 884182806
115 265 2 881171488
253 465 5 891628467
305 451 3 886324817
6 86 3 883603013
62 257 2 879372434
286 1014 5 879781125
200 222 5 876042340
210 40 3 891035994
224 29 3 888104457
303 785 3 879485318
122 387 5 879270459
194 274 2 879539794
291 1042 4 874834944
234 1184 2 892079237
119 392 4 886176814
167 486 4 892738452
299 144 4 877881320
291 118 2 874833878
308 1 4 887736532
95 546 2 879196566
38 95 5 892430094
102 768 2 883748450
63 277 4 875747401
160 234 5 876861185
50 246 3 877052329
301 98 4 882075827
225 193 4 879539727
290 88 4 880731963
97 194 3 884238860
157 274 4 886890835
181 1081 1 878962623
278 603 5 891295330
276 796 1 874791932
7 32 4 891350932
10 16 4 877888877
284 304 4 885329322
201 979 2 884114233
276 564 3 874791805
287 327 5 875333916
246 201 5 884921594
242 1137 5 879741196
249 241 5 879641194
99 4 5 886519097
178 332 3 882823437
"
@arr = CSV.parse(csv, col_sep: "\t")
def rec(neighborhood_size, is_weighted)
  puts "neighborhood: #{neighborhood_size}, is_weighted: #{is_weighted}"
  recommender = JrubyMahout::Recommender.new("PearsonCorrelationSimilarity", neighborhood_size, "GenericUserBasedRecommender", is_weighted)
  recommender.data_model = JrubyMahout::DataModel.new("file", { :file_path => "movielens_without_testset.csv" }).data_model
  # fallout counts the test tuples for which no estimate (NaN) is returned
  fallout = 0
  @arr.each do |a|
    # the user ID is read from column 1 and the item ID from column 0
    user = a[1].to_i
    item = a[0].to_i
    begin
      r = recommender.estimate_preference(user, item)
      fallout += 1 if r.nan?
    rescue Exception => e
      puts ""
    end
  end
  puts "Tuples: #{@arr.count}"
  # round the percentage itself; 100.round(3) would only round the literal 100
  puts "Fallout #{fallout} -> #{(fallout / @arr.count.to_f * 100).round(3)}%"
  puts "-----------------"
end
rec 5, false
I tried it with different neighborhood sizes, but the result only varies by about 5%. This is the output that is produced:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/Users/23tux/projects/mahout/mahout-distribution-0.7/lib/slf4j-jcl-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/Users/23tux/projects/mahout/mahout-distribution-0.7/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
missing class or uppercase package name (`org.postgresql.ds.PGPoolingDataSource')
log4j:WARN No appenders could be found for logger (org.apache.mahout.cf.taste.impl.model.file.FileDataModel).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception: 1014
Exception: 1042
Exception: 1184
Exception: 1081
Exception: 979
Exception: 1137
Tuples: 50
Fallout 34 -> 68.0%
Could it have something to do with the warnings that I get? Hope we can fix this problem ;)
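One thing that may be worth checking in the script above: rows in the MovieLens 100K file are ordered user ID, item ID, rating, timestamp, while the script reads the user from column 1 and the item from column 0. The IDs printed next to the exceptions (1014, 1042, 1184, ...) are all larger than 943, the number of users in that dataset, so they might be item IDs that were looked up as users. Whether this is what actually triggers the exceptions is a guess. A minimal sketch of the check, where parse_rows, MAX_USER_ID and SAMPLE are names made up for the example:

```ruby
# MovieLens 100K has 943 users, so valid user IDs are 1..943.
MAX_USER_ID = 943

# A few rows copied from the test set above: user ID, item ID, rating, timestamp.
SAMPLE = [
  "196 242 3 881250949",
  "286 1014 5 879781125",
  "291 1042 4 874834944",
  "234 1184 2 892079237"
]

# Split on any whitespace so the check works for tab- or space-separated rows.
def parse_rows(lines)
  lines.map do |line|
    user, item, rating, _timestamp = line.split
    { :user => user.to_i, :item => item.to_i, :rating => rating.to_f }
  end
end

rows = parse_rows(SAMPLE)
# Item IDs that cannot be valid user IDs; passing them as users would fail.
suspicious = rows.select { |r| r[:item] > MAX_USER_ID }.map { |r| r[:item] }
puts suspicious.inspect # [1014, 1042, 1184]
```

If the columns are indeed swapped, reading the user from column 0 and the item from column 1 should be enough to test the theory.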
Hi,
I'm not sure if this is a bug or if I'm just doing something wrong. I tried out your example and played around with it a little. By the way, really cool work ;)
The problem is that I don't get any recommendations from the engine. I'm using Mahout 0.7 with JRuby 1.7.3. I have the following script:
The test.csv is fairly simple and looks like this (user 1 hasn't rated item 1):
When I try to
puts recommender.estimate_preference(1,1)
I always get NaN, which means that the recommender isn't able to generate a rating for that user-item tuple. But my neighborhood size is only 2, and there are only items that match user 1's profile "perfectly". What am I doing wrong? Do I have to calculate the similarities on my own?
Further, calling
recommender.recommend(1, 1, nil)
to get a list of 1 item for user 1 returns an empty array [].
I also tried it with the MovieLens dataset, splitting it into an 80% training set and a 20% test set, with the same results. And here the recommender throws an exception at the
recommender.estimate_preference(user,item)
point:
Hope you can help me, I would love to use your project for my master's thesis experiments ;) (and of course, cite your work)