mhahsler / recommenderlab

recommenderlab - Lab for Developing and Testing Recommender Algorithms - R package

Recommenderlab with Predict error #58

Closed (shuhcl closed this issue 1 year ago)

shuhcl commented 2 years ago

Hello Michael, thank you for this amazing package! There seems to be a bug in the sparse matrix multiplication inside the predict() function. I'm working with the code below, and predict() keeps reporting an error. Any ideas on how to deal with it?

library(recommenderlab)

# simulate matrix with 2000 users and 100 movies
m <- matrix(nrow = 2000, ncol = 100)

# simulated ratings (5% of the data)
m[sample.int(100 * 2000, 10000)] <- ceiling(runif(10000, 0, 5))

# convert into a realRatingMatrix
r <- as(m, "realRatingMatrix")

# UBCF recommender
UB.Rec <- Recommender(r, method = "UBCF")

pred <- predict(UB.Rec, r, type = "ratings")

as(pred, "matrix")

sessionInfo()

R version 4.2.1 (2022-06-23 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19043)

Matrix products: default

locale:
[1] LC_COLLATE=Chinese (Simplified)_China.utf8 
[2] LC_CTYPE=Chinese (Simplified)_China.utf8   
[3] LC_MONETARY=Chinese (Simplified)_China.utf8
[4] LC_NUMERIC=C                               
[5] LC_TIME=Chinese (Simplified)_China.utf8    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] recommenderlab_1.0.2 registry_0.5-1       proxy_0.4-27         arules_1.7-5        
[5] Matrix_1.5-1        

loaded via a namespace (and not attached):
 [1] compiler_4.2.1     generics_0.1.3     recosystem_0.5     tools_4.2.1       
 [5] float_0.3-0        Rcpp_1.0.9         grid_4.2.1         irlba_2.3.5.1     
 [9] matrixStats_0.62.0 lattice_0.20-45   

mhahsler commented 2 years ago

Hi, it looks like your code creates some users with 0 ratings. I will add a better error message to recommenderlab. To fix the problem, you need to remove users without any ratings:

> # simulate matrix with 2000 users and 100 movies
> m <- matrix(nrow = 2000, ncol = 100)
> 
> # simulated ratings (5% of the data)
> m[sample.int(100 * 2000, 10000)] <- ceiling(runif(10000, 0, 5))
> 
> # remove users with no ratings
> (noratings <- which(rowSums(!is.na(m)) == 0))
 [1]   88  233  381  603  641  747 1063 1165 1433 1437 1688
> # guard: m[-integer(0), ] would drop every row if no user were empty
> if (length(noratings) > 0) m <- m[-noratings, ]
> 
> # convert into a realRatingMatrix
> r <- as(m, "realRatingMatrix")
> 
> # UBCF recommender
> UB.Rec <- Recommender(r, method = "UBCF")
> 
> pred <- predict(UB.Rec, r, type = "ratings")
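
As a rough check on why the reproducer produces empty users at all, the expected number can be estimated with base R. This is only a sketch: it assumes the 10,000 rated cells are drawn uniformly without replacement from the 2,000 x 100 matrix, as `sample.int()` does above, and the variable names are illustrative. Under that assumption roughly a dozen users are expected to have no ratings, which is in line with the 11 row indices printed above.

```r
# Assumption: ratings fill 10,000 of the 2,000 x 100 cells, drawn uniformly
# without replacement, as in the reproducer.
n_users   <- 2000
n_items   <- 100
n_ratings <- 10000
n_cells   <- n_users * n_items

# P(a given user receives none of the sampled cells): hypergeometric draw of
# n_ratings cells where the user's own row contributes n_items of n_cells.
p_empty <- dhyper(0, m = n_items, n = n_cells - n_items, k = n_ratings)

# Expected number of users without any rating
expected_empty <- n_users * p_empty
round(expected_empty, 1)

# Empirical count on one simulated matrix (varies with the seed)
set.seed(1)
m <- matrix(NA_real_, nrow = n_users, ncol = n_items)
m[sample.int(n_cells, n_ratings)] <- ceiling(runif(n_ratings, 0, 5))
sum(rowSums(!is.na(m)) == 0)
```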

shuhcl commented 2 years ago

Hi Michael, thanks for your suggestion. In a large sparse matrix, most users rate only a few products. When using cross-validation to evaluate models, it is very likely that the data will include some users who do not have any ratings. Is there a way to automatically exclude those users in evaluationScheme()?

# simulate matrix with 2000 users and 100 movies
m <- matrix(nrow = 2000, ncol = 100)

# simulated ratings (5% of the data)
m[sample.int(100 * 2000, 10000)] <- ceiling(runif(10000, 0, 5))

# remove users with no ratings
noratings <- which(rowSums(!is.na(m)) == 0)
# guard: m[-integer(0), ] would drop every row if no user were empty
if (length(noratings) > 0) m <- m[-noratings, ]

# convert into a realRatingMatrix
r <- as(m, "realRatingMatrix")

# UBCF recommender
UB.Rec <- Recommender(r, method = "UBCF")

pred <- predict(UB.Rec, r, type = "ratings")

# evaluation
eval_sets <- evaluationScheme(r,
                              method = "cross-validation",
                              k = 4,
                              given = 1)

eval_result <- evaluate(eval_sets, method = "UBCF")

# users with no ratings in the known data
known <- getData(eval_sets, "known")

known@data[which(rowSums(known) == 0), , drop = FALSE]

mhahsler commented 2 years ago

Thank you! This can indeed be a problem, so I am reopening this issue. In my examples, I make sure to use only users who have more ratings than the value of given. Since you use given = 1, you need to make sure that all users have at least 2 ratings. Here is the changed code, which should work:

# simulate matrix with 2000 users and 100 movies
m <- matrix(nrow = 2000, ncol = 100)

# simulated ratings (5% of the data)
m[sample.int(100 * 2000, 10000)] <- ceiling(runif(10000, 0, 5))

### FIXME: needs dimnames
dimnames(m) <- list(seq(nrow(m)), seq(ncol(m)))

### FIXME: check number of ratings
# remove users with fewer than 2 ratings (given = 1 needs at least 2)
not_enough_ratings <- which(rowSums(!is.na(m)) < 2)
if (length(not_enough_ratings) > 0) m <- m[-not_enough_ratings, ]

# convert into a realRatingMatrix
r <- as(m, "realRatingMatrix")

# UBCF recommender
UB.Rec <- Recommender(r, method = "UBCF")

pred <- predict(UB.Rec, r, type = "ratings")

# evaluation
eval_sets <- evaluationScheme(r,
  method = "cross-validation",
  k = 4,
  given = 1,
  goodRating = 3)

eval_result <- evaluate(eval_sets, method = "UBCF")

# users with no ratings in the known data
known <- getData(eval_sets, "known")

known@data[which(rowSums(known) == 0), , drop = FALSE]

I will work on the code to check for this and either fix the issue by dropping users with too few ratings or produce a better error message.
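
Until such a check is available in the package, the rule described above (every user needs more ratings than `given`) can be applied to the raw matrix before conversion. The following is a minimal base R sketch, not recommenderlab's implementation; `keep_rated_users()` is a hypothetical helper name introduced only for illustration. It uses a logical mask rather than `m[-which(...), ]`, because negative indexing with an empty index vector returns zero rows.

```r
# Hypothetical helper (not part of recommenderlab): keep only the users (rows)
# that have more than `given` non-missing ratings.
keep_rated_users <- function(m, given = 0) {
  enough <- rowSums(!is.na(m)) > given
  # Logical mask: a no-op when every user qualifies
  m[enough, , drop = FALSE]
}

# Tiny example: u1 has two ratings, u2 one, u3 none, u4 three
m <- matrix(NA_real_, nrow = 4, ncol = 3,
            dimnames = list(paste0("u", 1:4), paste0("i", 1:3)))
m["u1", c("i1", "i2")] <- c(5, 3)
m["u2", "i3"] <- 4
m["u4", ] <- c(1, 2, 3)

rownames(keep_rated_users(m, given = 0))  # drops only the empty user
rownames(keep_rated_users(m, given = 1))  # also drops the single-rating user
```

The filtered matrix can then be passed to `as(m, "realRatingMatrix")` and `evaluationScheme()` with the same `given` value, as in the code above.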

mhahsler commented 2 years ago

I have updated the code, and it should now automatically remove users with too few ratings.