javierluraschi closed this 6 years ago
ml_recommend() takes an ALS model object, not a pipeline model, so you'd have to extract the appropriate stage from the pipeline if you want to use ml_recommend():
fitted_pipeline %>%
  ml_stage("als") %>%
  ml_recommend()
# Source: table<sparklyr_tmp_cedb5c8c865e> [?? x 4]
# Database: spark_connection
# user_index recommendations product_index rating
# <int> <list> <int> <dbl>
# 1 1 <list [2]> 1 4.86
# 2 2 <list [2]> 1 3.97
# 3 0 <list [2]> 2 3.89
We can do better with that error message, though!
Thanks for the clarification on ml_recommend and the ml_als pipeline example. It's helpful to someone like me who is inexperienced with Spark ML pipelines.
I am not able to get ml_fit to execute. My pipeline looks like this:
pipeline <- ml_pipeline(sc) %>%
  ft_string_indexer(input_col = "ProductID", output_col = "product_index") %>%
  ft_string_indexer(input_col = "UserID", output_col = "user_index") %>%
  ml_als(rating_col = "Score", user_col = "user_index", item_col = "product_index", max_iter = 10)
pipeline
fitted_pipeline <- ml_fit(pipeline, reviews)
My data looks like this:
# Source: lazy query [?? x 4]
# Database: spark_connection
# Id ProductId UserId Score
@campanell check the spelling of your column names, e.g. ProductId vs. ProductID.
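A quick way to catch this kind of mismatch before calling ml_fit() is to compare the column names the pipeline expects against the names the data actually has. The helper below, missing_columns(), is a hypothetical sketch and not part of sparklyr; it assumes colnames() also works on a tbl_spark (it does for dbplyr-backed tables), and the comparison is exact and case-sensitive.

```r
# Hypothetical pre-flight check, not part of sparklyr: return the column
# names a pipeline expects that are absent from the data.
# setdiff() compares strings exactly, so "ProductID" != "ProductId".
# Assumes colnames() works on tbl_spark as well as on local data frames.
missing_columns <- function(data, needed) {
  setdiff(needed, colnames(data))
}

# Usage against the reviews table from this thread:
# missing_columns(reviews, c("ProductID", "UserID", "Score"))
```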
@kevinykuo good to know! I wonder whether you'd consider an S3 method for ml_recommend() that accepts a pipeline model and automatically extracts the ALS stage; that said, I'm not sure that would be desirable for every ml_recommend() use. Otherwise, looks like we can close this one.
I'd like to stay away from generalizing specialized helper functions to pipeline objects, because that would require inspecting the pipeline and making assumptions about which stage to extract (e.g. the ALS stage could be at any position in the pipeline, and there could be more than one), which in turn could lead to unexpected behavior.
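To illustrate the ambiguity, here is a sketch of what such stage extraction might look like if done on the user side. find_als_stage() is hypothetical and not part of sparklyr; it assumes fitted ALS stages carry the S3 class "ml_als_model" (check class() on your own stage), and it deliberately errors instead of guessing when there is no ALS stage or more than one.

```r
# Hypothetical helper, not part of sparklyr: pick the ALS stage out of a
# fitted pipeline's list of stages, refusing to guess if it is ambiguous.
# Assumes fitted ALS stages inherit from the S3 class "ml_als_model".
find_als_stage <- function(stages) {
  als <- Filter(function(stage) inherits(stage, "ml_als_model"), stages)
  if (length(als) == 0) {
    stop("no ALS stage found in the pipeline")
  }
  if (length(als) > 1) {
    stop("found ", length(als), " ALS stages; extract the one you want with ml_stage()")
  }
  als[[1]]
}

# Usage with a real fitted pipeline (requires a Spark connection):
# fitted_pipeline %>% ml_stages() %>% find_als_stage() %>% ml_recommend()
```

Even this small helper has to decide what "the" ALS stage means, which is the kind of assumption a general ml_recommend() method for pipelines would bake in.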
Closing this, but @campanell feel free to let me know if you run into further issues!
Thanks for the instructions. I'm so embarrassed about the typos. I shouldn't be afraid of the boogeyman (i.e. Scala error messages), but I still am.
I was able to get as far as ml_recommend(). However, where in the pipeline should I use ft_index_to_string() to convert product_index and user_index back to ProductId and UserId?
I was able to use ft_index_to_string() to map the indices in the recommend data frame back to the original values:
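# labels learned by the first StringIndexer (ProductId)
prod_index <- fitted_pipeline %>%
  ml_stage(1) %>%
  ml_labels()
# labels learned by the second StringIndexer (UserId)
user_index <- fitted_pipeline %>%
  ml_stage(2) %>%
  ml_labels()
# recommend is the data frame returned by ml_recommend();
# map both index columns back to the original IDs
recommend_1 <- ft_index_to_string(recommend, input_col = "product_index",
                                  output_col = "ProductId_", labels = prod_index)
recommend_2 <- ft_index_to_string(recommend_1, input_col = "user_index",
                                  output_col = "UserId_", labels = user_index)
head(recommend_2)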
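Why passing ml_labels() to ft_index_to_string() works: the fitted StringIndexer stores its labels as an ordered vector, and index i in Spark refers to position i of that vector counting from 0. The base R sketch below mimics that round trip locally. The function names are hypothetical, and the ordering assumes Spark's default "frequencyDesc" string order (most frequent label gets index 0, ties broken alphabetically).

```r
# Hypothetical base-R sketch of the StringIndexer / IndexToString round trip.
# Assumes Spark's default "frequencyDesc" ordering with alphabetical tie-breaks.
string_index_labels <- function(x) {
  counts <- table(x)
  # most frequent first; equal counts ordered alphabetically
  names(counts)[order(-as.integer(counts), names(counts))]
}

string_index <- function(x, labels) {
  # Spark indices are 0-based
  match(x, labels) - 1
}

index_to_string <- function(index, labels) {
  # shift back to R's 1-based vector positions
  labels[index + 1]
}

product_ids <- c("B0001", "B0002", "B0002", "B0003", "B0002", "B0003")
labels <- string_index_labels(product_ids)  # "B0002" "B0003" "B0001"
idx <- string_index(product_ids, labels)    # 2 0 0 1 0 1
index_to_string(idx, labels)                # recovers product_ids
```

The key point is that the labels must come from the same fitted indexer stage that produced the index column, which is why the product and user labels above are pulled from ml_stage(1) and ml_stage(2) respectively.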
Thanks so much for all your help. I am now able to do collaborative filtering with sparklyr.
So far so good, however: