sparklyr / sparklyr

R interface for Apache Spark
https://spark.rstudio.com/
Apache License 2.0

ml_als code example doesn't work #2078

Open kecbenson opened 5 years ago

kecbenson commented 5 years ago

I'm having trouble using the code example in the ml_als help documentation. I'm using sparklyr version 1.0.1.9004 with RStudio version 1.2.1555. The code example that doesn't run on my machine is:

library(sparklyr)
sc <- spark_connect(master = "local")
movies <- data.frame(
    user   = c(1, 2, 0, 1, 2, 0),
    item   = c(1, 1, 1, 2, 2, 0),
    rating = c(3, 1, 2, 4, 5, 4)
)
movies_tbl <- sdf_copy_to(sc, movies)
model <- ml_als(movies_tbl, rating ~ user + item)

When I run this, I get a long series of Spark error messages, starting with: "Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 29.0 failed 1 times, most recent failure: Lost task 3.0 in stage 29.0 (TID 172, localhost, executor driver): java.lang.StackOverflowError at java.io.ObjectInputStream$PeekInputStream.read(Unknown Source)"

Here is the output of utils::sessionInfo():

R version 3.6.0 (2019-04-26)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17134)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets
[6] methods   base

other attached packages:
[1] sparklyr_1.0.1.9004

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.1        pillar_1.4.2      compiler_3.6.0
 [4] dbplyr_1.4.2      remotes_2.1.0     prettyunits_1.0.2
 [7] r2d3_0.2.3        base64enc_0.1-3   tools_3.6.0
[10] pkgload_1.0.2     digest_0.6.20     pkgbuild_1.0.3
[13] jsonlite_1.6      memoise_1.1.0     tibble_2.1.3
[16] pkgconfig_2.0.2   rlang_0.4.0       DBI_1.0.0
[19] cli_1.1.0         rstudioapi_0.10   yaml_2.2.0
[22] parallel_3.6.0    withr_2.1.2       dplyr_0.8.3
[25] httr_1.4.0        fs_1.3.1          desc_1.2.0
[28] generics_0.0.2    htmlwidgets_1.3   askpass_1.1
[31] rappdirs_0.3.1    devtools_2.0.2    rprojroot_1.3-2
[34] tidyselect_0.2.5  glue_1.3.1        forge_0.2.0
[37] R6_2.4.0          processx_3.4.0    sessioninfo_1.1.1
[40] callr_3.3.0       purrr_0.3.2       magrittr_1.5
[43] usethis_1.5.1     ps_1.3.0          backports_1.1.4
[46] htmltools_0.3.6   ellipsis_0.2.0.1  assertthat_0.2.1
[49] config_0.3        openssl_1.4       crayon_1.3.4

javierluraschi commented 5 years ago

Thanks, this is definitely a bug in sparklyr; at the very least, we should warn that this method is now deprecated in Spark 1.6. cc @kevinykuo

However, it works properly with Spark 2.x and newer. Any chance you can upgrade your Spark version?

If you are running locally, you can rerun spark_install() to get the latest version of Spark, or run spark_install(version = "2.3"), which is currently the best-supported version across sparklyr extensions.
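A minimal sketch of the local upgrade path suggested above, assuming Spark can be downloaded on the machine. spark_install(), spark_connect(), spark_version() and spark_disconnect() are sparklyr functions; the "2.3" version string is the one mentioned in this comment. Printing spark_version() after connecting confirms which Spark release the session is actually running, which helps rule out an older default installation being picked up.

```r
library(sparklyr)

# Install the latest Spark 2.3.x release locally (skipped if already present)
spark_install(version = "2.3")

# Connect explicitly to that version instead of relying on the default
sc <- spark_connect(master = "local", version = "2.3")

# Record and show the Spark version this connection is actually using
v <- spark_version(sc)
print(v)

spark_disconnect(sc)
```

If the printed version is not a 2.3.x release, the connection is not using the installation you intended, and rerunning the ml_als() example would not test the upgrade.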

kecbenson commented 5 years ago

OK, thanks. I'll try version 2.3; earlier I was using version 2.4.3.

kecbenson commented 5 years ago

Changing the Spark version doesn't seem to help; originally I was using 2.4.3, and then I changed the version to 2.3, 2.2, and 2.1 with the code below. They all produce similar error messages. Any other ideas? Thank you.

spark_install(version = "2.3")
sc <- spark_connect(master = "local", version = "2.3")
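The version sweep described in this comment can be scripted so that every release is tested the same way. This is a sketch only: try_als_on_version() is a hypothetical helper, not part of sparklyr, and it assumes each listed version can be installed locally. It reruns the ml_als() help example once per version and records success or failure instead of stopping at the first error.

```r
library(sparklyr)

# Same toy data as the ml_als() help example
movies <- data.frame(
  user   = c(1, 2, 0, 1, 2, 0),
  item   = c(1, 1, 1, 2, 2, 0),
  rating = c(3, 1, 2, 4, 5, 4)
)

# Hypothetical helper: install and connect to one Spark version, fit the
# help-page ALS model, and return TRUE on success or FALSE on any error.
try_als_on_version <- function(version) {
  spark_install(version = version)
  sc <- spark_connect(master = "local", version = version)
  # Close the connection whether or not the fit succeeds
  on.exit(spark_disconnect(sc), add = TRUE)
  tryCatch({
    movies_tbl <- sdf_copy_to(sc, movies, overwrite = TRUE)
    ml_als(movies_tbl, rating ~ user + item)
    TRUE
  }, error = function(e) {
    message("Spark ", version, " failed: ", conditionMessage(e))
    FALSE
  })
}

versions <- c("2.4", "2.3", "2.2", "2.1")
# Named logical vector: one TRUE/FALSE per Spark version tried
results <- vapply(versions, try_als_on_version, logical(1))
print(results)
```

Collecting the error messages per version this way makes it easier to confirm whether the failure (here, the StackOverflowError) is identical across releases.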

Output of utils::sessionInfo() below:

R version 3.6.0 (2019-04-26)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17134)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets
[6] methods   base

other attached packages:
[1] sparklyr_1.0.1.9004

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.1        pillar_1.4.2      compiler_3.6.0
 [4] dbplyr_1.4.2      remotes_2.1.0     prettyunits_1.0.2
 [7] r2d3_0.2.3        base64enc_0.1-3   tools_3.6.0
[10] pkgload_1.0.2     pkgbuild_1.0.3    zeallot_0.1.0
[13] digest_0.6.20     memoise_1.1.0     jsonlite_1.6
[16] tibble_2.1.3      pkgconfig_2.0.2   rlang_0.4.0
[19] cli_1.1.0         DBI_1.0.0         rstudioapi_0.10
[22] yaml_2.2.0        parallel_3.6.0    withr_2.1.2
[25] dplyr_0.8.3       httr_1.4.0        fs_1.3.1
[28] desc_1.2.0        devtools_2.0.2    generics_0.0.2
[31] htmlwidgets_1.3   vctrs_0.2.0       askpass_1.1
[34] rappdirs_0.3.1    rprojroot_1.3-2   tidyselect_0.2.5
[37] glue_1.3.1        forge_0.2.0       R6_2.4.0
[40] processx_3.4.0    fansi_0.4.0       sessioninfo_1.1.1
[43] callr_3.3.0       purrr_0.3.2       magrittr_1.5
[46] usethis_1.5.1     ps_1.3.0          backports_1.1.4
[49] htmltools_0.3.6   ellipsis_0.2.0.1  assertthat_0.2.1
[52] config_0.3        utf8_1.1.4        openssl_1.4
[55] crayon_1.3.4