h2oai / sparkling-water

Sparkling Water provides H2O functionality inside Spark cluster
https://docs.h2o.ai/sparkling-water/3.3/latest-stable/doc/index.html
Apache License 2.0

RSparkling: improve handling of Sparkling Water package dependencies #5134

Closed exalate-issue-sync[bot] closed 1 year ago

exalate-issue-sync[bot] commented 1 year ago

Right now, the file https://github.com/h2oai/sparkling-water/blob/master/r/rsparkling/R/package.R enforces a specific version of the Sparkling Water package.

However, we should:

exalate-issue-sync[bot] commented 1 year ago

Erin LeDell commented: Let's investigate using the "versions" R package to install particular versions of the h2o package.

exalate-issue-sync[bot] commented 1 year ago

Michal Malohlava commented: For configuring the Sparkling Water Spark package (i.e., the Scala binary dependency), the proposals are:

  1. rsparkling::set_sparkling_water_version("1.6.7") OR
  2. sc <- spark_connect(master = "local", extensions = rsparkling_extensions(sparkling_water_version = "2.0.0"))

exalate-issue-sync[bot] commented 1 year ago

Erin LeDell commented: So it's possible to set the SW package version simply by using the existing sparklyr::spark_connect function? If so, I'd say that's probably better than adding a new function and an extra line of code (so #2 is better). For #1, what would happen if someone did not execute that line? Would a default SW version be used?

exalate-issue-sync[bot] commented 1 year ago

Michal Malohlava commented: I prefer version #2.

I would prefer having no default. In h2o_context we can check whether Spark was started with the Sparkling Water extension. If the SW extension is missing, we simply direct the user to launch Spark with the SW extension.

The extension code also has to check for compatibility with the Spark version (minor issue).
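
As a rough illustration of the compatibility check mentioned above, here is a minimal sketch in plain R. The helper names `major_minor` and `is_sw_compatible` are hypothetical and not part of rsparkling, and the rule itself (a Sparkling Water release matches the Spark release with the same major.minor version) is an assumption suggested by the version numbers quoted in this thread, not a confirmed policy.

```r
# Hypothetical helpers for illustration only; not part of rsparkling.

# Reduce a version string such as "2.0.1" to its "major.minor" prefix.
major_minor <- function(version) {
  parts <- unlist(unclass(as.package_version(version)))
  paste(parts[1], parts[2], sep = ".")
}

# Assumption: Sparkling Water X.Y.z is built for Spark X.Y.
is_sw_compatible <- function(spark_version, sw_version) {
  identical(major_minor(spark_version), major_minor(sw_version))
}

is_sw_compatible("2.0.1", "2.0.0")  # same major.minor
is_sw_compatible("1.6.2", "2.0.0")  # different major.minor
```

A check like this could run when the extension is registered, so that a mismatch is reported before any Spark job starts.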

exalate-issue-sync[bot] commented 1 year ago

Navdeep commented: This seems to be a bit more difficult than necessary. Passing a sparkling_water_version to package.R does not seem trivial given the design of sparklyr.

exalate-issue-sync[bot] commented 1 year ago

Jan Gorecki commented: This would allow setting the expected SW version through the "rsparkling.sparklingwater.version" option; when the option is not defined, the SW version is chosen based on the Spark version.

{code}
spark_dependencies <- function(spark_version, scala_version, ...) {
  # An explicitly configured Sparkling Water version takes precedence.
  sw_version = getOption("rsparkling.sparklingwater.version")
  if (is.null(sw_version)) {
    # Otherwise derive it from the Spark major.minor version.
    spark_v = as.package_version(spark_version)
    spark_v = paste(spark_v$major, spark_v$minor, sep=".")
    version_map = c("1.6" = "1.6.7", "2.0" = "2.0.0") # add more items here
    if (!spark_v %in% names(version_map))
      stop("Sparkling Water does not support Spark in version ", spark_v)
    sw_version = version_map[[spark_v]]
  }
  # Build the Maven coordinates: artifact_<scala version>:<SW version>.
  sparklyr::spark_dependency(packages = sprintf(
    c("ai.h2o:sparkling-water-core_%s:%s",
      "ai.h2o:sparkling-water-ml_%s:%s",
      "ai.h2o:sparkling-water-repl_%s:%s"),
    scala_version, sw_version
  ))
}
{code}
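
To show how the option and the fallback would interact, the version-resolution step can be pulled out of the snippet above so that it runs without sparklyr. This is only a sketch: `resolve_sw_version` is a hypothetical name introduced here, and the version map contains just the two entries quoted in this thread.

```r
# Hypothetical standalone version of the resolution logic; not part of rsparkling.
resolve_sw_version <- function(spark_version) {
  # An explicitly set option always wins.
  sw_version <- getOption("rsparkling.sparklingwater.version")
  if (!is.null(sw_version)) return(sw_version)

  # Otherwise map the Spark major.minor version to a Sparkling Water release.
  parts <- unlist(unclass(as.package_version(spark_version)))
  spark_v <- paste(parts[1], parts[2], sep = ".")
  version_map <- c("1.6" = "1.6.7", "2.0" = "2.0.0")  # entries taken from this thread
  if (!spark_v %in% names(version_map))
    stop("Sparkling Water does not support Spark in version ", spark_v)
  version_map[[spark_v]]
}

# No option set: the version is derived from the Spark version.
options(rsparkling.sparklingwater.version = NULL)
resolve_sw_version("2.0.1")

# Option set: the user's choice overrides the mapping.
options(rsparkling.sparklingwater.version = "1.6.7")
resolve_sw_version("2.0.1")

# Clear the option again so later calls use the mapping.
options(rsparkling.sparklingwater.version = NULL)
```

Keeping the lookup in one small function would also make it easy to unit-test the mapping as new Spark releases are added.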

DinukaH2O commented 1 year ago

JIRA Issue Migration Info

Jira Issue: SW-209
Assignee: Navdeep Gill
Reporter: Michal Malohlava
State: Resolved
Fix Version: 2.0.1
Attachments: N/A
Development PRs: Available

Linked PRs from JIRA

https://github.com/h2oai/sparkling-water/pull/116

hasithjp commented 1 year ago

JIRA Issue Migration Info Cont'd

Jira Issue Created Date: 2016-09-25T19:06:41.037-0700