kestra-io / plugin-spark

Apache License 2.0

Setting SPARK_HOME in R Spark example #49

Open shrutimantri opened 8 months ago

shrutimantri commented 8 months ago

Expected Behavior

-

Actual Behaviour

In the R Spark flow example provided here: https://kestra.io/plugins/plugin-spark/tasks/io.kestra.plugin.spark.RSubmit — what should SPARK_HOME be set to in the environment variables?

This runs in a Docker runner, so it's unclear what SPARK_HOME should be set to. Once we know what the flow should look like, I can update the documentation accordingly.

Steps To Reproduce

N/A

Environment Information

Example flow

Flow as provided here: https://kestra.io/plugins/plugin-spark/tasks/io.kestra.plugin.spark.RSubmit

anna-geller commented 8 months ago

Sys.getenv("SPARK_HOME") should normally resolve to /opt/bitnami/spark if you're running this script with a default bitnami/spark image.
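For reference, a minimal sketch of the flow with the image pinned explicitly, so SPARK_HOME resolves predictably. This assumes Kestra's Docker task runner accepts an `image` option under `docker`, and the `3.5.0` tag is illustrative:

```yaml
id: r_submit
type: io.kestra.plugin.spark.RSubmit
runner: DOCKER
docker:
  image: bitnami/spark:3.5.0  # assumed tag; bitnami images set SPARK_HOME=/opt/bitnami/spark
  networkMode: host
  user: root
master: spark://localhost:7077
mainScript: |
  # SPARK_HOME should resolve to /opt/bitnami/spark inside the bitnami image
  library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib")))
  sparkR.session()
  print("The SparkR session has initialized successfully.")
  sparkR.stop()
```

If the image does not export SPARK_HOME, it can also be set explicitly via the task's environment variables instead of relying on the image default.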

anna-geller commented 8 months ago

The default example:

id: "r_submit"
type: "io.kestra.plugin.spark.RSubmit"
runner: DOCKER
docker:
  networkMode: host
  user: root
master: spark://localhost:7077
mainScript: |
  library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib")))
  sparkR.session()

  print("The SparkR session has initialized successfully.")

  sparkR.stop()

taken from the documentation linked above fails with the error: Exception in thread "main" java.io.IOException: Cannot run program "Rscript": error=2, No such file or directory
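That error means the container image running spark-submit has no Rscript binary on its PATH, so R itself needs to be present in the image. One possible workaround, sketched here under the assumption that the default image lacks R, is a custom image (name and tag are hypothetical) that adds R on top of bitnami/spark:

```dockerfile
# Hypothetical custom image adding R to the Spark base image
FROM bitnami/spark:3.5.0
USER root
# install_packages is the helper bundled in bitnami images; r-base provides Rscript
RUN install_packages r-base
USER 1001
```

The flow would then point `docker.image` at this custom image (e.g. `my-spark-r:latest`, a hypothetical name) instead of the default.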
