FredrikBakken opened 3 years ago
I'd need to dive into this more, because I'm sure this can be configured differently, but looking at the trace, we explicitly don't support fork/exec:

```
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
```
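That call path is easy to reproduce outside of Spark. The following is a minimal, hypothetical example (not taken from Spark's code): any `ProcessBuilder.start()` on a Linux JVM bottoms out in `forkAndExec`, which works on a conventional OS but is exactly what a single-process unikernel rejects.

```java
// Any process spawn on the JVM goes through fork/exec on Linux
// (java.lang.UNIXProcess.forkAndExec on Java 8). This runs fine on a
// normal OS, but a single-process unikernel refuses the underlying syscalls.
public class ForkExecDemo {
    public static void main(String[] args) throws Exception {
        Process child = new ProcessBuilder("echo", "hello from child").start();
        int exitCode = child.waitFor();
        System.out.println("child exited with code " + exitCode);
    }
}
```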
Hello @eyberg!
You are absolutely right that the issue must be related to the `java.lang.UNIXProcess.forkAndExec` function, which is called because a running Spark job spawns multiple tasks that are run in parallel in response to Spark actions (e.g. `reduce`, `collect`). This functionality goes against the definition of unikernels as stated: "Unikernels are specialised single process operating systems".
A possible solution could be to have dedicated Job/Task unikernels spawned from the original process. However, I am not sure this is a wanted feature, since unikernels are designed to be single-process. The best option would maybe be to build it all into the Spark Worker unikernel itself, so that everything runs within its own environment.
Reference: https://spark.apache.org/docs/latest/cluster-overview.html
I used Spark maybe 2.5-3 yrs ago, non-unikernelized, and we ran multiple workers on the same instance. I think there are 2 options, but I don't know the effort/work involved:
I took a look at the ExecutorRunner and it's pretty clearly process-based: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/worker/ExecutorRunner.scala
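To illustrate what "process-based" means here: the linked `ExecutorRunner` builds a command line for the executor JVM and launches it as a separate OS process. The sketch below is a hypothetical simplification (the real code goes through Spark's `CommandUtils` and adds the full executor classpath and options), but the essence is the same, and the `start()` call is where the unsupported fork/exec happens.

```java
// Hypothetical simplification of what ExecutorRunner does: assemble a
// java command line and launch it as a SEPARATE OS process. On a
// single-process unikernel, builder.start() is exactly what fails.
import java.nio.file.Paths;

public class ExecutorLaunchSketch {
    public static void main(String[] args) throws Exception {
        String javaBin = Paths.get(System.getProperty("java.home"), "bin", "java").toString();
        // Stand-in for the real executor command line built by Spark.
        ProcessBuilder builder = new ProcessBuilder(javaBin, "-version");
        builder.redirectErrorStream(true);
        Process executor = builder.start(); // fork/exec happens here
        System.out.println("executor exited with " + executor.waitFor());
    }
}
```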
Hi!
I am working on setting up an Apache Spark Master/Worker cluster (locally), but I am having some issues with the Worker node. It throws the following error message when I try to submit the SparkPi example code:
To replicate my current application setup, follow these steps:

1. Download the `spark_3.0.0` package with: `ops pkg get spark_3.0.0`
2. Extract the `spark_3.0.0.tar.gz` file from `~/.ops/packages/` to `~/.ops/local_packages/`
3. Make two copies of the `spark_3.0.0` directory within `~/.ops/local_packages/`, one named `sparkmaster_3.0.0` and the second named `sparkworker_3.0.0`
4. Edit the `package.manifest` file within the `sparkworker_3.0.0` directory
5. Start the master: `ops load -l sparkmaster_3.0.0 -p 8080 -p 7077 -i master`
6. Start the worker: `ops load -l sparkworker_3.0.0 -i worker`
7. Submit the SparkPi example (from the `bin` directory of Apache Spark): `./spark-submit --class org.apache.spark.examples.SparkPi --master spark://0.0.0.0:7077 --executor-memory 1G --total-executor-cores 1 ../examples/jars/spark-examples_2.12-3.0.0.jar 1000`
I then did some further experimentation with the source code of the SparkPi example (https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/SparkPi.scala). Commenting out lines 34-38 made it possible to run the application without problems. The edited SparkPi example looks like this:
However, this means the application no longer does what it is designed to do: `map` and `reduce`. It also seems that the `"Cannot run program"` part of the error message is linked to the `JAVA_HOME`
environment variable. Do you have any recommendations or suggestions for how to make the Worker-node run without issues?
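For what it's worth, a `"Cannot run program"` error from the JVM usually means the executable path handed to `ProcessBuilder` does not resolve. The worker launches executors via a path derived from `JAVA_HOME` (roughly `${JAVA_HOME}/bin/java`; the exact construction here is an assumption), so a quick sanity check inside the image is to print what that path resolves to:

```java
// Prints the java binary path a worker would try to exec, derived from
// JAVA_HOME (falling back to the running JVM's java.home). The exact
// path construction is an assumption for illustration.
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class JavaHomeCheck {
    public static void main(String[] args) {
        String javaHome = System.getenv("JAVA_HOME");
        if (javaHome == null) {
            javaHome = System.getProperty("java.home"); // fallback: the running JVM's home
        }
        Path javaBin = Paths.get(javaHome, "bin", "java");
        System.out.println("would exec: " + javaBin + " (exists: " + Files.exists(javaBin) + ")");
    }
}
```

If `exists` comes back `false` inside the unikernel image, the error is a path problem rather than (or in addition to) the fork/exec limitation.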