You can try exporting `PYSPARK_DRIVER_PYTHON` and `PYSPARK_PYTHON`, and passing `spark.executorEnv.PYTHONHOME` via `--conf spark.executorEnv.PYTHONHOME=...`.
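For example, roughly like this (a sketch only; the interpreter and env paths are placeholders for wherever YARN unpacks your archive):

```python
import os
from pyspark.sql import SparkSession

# Placeholder paths -- adjust to your driver interpreter and to the layout
# of the zipped env that --archives ships to the cluster.
os.environ["PYSPARK_DRIVER_PYTHON"] = "/usr/local/bin/python3.7"
os.environ["PYSPARK_PYTHON"] = (
    "pyenv-3.7.10-v6.zip/3.7.10/envs/pyenv-3.7.10-v6/bin/python"
)

# Point the executors' PYTHONHOME at the unpacked virtual environment.
spark = (SparkSession.builder
         .config("spark.executorEnv.PYTHONHOME",
                 "pyenv-3.7.10-v6.zip/3.7.10/envs/pyenv-3.7.10-v6")
         .getOrCreate())
```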
@cyita Hello. But is that the issue? I am using the correct Python from my zipped environment: it has all the libraries and lets me import bigdl, zoo, tensorflow, keras, and pyspark; it's all there. (I have already set the env vars you mentioned.)
But when I try to run `init_engine()`, it has some kind of a problem with `JavaPackage`. Has this problem been identified? Is `JavaCreator.instance(bigdl_type, gateway)` looking for some Java classes I might not have visible on the CLASSPATH?
OK, this is for later generations, whoever stumbles upon this `JavaPackage` thing. There is an easy solution: just make sure the BigDL uber jar with dependencies is visible to Spark. The `spark.jars` property needs to have it appended at the end.
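In PySpark terms, appending the jar looks roughly like this (the jar path is a placeholder; use the `jar-with-dependencies` artifact from your BigDL distribution):

```python
from pyspark import SparkConf

# Placeholder path: point this at the BigDL uber jar in HDFS or on local disk.
BIGDL_JAR = "hdfs:///user/shared/jars/bigdl-jar-with-dependencies.jar"

conf = SparkConf()
existing = conf.get("spark.jars", "")
# Append the uber jar so both the driver and the executors can load the Java classes.
conf.set("spark.jars", f"{existing},{BIGDL_JAR}" if existing else BIGDL_JAR)
```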
I was able to use IBM Watson, Livy2, Spark, and BigDL: Spark launched executors and ran BigDL estimators.
Great work. Yes, we need the BigDL jar in `spark.jars`.
https://bigdl-project.github.io/master/#PythonUserGuide/run-without-pip/#run-with-virtual-environment-in-yarn may help.
`'JavaPackage' object is not callable` means that Python couldn't find the Java class.
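If you want to verify this from the driver, a quick py4j check looks like the sketch below; the `Engine` class name is illustrative, assuming a BigDL 0.x package layout:

```python
from pyspark import SparkContext

sc = SparkContext.getOrCreate()
# py4j returns a JavaPackage for names it cannot resolve on the JVM classpath,
# and a JavaClass for names it can; only the latter is callable.
cls = sc._jvm.com.intel.analytics.bigdl.utils.Engine
print(type(cls))  # expect py4j.java_gateway.JavaClass when the jar is on the classpath
```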
@Adamage, could you share your Livy setup for launching BigDL jobs? I have issues launching even the simplest example using an AWS EMR Notebook (https://github.com/intel-analytics/BigDL/issues/7764), and so far it fails. Maybe I can reuse your Livy and/or notebook configuration?
Hello everyone.
I am struggling to configure BigDL correctly on our Hadoop/Spark setup. Normally we use SparkML + Livy2 in a Jupyter Notebook to ask YARN for drivers, executors, etc.
As I understand it, when I am already inside a PySpark container, PySpark is already loaded; should I actually be using the one inside the BigDL "home" library directory?
Some more details of what I am doing:
Some conflicts are reported when I import the libraries in Jupyter cells, for example:
pyenv-3.7.10-v6.zip/3.7.10/envs/pyenv-3.7.10-v6/lib/python3.7/site-packages/zoo/util/engine.py:42: UserWarning: Find both SPARK_HOME and pyspark. You may need to check whether they match with each other. SPARK_HOME environment variable is set to: ., and pyspark is found in: /cdh/opt/cloudera/parcels/CDH-6.3.4-1.cdh6.3.4.p4460.8174152/lib/spark/python/lib/pyspark.zip/pyspark/__init__.py. If they are unmatched, you are recommended to use one source only to avoid conflict. For example, you can unset SPARK_HOME and use pyspark only. warnings.warn(warning_msg)
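Following the warning's own suggestion, keeping a single pyspark source would look roughly like this (just a sketch; I am not sure this is the actual problem):

```python
import os

# Per the UserWarning above: keep a single pyspark source by unsetting SPARK_HOME
# and relying on the pyspark already importable from the environment.
os.environ.pop("SPARK_HOME", None)
```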
The Livy setup is such that I provide a zipped venv stored in HDFS via `--archives`. The zipped venv contains bigdl, pyspark, and analytics-zoo. The result is the popular `'JavaPackage' object is not callable` error.
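For reference, the session request behind my notebook looks roughly like this (host and HDFS paths are placeholders):

```python
import requests

# Sketch of the Livy POST /sessions call that creates the PySpark session;
# all paths and the Livy host are placeholders for my real setup.
payload = {
    "kind": "pyspark",
    "archives": ["hdfs:///user/me/envs/pyenv-3.7.10-v6.zip"],
    "conf": {
        "spark.yarn.appMasterEnv.PYSPARK_PYTHON":
            "pyenv-3.7.10-v6.zip/3.7.10/envs/pyenv-3.7.10-v6/bin/python",
    },
}
resp = requests.post("http://livy-host:8998/sessions", json=payload)
print(resp.status_code, resp.json())
```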
What am I doing wrong?