**guidiandrea** opened this issue 3 years ago (status: Open)
We are working on the explicit support for this scenario in https://github.com/intel-analytics/analytics-zoo/pull/4339; for now, you may do something like:
init_orca_context(cluster_mode="spark-submit", ...)
See https://github.com/intel-analytics/analytics-zoo/blob/master/pyzoo/zoo/orca/common.py#L161
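To make the suggestion concrete, here is a rough sketch of how the pieces fit together in the spark-submit scenario. The jar name is the one quoted later in this thread; the script name `train.py` and the `yarn` master are hypothetical placeholders:

```python
# Sketch: assembling a spark-submit invocation that ships the Analytics Zoo
# fat jar, so that init_orca_context(cluster_mode="spark-submit") can reuse
# the already-created SparkContext. "train.py" is a hypothetical driver script.
zoo_jar = ("analytics-zoo-bigdl_0.13.0-spark_2.3.1-0.12.0"
           "-20210908.203333-39-jar-with-dependencies.jar")
submit_cmd = " ".join([
    "spark-submit", "--master", "yarn",
    "--jars", zoo_jar,  # ship the Analytics Zoo jar to driver and executors
    "--conf", f"spark.driver.extraClassPath={zoo_jar}",
    "--conf", f"spark.executor.extraClassPath={zoo_jar}",
    "train.py",
])
# Inside train.py you would then call (not executed here, since the zoo
# package may not be installed locally):
#   from zoo.orca import init_orca_context
#   sc = init_orca_context(cluster_mode="spark-submit")
print(submit_cmd)
```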
We are working on this and should finish it very soon.
Hello @hkvision @jason-dai
I tried what you said, but I'm getting a 'JavaPackage object is not callable' error.
What might it be due to?
Thanks
Hi @guidiandrea
Since you already have a SparkSession, you need to manually upload the jar for Analytics Zoo before initializing the SparkSession. You may refer to our guide for DataBricks to do similar things in your environment: https://analytics-zoo.readthedocs.io/en/latest/doc/UserGuide/databricks.html#installing-analytics-zoo-libraries More specifically, the following paragraph in the page:
Install Analytics Zoo python environment using prebuilt release Wheel package. Click Libraries > Install New > Upload > Python Whl. Download Analytics Zoo prebuilt Wheel here. Choose a wheel with timestamp for the same Spark version and platform as Databricks runtime. Download and drop it on Databricks.
Feel free to tell us if you encounter further issues :)
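As a minimal sketch of what "upload the jar before initializing the SparkSession" means in practice: these are the Spark properties that have to point at the Analytics Zoo fat jar before the JVM starts (the property names come from this thread; the local path is a hypothetical example):

```python
# The three Spark properties that must reference the Analytics Zoo fat jar
# *before* the SparkSession/JVM is created. "JavaPackage object is not
# callable" is the typical symptom when the jar is missing from the
# driver classpath. The path below is a hypothetical placeholder.
zoo_jar = "/opt/jars/analytics-zoo-jar-with-dependencies.jar"

spark_conf = {
    "spark.driver.extraClassPath": zoo_jar,
    "spark.executor.extraClassPath": zoo_jar,
    "spark.jars": zoo_jar,
}

# With pyspark available, you would apply these before creating the session:
#   from pyspark.sql import SparkSession
#   builder = SparkSession.builder
#   for k, v in spark_conf.items():
#       builder = builder.config(k, v)
#   spark = builder.getOrCreate()
```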
Hello @hkvision, thanks for your reply.
I already installed analytics-zoo via pip in the virtual env that I ship to my YARN application, and I'm loading BigDL through jars: since I'm on Spark 2.3, I can't install BigDL via pip (it would automatically pull in PySpark 2.4.6).
Should I build everything from source in order to avoid collisions? The Linux env is CentOS-like.
But actually, pip install analytics-zoo will also install bigdl and pyspark 2.4.6; how did you manage to pip install only analytics-zoo? If you are using Spark 2.3, I suppose you need to use spark-submit and specify the jars there (you don't need to modify the init_orca_context code :)
Yep, I needed to modify the dependencies, since the Analytics Zoo prebuilt wheel for Spark 2.3 was trying to install PySpark 2.4, which of course makes no sense xP
We have some released wheels for Spark 2.3: https://sourceforge.net/projects/analytics-zoo/files/zoo-py/ and perhaps you could give them a try? cc @Le-Zheng Or you may use spark-submit directly to play safe :)
I tried following the guide and downloaded the correct version of the prebuilt jar with dependencies, but now I'm getting the following error. What may it be due to? Thanks
It seems to be an issue with the cluster itself? Was it caused by adding the analytics-zoo jar? If so, can you provide more details (for example, the command you use to submit the jar)?
@hkvision Yep, I have this problem when adding the analytics-zoo JAR into sparkmagic configurations (we run Jupyter with JupyterHub, so when we launch the first cell a yarn application is started and named as LivySession).
I added these configurations to the sparkmagic/config.json file:
```json
"spark.driver.extraClassPath": "/analytics-zoo-bigdl_0.13.0-spark_2.3.1-0.12.0-20210908.203333-39-jar-with-dependencies.jar",
"spark.executor.extraClassPath": "/analytics-zoo-bigdl_0.13.0-spark_2.3.1-0.12.0-20210908.203333-39-jar-with-dependencies.jar",
"spark.jars": "/analytics-zoo-bigdl_0.13.0-spark_2.3.1-0.12.0-20210908.203333-39-jar-with-dependencies.jar"
```
I was able to use BigDL without Analytics-Zoo using these settings, so they're pretty much correct.
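For reference, here is how those settings would sit inside a sparkmagic `config.json` fragment, validated as JSON (a sketch; the `session_configs` / `conf` nesting is how sparkmagic typically forwards Spark properties through Livy, but the exact top-level key may differ across sparkmagic versions):

```python
import json

# The same three properties, embedded in a sparkmagic session-configuration
# fragment. The jar path is the one quoted in this thread; the surrounding
# "session_configs"/"conf" structure is an assumption about the sparkmagic
# config layout.
config_fragment = """
{
  "session_configs": {
    "conf": {
      "spark.driver.extraClassPath": "/analytics-zoo-bigdl_0.13.0-spark_2.3.1-0.12.0-20210908.203333-39-jar-with-dependencies.jar",
      "spark.executor.extraClassPath": "/analytics-zoo-bigdl_0.13.0-spark_2.3.1-0.12.0-20210908.203333-39-jar-with-dependencies.jar",
      "spark.jars": "/analytics-zoo-bigdl_0.13.0-spark_2.3.1-0.12.0-20210908.203333-39-jar-with-dependencies.jar"
    }
  }
}
"""
cfg = json.loads(config_fragment)  # raises ValueError if the JSON is malformed
```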
@qiuxin2012 @Le-Zheng any comments?
@guidiandrea Could you provide some information about your environment?
Could you check the size of your analytics-zoo-bigdl_0.13.0-spark_2.3.1-0.12.0-20210908.203333-39-jar-with-dependencies.jar: is it 217MB or 403MB? And make sure the spark.jars you set is correct.
Hello, the size of the file is 403 MB. What do you mean by 'check that spark.jars is correct'?
Open the environment page in the Spark driver's web UI (mine is http://xin-dev.sh.intel.com:4040/environment/); can you see the analytics-zoo jar in the Classpath Entries?
@guidiandrea I tried to run init_orca_context in the latest BigDL with the SparkSession instantiated by pyspark, and I got 'JavaPackage object is not callable' when --jars (your spark.jars) was not provided. See issue https://github.com/intel-analytics/BigDL/issues/3351 for more details.
In your error message I found a useful warning: the analytics-zoo jar is skipped.
So you need to open the environment page in your Spark driver's web UI and check whether the analytics-zoo jar is in the Classpath Entries.
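Besides eyeballing the web UI, a quick programmatic check is possible. This is only a sketch: in a live session the entries would come from the driver UI's Classpath Entries table (or the `spark.jars` setting), so the list below is a stand-in:

```python
# Quick check that the Analytics Zoo fat jar made it onto the classpath.
# The entries here are a simulated stand-in for what the driver web UI's
# "Classpath Entries" table would show; the spark-core jar path is a
# hypothetical example.
classpath_entries = [
    "/usr/lib/spark/jars/spark-core_2.11-2.3.1.jar",
    "/analytics-zoo-bigdl_0.13.0-spark_2.3.1-0.12.0"
    "-20210908.203333-39-jar-with-dependencies.jar",
]

def has_zoo_jar(entries):
    """Return True if any classpath entry looks like the Analytics Zoo fat jar."""
    return any("analytics-zoo" in e and e.endswith("jar-with-dependencies.jar")
               for e in entries)

print(has_zoo_jar(classpath_entries))  # True when the jar is present
```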
Hello,
I am trying to use Analytics Zoo in a Hadoop/YARN environment via JupyterHub/Lab, where the SparkSession and context are instantiated automatically when the first cell is run.
How can I initialize the environment in this case?