intel-analytics / analytics-zoo

Distributed Tensorflow, Keras and PyTorch on Apache Spark/Flink & Ray
https://analytics-zoo.readthedocs.io/
Apache License 2.0

init on k8s will throw error if bigdl is not installed by pip #99

Open qiuxin2012 opened 3 years ago

qiuxin2012 commented 3 years ago

The command is:

$SPARK_HOME/bin/spark-submit \
    --master $RUNTIME_SPARK_MASTER \
    --deploy-mode client \
    --name analytics-zoo-ncf \
    --conf spark.executor.instances=$RUNTIME_EXECUTOR_INSTANCES \
    --conf spark.driver.host=$RUNTIME_DRIVER_HOST \
    --conf spark.driver.port=$RUNTIME_DRIVER_PORT \
    --conf spark.kubernetes.container.image=$RUNTIME_K8S_SPARK_IMAGE \
    --conf spark.kubernetes.executor.podTemplateFile=/ppml/trusted-big-data-ml/spark-executor-template.yaml \
    --conf spark.kubernetes.executor.deleteOnTermination=false \
    --conf spark.driver.memory=8g \
    --executor-cores 5 \
    --total-executor-cores 5 \
    --executor-memory 128G \
    --conf spark.network.timeout=10000000 \
    --conf spark.executor.heartbeatInterval=10000000 \
    --conf spark.executor.extraClassPath=/ppml/trusted-big-data-ml/work/analytics-zoo-0.12.0-SNAPSHOT/lib/analytics-zoo-bigdl_0.13.0-spark_3.1.2-0.12.0-SNAPSHOT-jar-with-dependencies.jar,/ppml/trusted-big-data-ml/work/bigdl-jar-with-dependencies.jar \
    --conf spark.driver.extraClassPath=/ppml/trusted-big-data-ml/work/analytics-zoo-0.12.0-SNAPSHOT/lib/analytics-zoo-bigdl_0.13.0-spark_3.1.2-0.12.0-SNAPSHOT-jar-with-dependencies.jar,/ppml/trusted-big-data-ml/work/bigdl-jar-with-dependencies.jar \
    --properties-file /ppml/trusted-big-data-ml/work/analytics-zoo-0.12.0-SNAPSHOT/conf/spark-analytics-zoo.conf \
    --jars /ppml/trusted-big-data-ml/work/analytics-zoo-0.12.0-SNAPSHOT/lib/analytics-zoo-bigdl_0.13.0-spark_3.1.2-0.12.0-SNAPSHOT-jar-with-dependencies.jar,/ppml/trusted-big-data-ml/work/bigdl-jar-with-dependencies.jar \
    --py-files /ppml/trusted-big-data-ml/work/analytics-zoo-0.12.0-SNAPSHOT/lib/analytics-zoo-bigdl_0.13.0-spark_3.1.2-0.12.0-SNAPSHOT-python-api.zip,/ppml/trusted-big-data-ml/work/bigd-python-api.zip \
    --verbose \
    --files /ppml/trusted-big-data-ml/work/data/ml-1m/ratings_new.dat.2 \
    /ppml/trusted-big-data-ml/work/data/ncf/ncf-dataframe.py

The error is:

Traceback (most recent call last):
  File "/ppml/trusted-big-data-ml/work/data/ncf/ncf-dataframe.py", line 24, in <module>
    cores=4) # run in local mode
  File "/ppml/trusted-big-data-ml/work/analytics-zoo-0.12.0-SNAPSHOT/lib/analytics-zoo-bigdl_0.13.0-spark_3.1.2-0.12.0-SNAPSHOT-python-api.zip/zoo/orca/common.py", line 244, in init_orca_context
  File "/ppml/trusted-big-data-ml/work/analytics-zoo-0.12.0-SNAPSHOT/lib/analytics-zoo-bigdl_0.13.0-spark_3.1.2-0.12.0-SNAPSHOT-python-api.zip/zoo/common/nncontext.py", line 257, in init_spark_on_k8s
  File "/ppml/trusted-big-data-ml/work/analytics-zoo-0.12.0-SNAPSHOT/lib/analytics-zoo-bigdl_0.13.0-spark_3.1.2-0.12.0-SNAPSHOT-python-api.zip/zoo/util/spark.py", line 268, in init_spark_on_k8s
  File "/ppml/trusted-big-data-ml/work/analytics-zoo-0.12.0-SNAPSHOT/lib/analytics-zoo-bigdl_0.13.0-spark_3.1.2-0.12.0-SNAPSHOT-python-api.zip/zoo/util/utils.py", line 145, in get_zoo_bigdl_classpath_on_driver
AssertionError: Cannot find BigDL classpath, please check your installation

qiuxin2012 commented 3 years ago

Workaround: export BIGDL_CLASSPATH before submitting the job.
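
A minimal sketch of that workaround, under some assumptions: the thread does not say what value BIGDL_CLASSPATH should hold, so the snippet below guesses that it is the BigDL jar already passed via --jars in the command above. The BIGDL_JAR variable is a hypothetical helper introduced only for this example.

```shell
# Assumption: BIGDL_CLASSPATH should point at the BigDL jar. The default path
# is copied from the --jars option in the command above; BIGDL_JAR is a
# hypothetical helper variable, so adjust it to match your image.
BIGDL_JAR="${BIGDL_JAR:-/ppml/trusted-big-data-ml/work/bigdl-jar-with-dependencies.jar}"

# Warn early if the jar is missing, so a wrong path is noticed before submit.
if [ ! -f "$BIGDL_JAR" ]; then
    echo "warning: $BIGDL_JAR not found" >&2
fi

# Export the variable so the Python driver started by spark-submit inherits it.
export BIGDL_CLASSPATH="$BIGDL_JAR"
echo "BIGDL_CLASSPATH=$BIGDL_CLASSPATH"

# Then run the same spark-submit command as above from this shell.
```

Because the job uses --deploy-mode client, the driver runs in the shell that calls spark-submit, so the variable has to be exported in that same shell.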