intel-analytics / analytics-zoo

Distributed Tensorflow, Keras and PyTorch on Apache Spark/Flink & Ray
https://analytics-zoo.readthedocs.io/
Apache License 2.0

[BigDL2.0 k8s] Remaining cluster mode issues #17

Open Le-Zheng opened 2 years ago

Le-Zheng commented 2 years ago

Remaining issues in k8s cluster mode:

  1. PyTorch with jep requires export PYTHONHOME=/usr/local/envs/pytf1 to be set on the driver. In cluster mode, it cannot be set in the driver pod. The k8s image has two Python envs, so we cannot hard-code PYTHONHOME in the image.
  2. TFPark in cluster mode throws ModuleNotFoundError: No module named 'nets'. The k8s image has already cloned the slim models and sets PYTHONPATH to opt/models/research/slim:$PYTHONPATH. In client mode, the same example runs successfully.
  3. The Orca OpenVINO example throws RuntimeError: The support of IR v4 has been removed from the product. Please, convert the original model using the Model Optimizer which comes with this version of the OpenVINO to generate supported IR version. It seems the code does not support the OpenVINO version installed by conda install openvino-ie4py-ubuntu18 -c intel.
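
Not a fix, but a small diagnostic sketch that might help narrow down items 1 and 2. It could be run at the start of the driver program in both client and cluster mode to compare which interpreter is active, what PYTHONHOME and PYTHONPATH look like, and whether the nets module resolves. The function name diagnose_python_env is hypothetical and not part of Analytics Zoo or BigDL; it only uses the Python standard library.

```python
import importlib.util
import os
import sys


def diagnose_python_env(module_name="nets"):
    """Collect interpreter settings relevant to the PYTHONHOME / PYTHONPATH issues.

    Hypothetical helper for debugging; not part of Analytics Zoo or BigDL.
    """
    pythonpath = os.environ.get("PYTHONPATH", "")
    entries = [p for p in pythonpath.split(os.pathsep) if p]
    return {
        # Which of the Python envs in the image is actually running.
        "executable": sys.executable,
        # None means PYTHONHOME is not set in this process.
        "PYTHONHOME": os.environ.get("PYTHONHOME"),
        "PYTHONPATH": entries,
        # Relative entries resolve against the current working directory,
        # which may differ between client and cluster mode (an assumption
        # worth checking, not a confirmed cause).
        "relative_or_missing": [
            p for p in entries if not os.path.isabs(p) or not os.path.isdir(p)
        ],
        "cwd": os.getcwd(),
        # True if the module can be located on sys.path without importing it.
        "module_found": importlib.util.find_spec(module_name) is not None,
    }


if __name__ == "__main__":
    for key, value in diagnose_python_env().items():
        print(f"{key}: {value}")
```

Comparing the printed output from the driver in both modes should show whether the environment variables from the image actually reach the driver process in cluster mode.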