intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, GraphRAG, DeepSpeed, Axolotl, etc
Apache License 2.0

Cannot find SPARK_JAR_REPO_URL when running Docker Graphene sample #4401

Open fhoering opened 2 years ago

fhoering commented 2 years ago

I'm trying to run the docker graphene sample with Spark.

https://github.com/intel-analytics/BigDL/blob/main/ppml/trusted-big-data-ml/python/docker-graphene/README.md

It seems like it uses a modified version of Spark: https://github.com/intel-analytics/BigDL/blob/main/ppml/trusted-big-data-ml/python/docker-graphene/Dockerfile#L76

How can I get those modified JARs? Is the modification from vanilla Spark sources documented somewhere?

glorysdj commented 2 years ago

Hi @fhoering, these jars are not ready to be made public yet. You can pull the Docker image from Docker Hub to try bigdl-ppml:

docker pull intelanalytics/bigdl-ppml-trusted-big-data-ml-python-graphene:2.1.0-SNAPSHOT

Please refer to https://hub.docker.com/layers/195055360/intelanalytics/bigdl-ppml-trusted-big-data-ml-python-graphene/2.1.0-SNAPSHOT/images/sha256-4225bccfaf3516e7f7e8da9a01def971c32bc2de282477e56df108e845154b25?context=repo. Thanks.
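For reference, a minimal sketch of pulling the published image and starting a shell inside it with the SGX device exposed. The device path is an assumption that depends on your SGX driver (`/dev/sgx/enclave` with the in-kernel driver, `/dev/isgx` with the legacy out-of-tree driver), and the snippet is guarded behind `RUN_DOCKER=1` so it is safe to source without side effects; the README linked above is the authoritative source for the exact `docker run` flags.

```shell
# Image tag taken from the comment above; SGX device path is driver-dependent.
IMAGE=intelanalytics/bigdl-ppml-trusted-big-data-ml-python-graphene:2.1.0-SNAPSHOT
SGX_DEVICE=/dev/sgx/enclave   # or /dev/isgx with the legacy driver

# Only execute on an SGX-capable host with Docker installed.
if [ "${RUN_DOCKER:-0}" = "1" ] && command -v docker >/dev/null 2>&1; then
  docker pull "$IMAGE"
  # Start an interactive shell in the container with the SGX device mounted.
  docker run -it --rm --device="$SGX_DEVICE" "$IMAGE" bash
fi
```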

jason-dai commented 2 years ago

@fhoering You may try the docker image that we have published (as mentioned above) while we are still working on the public release for the modifications; we'd be happy to follow up on your specific requirements if needed.

fhoering commented 2 years ago

OK. Thanks. I pulled the docker image and got the basic examples working.

My use case would be to run some of our existing data processing and ML jobs in an SGX enclave, and possibly distributed via K8s if that can also work in a secure way.

The fact that one can verify the code that runs in an SGX enclave is an important part of the process. If the code is tampered with or not public, one can't actually use this for real. Don't you have a Spark fork with the changes? I suppose you comment out features that don't work inside an enclave, like the Spark UI. I could actually just take the JARs from the Docker image and compare them on my own, so it is not really private information anyway.
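The comparison fhoering describes can be sketched as follows. The jar path inside the image is an assumption (inspect the image to find the actual Spark home), `./vanilla-spark` stands for a locally downloaded stock Spark distribution, and the Docker steps are guarded behind `RUN_DOCKER=1` so the snippet has no side effects by default.

```shell
# Extract the modified Spark jars from the published image and diff them
# against a vanilla Spark distribution downloaded separately.
IMAGE=intelanalytics/bigdl-ppml-trusted-big-data-ml-python-graphene:2.1.0-SNAPSHOT
JAR_DIR=/ppml/trusted-big-data-ml/work/spark-3.1.2/jars   # assumed path; verify inside the image

if [ "${RUN_DOCKER:-0}" = "1" ] && command -v docker >/dev/null 2>&1; then
  CID=$(docker create "$IMAGE")            # create (not run) so files can be copied out
  docker cp "$CID:$JAR_DIR" ./modified-jars
  docker rm "$CID"
  # Report which jars differ from the stock distribution.
  diff -qr ./modified-jars ./vanilla-spark/jars || true
fi
```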

glorysdj commented 2 years ago

@fhoering Thanks for the trial of bigdl-ppml.

Regarding K8s support: bigdl-ppml supports securely running in SGX enclaves on K8s in both Spark client and cluster mode (please refer to https://github.com/intel-analytics/BigDL/tree/main/ppml/trusted-big-data-ml/python/docker-graphene#run-as-spark-on-kubernetes-mode). You can also start it with a Helm chart (https://github.com/intel-analytics/BigDL/tree/main/ppml/trusted-big-data-ml/python/docker-graphene/kubernetes). Please let us know if there is anything we can help with.
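As an illustration of the K8s mode the link describes, a spark-submit invocation might look roughly like the sketch below. Every value here (API server address, image name, service account, example path) is a placeholder, not the project's actual configuration; the linked README is the authoritative source for the real flags and PPML-specific settings.

```shell
# All values are placeholders; see the linked README for the real configuration.
${SPARK_HOME}/bin/spark-submit \
  --master k8s://https://<k8s-api-server>:6443 \
  --deploy-mode client \
  --name spark-pi-sgx \
  --conf spark.kubernetes.container.image=intelanalytics/bigdl-ppml-trusted-big-data-ml-python-graphene:2.1.0-SNAPSHOT \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  local:///ppml/trusted-big-data-ml/work/examples/pi.py
```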

Regarding the Spark modifications, as Jason mentioned, they are still a work in progress; we will follow up on your requirements if needed.

fhoering commented 2 years ago

@glorysdj Thanks for the pointers, I will have a look. If you could push the Spark modifications somewhere, even in draft form (e.g., a personal GitHub fork), that would be nice, just to see what needs to change to get this to work. It doesn't need to be ready to merge upstream; a draft is fine. I just want to see where this is heading.

glorysdj commented 2 years ago

@fhoering We will share the modifications as soon as they are ready. We can also schedule a meeting to present the draft changes to you. If you are interested, please email me (dongjie.shi@intel.com) and I will arrange a meeting. Thanks.

fhoering commented 2 years ago

@glorysdj Any news? I sent you an email some time ago and would still be interested in a presentation of the changes.

glorysdj commented 2 years ago

> @glorysdj Any news ? I sent you an email some time ago. I would be still interested in a presentation of the changes.

Really sorry that I missed the mail. Let's arrange a meeting; I will follow up with you by email. Thanks.