Open fhoering opened 2 years ago
Hi @fhoering, these jars are not ready to be public yet. You can pull the Docker image from Dockerhub to try bigdl-ppml.
docker pull intelanalytics/bigdl-ppml-trusted-big-data-ml-python-graphene:2.1.0-SNAPSHOT
@fhoering You may try the docker image that we have published (as mentioned above) while we are still working on the public release for the modifications; we'd be happy to follow up on your specific requirements if needed.
OK. Thanks. I pulled the docker image and got the basic examples working.
My use case would be to run some of our existing data processing and ML jobs in an SGX enclave, possibly distributed through K8s if that can also work in a secure way.
The fact that one can verify the code that runs in an SGX enclave is an important part of the process. If the code is tampered with or not public, one can't actually use this for real. Don't you have a Spark fork with the changes? I suppose you comment out stuff that doesn't work inside an enclave, like the Spark UI. I could actually just take the jars from the docker image and compare them on my own, so it is not really private information anyway.
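The comparison mentioned above could be sketched roughly as follows. This is only an illustration, not something the BigDL project provides: it assumes the modified jar has already been copied out of the image and that a matching vanilla Spark jar of the same version is available locally. The function names `jar_digests` and `diff_jars` and the file paths in the usage note are hypothetical.

```python
import hashlib
import zipfile


def jar_digests(jar_path):
    """Map each file entry in a JAR (a zip archive) to the SHA-256 of its bytes."""
    with zipfile.ZipFile(jar_path) as jar:
        return {
            info.filename: hashlib.sha256(jar.read(info)).hexdigest()
            for info in jar.infolist()
            if not info.is_dir()
        }


def diff_jars(vanilla_jar, modified_jar):
    """Return the entries added, removed, or changed between two JARs."""
    old = jar_digests(vanilla_jar)
    new = jar_digests(modified_jar)
    common = old.keys() & new.keys()
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(name for name in common if old[name] != new[name]),
    }
```

Calling `diff_jars("vanilla/spark-core.jar", "from-image/spark-core.jar")` (placeholder paths) would list candidate classes to inspect. A "changed" entry only means the bytes differ; a different compiler or build setup can change class files even when the source is identical, so the result narrows down where to look rather than proving a source modification.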
@fhoering Thanks for the trial of bigdl-ppml.
Regarding k8s support, bigdl-ppml supports running securely in SGX enclaves on k8s in both Spark client and cluster mode (please refer to https://github.com/intel-analytics/BigDL/tree/main/ppml/trusted-big-data-ml/python/docker-graphene#run-as-spark-on-kubernetes-mode), and you can also start it with a Helm chart (https://github.com/intel-analytics/BigDL/tree/main/ppml/trusted-big-data-ml/python/docker-graphene/kubernetes). Please let us know if there is anything we can help with.
Regarding the Spark modifications, as Jason mentioned, they are still a work in progress; we will follow up on your requirements if needed.
@glorysdj Thanks for the pointers. I will have a look. If you could push the Spark modifications somewhere, even in draft mode, like a personal GitHub fork, that would be nice. It would let me see what needs to be changed to get this to work. They don't need to be ready to merge upstream; a draft is fine, just to have a look at where this is heading.
@fhoering We will share the modifications as soon as they are ready. We can also schedule a meeting to present the modified draft to you. If you are interested, please email me (dongjie.shi@intel.com) and I will arrange a meeting. Thanks.
@glorysdj Any news? I sent you an email some time ago. I would still be interested in a presentation of the changes.
Really sorry that I missed the mail. Let's arrange a meeting; I will follow up with you by email. Thanks.
I'm trying to run the docker graphene sample with Spark.
https://github.com/intel-analytics/BigDL/blob/main/ppml/trusted-big-data-ml/python/docker-graphene/README.md
It seems like it uses a modified version of Spark: https://github.com/intel-analytics/BigDL/blob/main/ppml/trusted-big-data-ml/python/docker-graphene/Dockerfile#L76
How can I get those modified JARs? Are the modifications to the vanilla Spark sources documented somewhere?