radanalyticsio / openshift-spark

S3 object storage connectors #60

Closed NecromuncherDev closed 6 years ago

NecromuncherDev commented 6 years ago

With growing interest in using object storage (Ceph, AWS, MinIO, etc.) via the s3a/s3n APIs, the question arises of whether this image should support those connectors.

Would including the appropriate jars in this image be desirable?

rimolive commented 6 years ago

@ThatBeardedDude, did you take a look at our S3 Source Example and Ceph Source Example? Let us know if that's what you are talking about.

NecromuncherDev commented 6 years ago

I took a look at both, and both failed at some point. The Ceph Nano example (the more promising of the two, for me) failed as early as setting up the notebook pod: the pod failed to spin up due to an error related to being unable to get replication controller...

Nevertheless, I do believe that including the jars needed to connect to object storage in the image would help.

elmiko commented 6 years ago

i think this is a good suggestion @ThatBeardedDude, and we should consider adding the jars. depending on how you are using these images, spark provides a very convenient method for injecting jar files into your cluster.

for example, if you have a driver application that will speak to the spark cluster produced by these images, you could pass the --packages <some package> option to spark-submit, and that would instruct spark to download those packages and make them available to the driver and the executors (workers). something like this might work in the interim, depending on your use case.
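
a rough sketch of what that could look like for the S3A connector, written as a small python helper that only assembles the spark-submit command line. everything specific here is an assumption for illustration and not taken from this repo: the helper name `build_spark_submit`, the master URL, the endpoint, the application file, and the `hadoop-aws` version, which would need to match the Hadoop version bundled with the spark build in the image. credentials are deliberately left out.

```python
import shlex


def build_spark_submit(app, master, hadoop_aws_version, endpoint=None):
    """Assemble a spark-submit argv that pulls the S3A connector via --packages.

    Hypothetical helper for illustration; it only builds the command line
    and does not run anything.
    """
    # Maven coordinate of the Hadoop S3A connector. The version is an
    # assumption and should match the Hadoop version of the Spark build.
    packages = "org.apache.hadoop:hadoop-aws:" + hadoop_aws_version
    cmd = ["spark-submit", "--master", master, "--packages", packages]
    if endpoint:
        # Point S3A at a non-AWS endpoint (for example a Ceph or MinIO gateway).
        cmd += ["--conf", "spark.hadoop.fs.s3a.endpoint=" + endpoint]
    cmd.append(app)
    return cmd


if __name__ == "__main__":
    # All values below are placeholders, not real hosts or tested versions.
    argv = build_spark_submit(
        app="app.py",
        master="spark://mycluster:7077",
        hadoop_aws_version="2.7.3",
        endpoint="http://object-store.example:8080",
    )
    # Print a shell-safe command line that could be pasted into a terminal.
    print(" ".join(shlex.quote(a) for a in argv))
```

the application itself would then read paths of the form `s3a://<bucket>/<key>`, with access and secret keys supplied separately (for instance through the `spark.hadoop.fs.s3a.access.key` and `spark.hadoop.fs.s3a.secret.key` settings).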