hashicorp / nomad-spark

DEPRECATED: Apache Spark with native support for Nomad as a scheduler

Disable go-getter auto-unzip #11

Closed angrycub closed 6 years ago

angrycub commented 6 years ago

What changes were proposed in this pull request?

Added a parameter that sets the go-getter option archive=false on every file to download except the Spark tarball, which must still be unpacked as part of the job deployment.
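As a rough illustration of the rule described above (not the actual patch, which lives in the Spark scheduler code), the sketch below builds artifact entries and sets archive to false on everything except the Spark distribution. The function name build_artifacts, the dictionary layout, and the example.com URLs are hypothetical; only the archive option and the "everything except the Spark tarball" rule come from this description.

```python
from typing import Dict, List


def build_artifacts(spark_distribution: str, other_files: List[str]) -> List[Dict]:
    """Return artifact entries; only the Spark tarball keeps auto-unpacking."""
    # The Spark distribution must still be unpacked on the client, so it
    # gets no "archive" override.
    artifacts = [{"source": spark_distribution, "options": {}}]
    for url in other_files:
        # archive=false asks go-getter to leave the file exactly as downloaded.
        artifacts.append({"source": url, "options": {"archive": "false"}})
    return artifacts


if __name__ == "__main__":
    # Hypothetical URLs, used only to show the shape of the result.
    for artifact in build_artifacts(
        "https://example.com/spark-2.1.0-bin-nomad.tgz",
        ["https://example.com/jobs.zip", "https://example.com/main.py"],
    ):
        print(artifact)
```

Applying the override to every non-distribution file, rather than only to files ending in .zip, keeps the rule simple and avoids guessing at archive types from file names.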

How was this patch tested?

Using the Packer configuration to bundle my patched Spark environment, I built a cluster in AWS with the Terraform environment included in the hashicorp/nomad project.

Ran this sample job:

spark-submit --master nomad --deploy-mode cluster --conf spark.nomad.sparkDistribution=https://angrycub-hc.s3.amazonaws.com/public/python/spark-2.1.0-bin-nomad.tgz --py-files=https://angrycub-hc.s3.amazonaws.com/public/python/jobs.zip,https://angrycub-hc.s3.amazonaws.com/public/python/libs.zip https://angrycub-hc.s3.amazonaws.com/public/python/main.py --job wordcount

Before the patch, the zip files were automatically unarchived when this job ran, so the job could not load them from the paths where it expected them.
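The failure can be approximated locally with plain Python, since zip files passed through --py-files are imported by placing the zip itself on the module search path. This is only a sketch under an assumption: that after automatic unarchiving the zip file is no longer present at the path the job was given. The helper names and the module name sketch_wordcount_job are hypothetical.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def can_import_from(path_entry: str, module_name: str) -> bool:
    """Try to import module_name with path_entry placed first on sys.path."""
    sys.path.insert(0, path_entry)
    importlib.invalidate_caches()
    try:
        importlib.import_module(module_name)
        return True
    except ImportError:
        return False
    finally:
        # Undo every side effect so repeated calls start from a clean state.
        sys.path.remove(path_entry)
        sys.modules.pop(module_name, None)
        sys.path_importer_cache.pop(path_entry, None)


def demo() -> tuple:
    """Return (import works with intact zip, import works after unpacking)."""
    with tempfile.TemporaryDirectory() as workdir:
        zip_path = os.path.join(workdir, "jobs.zip")
        with zipfile.ZipFile(zip_path, "w") as archive:
            archive.writestr("sketch_wordcount_job.py", "NAME = 'wordcount'\n")
        intact = can_import_from(zip_path, "sketch_wordcount_job")

        # Simulate automatic unarchiving: the contents are extracted and the
        # zip file itself is no longer at the path the job was given.
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(os.path.join(workdir, "unpacked"))
        os.remove(zip_path)
        unpacked = can_import_from(zip_path, "sketch_wordcount_job")
    return intact, unpacked


if __name__ == "__main__":
    print(demo())
```

The first import succeeds because the zip is intact; the second fails because the search path entry no longer points at a zip file, even though the extracted contents exist elsewhere on disk.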

Discussion

Currently, go-getter's ability to unzip artifacts provides some value, but at the expense of requiring operators to know that archives are unpacked automatically. This patch disables that ability in favor of preserving the expected behavior when paths to zip files are supplied to flags like --py-files.