hashicorp / nomad-spark

DEPRECATED: Apache Spark with native support for Nomad as a scheduler
44 stars 16 forks

Add better support for go-getter urls #28

Open tantra35 opened 5 years ago

tantra35 commented 5 years ago

This feature would be very helpful when a task is launched in cluster mode

cgbaker commented 5 years ago

Can you give an example of the type of support that is missing right now?

tantra35 commented 5 years ago

@cgbaker For example

s3::https://s3.amazonaws.com/bucket/foo or git::ssh://git@example.com/foo/bar

Spark on Nomad currently supports only HTTP links, so it is not possible to fetch scripts from private S3 buckets. The following script demonstrates where this would be useful.
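For context, Nomad's own artifact stanza already fetches sources through go-getter, so the syntax being requested here mirrors what a Nomad job file can already express (bucket and repository names below are illustrative):

```hcl
artifact {
  # Forced-getter prefixes tell go-getter which protocol to use,
  # regardless of what the URL itself looks like.
  source = "s3::https://s3.amazonaws.com/bucket/foo"
}

artifact {
  # The same mechanism covers private git repositories over SSH.
  source = "git::ssh://git@example.com/foo/bar"
}
```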

@echo off
setlocal EnableDelayedExpansion

set PYSCRIPT=analizelog.spark.7.py
set PYSCRIPTS3BACKET=.....

set AWS_ACCESS_KEY_ID=.....
set AWS_SECRET_ACCESS_KEY=....

aws s3 cp ./%PYSCRIPT% s3://%PYSCRIPTS3BACKET%/%PYSCRIPT%
set l_scitpurl="https://%PYSCRIPTS3BACKET%.s3.eu-central-1.amazonaws.com/%PYSCRIPT%"

set JAVA_HOME=c:\Java\jdk1.8.0_181
set NOMAD_ADDR=....
d:\spark-2.3.4-bin-nomad\bin\spark-submit.cmd^
  --nomad-template ^
  --master nomad^
  --conf spark.nomad.region=....^
  --conf spark.nomad.datacenters=spark^
  --conf spark.executor.instances=50^
  --conf spark.executor.memory=7g^
  --conf spark.driver.memory=10g^
  --conf spark.driver.maxResultSize=4g^
  --conf spark.hadoop.fs.s3a.endpoint=s3.amazonaws.com^
  --conf spark.hadoop.fs.s3a.access.key=%AWS_ACCESS_KEY_ID%^
  --conf spark.hadoop.fs.s3a.secret.key=%AWS_SECRET_ACCESS_KEY%^
  --conf spark.eventLog.enabled=true^
  --conf spark.eventLog.dir=file:/spark-history/^
  --deploy-mode cluster^
  --monitor-until submitted^
  --name somenameaonalizer.access-%PYSCRIPT% !l_scitpurl!

Today we must upload scripts to publicly accessible S3 buckets

cgbaker commented 5 years ago

thanks for the details, @tantra35