hashicorp / nomad-spark

DEPRECATED: Apache Spark with native support for Nomad as a scheduler

[question] [feature-request] Passing vault-token to spark-submit #2

Closed: ryanmickler closed this issue 7 years ago

ryanmickler commented 7 years ago

I've been using the (wonderful) spark-nomad feature set, and I've hit a bit of a stumbling block. I want to be able to pass a Vault token to my job, to enable vault policy blocks in the template file. For example, my template:

job "template" {
    group "executor-group-name" {
        task "executor-task-name" {
            # grant this task the permission to access certain policies
            # this will hopefully inject a VAULT_TOKEN environment var into each jvm executor
            vault {
                policies = ["s3-databucket-readonly"]
            }
            meta {
                "spark.nomad.role" = "executor" 
            } 
        } 
    }
}

And I schedule my job via:

/usr/local/spark/bin/spark-submit --class com.demo.spark.ReadDataFromS3 \
    --master nomad:http://nomad.service.consul:4646 \
    --conf spark.nomad.sparkDistribution=local:///usr/local/spark   \
    --conf spark.nomad.datacenters=dc1  \
    --conf spark.nomad.job.template=/home/ubuntu/test.template \
    /opt/deployments/sparkjobs-assembly-1.0.jar

However, I couldn't see an option to include a VAULT_TOKEN into this method. Perhaps I can suggest something like:

--conf spark.nomad.vaultToken=<TOKEN>

barnardb commented 7 years ago

Hi @ryanmickler, I'm not sure I understand what you mean by "an option to include a VAULT_TOKEN into this method".

With the job template (once converted to JSON) and the spark-submit invocation you specified, when Nomad runs an executor it should ask Vault for a token and pass it to the executor in the VAULT_TOKEN environment variable. Note that this means only the executors, and not the driver application JVM where your main method runs, will have a VAULT_TOKEN environment variable set by Nomad. So if you only need tokens on the executors, you don't need to pass a specific Vault token at all; Nomad will get them for you.
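
As an aside, if your template is currently in HCL, the Nomad CLI can print the JSON form of a job without submitting it. A minimal sketch, assuming the HCL version is saved as /home/ubuntu/test.nomad (a hypothetical path) and a Nomad release whose run command supports the -output flag:

# Print the JSON representation of the HCL job file, without running it,
# and save it where spark.nomad.job.template points.
nomad run -output /home/ubuntu/test.nomad > /home/ubuntu/test.template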

If you also need a token in the driver application, then there are two approaches.

With the default "client" deploy mode that your current spark-submit invocation is using, spark-submit starts the application driver JVM process directly (instead of having Nomad start it), so Nomad can't get the token for you, and you will need to pass in a token. Since the application driver JVM process will inherit spark-submit's environment variables, you can simply set the VAULT_TOKEN environment variable when invoking spark-submit.
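
For example, here is a sketch of that client-mode approach, reusing the invocation from the question (the vault token create flags assume a recent Vault CLI; older releases spell it vault token-create, and reusing the executors' policy here is just for illustration):

# Create a token and export it so the driver JVM that spark-submit
# starts inherits it from this shell.
export VAULT_TOKEN="$(vault token create -policy=s3-databucket-readonly -field=token)"

/usr/local/spark/bin/spark-submit --class com.demo.spark.ReadDataFromS3 \
    --master nomad:http://nomad.service.consul:4646 \
    --conf spark.nomad.sparkDistribution=local:///usr/local/spark \
    --conf spark.nomad.datacenters=dc1 \
    --conf spark.nomad.job.template=/home/ubuntu/test.template \
    /opt/deployments/sparkjobs-assembly-1.0.jar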

Alternatively, you can use the "cluster" deploy mode (by passing --deploy-mode cluster to spark-submit), in which Nomad starts the application driver inside the Nomad cluster. You can then modify your job template to put the vault stanza on the driver task, or on the whole job, to have Nomad procure tokens for just the driver, or for both driver and executor tasks, respectively.
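
A sketch of such a cluster-mode submission, with the same options as before; the template would also carry a vault stanza in a task whose meta sets "spark.nomad.role" = "driver", by analogy with the executor group above. Note that in cluster mode the application jar path must be reachable from the Nomad clients:

/usr/local/spark/bin/spark-submit --class com.demo.spark.ReadDataFromS3 \
    --master nomad:http://nomad.service.consul:4646 \
    --deploy-mode cluster \
    --conf spark.nomad.sparkDistribution=local:///usr/local/spark \
    --conf spark.nomad.datacenters=dc1 \
    --conf spark.nomad.job.template=/home/ubuntu/test.template \
    /opt/deployments/sparkjobs-assembly-1.0.jar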

Do any of those routes meet your use case? Or have I misunderstood what you're trying to achieve? Or are things not working as expected?

ryanmickler commented 7 years ago

Hi @barnardb, thanks for your reply.

Ah, let me try that. I wasn't aware that spark-submit would pass the VAULT_TOKEN environment variable through to the driver.

ryanmickler commented 7 years ago

Yup, running VAULT_TOKEN=<token> spark-submit ... works just fine. Thanks for your help!