pishen / sbt-lighter

SBT plugin for Apache Spark on AWS EMR
Apache License 2.0

The ability to read cluster configuration from s3 #2

Closed: kailuowang closed this 7 years ago

kailuowang commented 7 years ago

Similar to what you can do when creating a cluster through the Web UI.

Note: I am working on this one.

pishen commented 7 years ago

Released in 0.6.0. I've changed some key names and aligned the JSON format with the one described here: http://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-configure-apps.html

paul-english commented 7 years ago

I noticed that using an S3 JSON file was replaced in this commit: https://github.com/pishen/sbt-emr-spark/commit/d4947b3afb86e99390e8e896bec2add1fa99a514

I like the EmrConfig object, but in our case the JSON file is generated elsewhere and used by multiple clusters. If we used EmrConfig, we'd always have to keep our build config in sync with the S3 file anyway. Would you be open to reintroducing this option alongside the EmrConfig class?

kailuowang commented 7 years ago

An alternative is to use my fork, which kept that functionality and is released under a different org: https://github.com/kailuowang/sbt-emr-spark

paul-english commented 7 years ago

I saw your fork, thanks for the recommendation. I'll probably use that in the near term, but I figured this would still be a useful feature to bring back upstream.

pishen commented 7 years ago

I'll try to bring this back if possible.

pishen commented 7 years ago

@log0ymxm The feature is reintroduced in 0.11.0. EmrConfig can now parse a JSON array or read the JSON config directly from S3:

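import sbtemrspark.EmrConfig

// parseJsonFromS3 returns an Either; .right.get unwraps the Right side and
// throws if the result is a Left (presumably when loading or parsing fails).
sparkEmrConfigs := Some(
  EmrConfig
    .parseJsonFromS3("s3://your-bucket/your-config.json")(sparkS3ClientBuilder.value)
    .right
    .get
)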

ref: https://github.com/pishen/sbt-emr-spark#use-emrconfig-to-configure-the-applications
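If a bare exception from `.right.get` would be hard to diagnose in your build, the same setting could fold over the result instead. This is only a sketch under assumptions: it relies on `parseJsonFromS3` returning an `Either` (as the `.right.get` call implies), assumes nothing about the error type on the left side, and the error message wording is made up. The comment shows the JSON array shape from the AWS guide linked earlier in this thread, with a sample property chosen purely for illustration.

```scala
// build.sbt fragment; assumes the plugin is already enabled for the project.
import sbtemrspark.EmrConfig

// Assumed content of s3://your-bucket/your-config.json (illustrative values):
// [
//   {
//     "Classification": "spark-defaults",
//     "Properties": { "spark.executor.memory": "2G" }
//   }
// ]
sparkEmrConfigs := Some(
  EmrConfig
    .parseJsonFromS3("s3://your-bucket/your-config.json")(sparkS3ClientBuilder.value)
    .fold(
      // Left: fail the build with a readable message (wording is hypothetical).
      err => sys.error(s"Could not load EMR config from S3: $err"),
      // Right: use the parsed configuration unchanged.
      config => config
    )
)
```

Behavior on success is the same as the `.right.get` version; the only difference is that a failure stops the build with a message that names the cause.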

paul-english commented 7 years ago

Thanks for reintroducing this. The interface looks like a great choice.