mozilla / telemetry-analysis-service

Telemetry Analysis Service
https://analysis.telemetry.mozilla.org/
Mozilla Public License 2.0

Add other types of spot instances #229

Open jezdez opened 7 years ago

jezdez commented 7 years ago

@vitillo: Analysis jobs have different requirements in terms of hardware resources; some might benefit from more cores while others might benefit from more memory. Our users would like to select the instance type so that they can use machines that fit their job well.

Note that our Spark configuration and ETL jobs have been heavily tuned over time to run on c3.4xlarge instances. Introducing new instance types would require a significant amount of manual QA.

fbertsch commented 7 years ago

I'm interested in the tuning that we've done for our Spark clusters. Do we have a write-up somewhere? I know that we set the memory configurations during bootstrap, but I'm not sure if there's anything else we do. It would be interesting to run some benchmarks on our jobs, testing scale-up vs. scale-out, to find the optimal price point.
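As a rough sketch of what such a comparison could look like, assuming we record the wall-clock runtime of the same job on each cluster shape: the cost of one run is instance count times price per instance-hour times runtime. Everything below (the `BenchmarkRun` type, the labels, the prices, and the runtimes) is a hypothetical placeholder, not a real AWS quote or a measurement from our jobs, and it ignores spot price fluctuation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkRun:
    """One run of a job on a given cluster shape (all values are hypothetical)."""
    label: str
    instance_count: int
    hourly_price_usd: float  # price per instance-hour; placeholder, not a real quote
    runtime_hours: float     # wall-clock time recorded for the job


def job_cost(run: BenchmarkRun) -> float:
    """Total cost of one run: instances x price per instance-hour x hours."""
    return run.instance_count * run.hourly_price_usd * run.runtime_hours


def cheapest(runs):
    """Return the run with the lowest total cost."""
    return min(runs, key=job_cost)


# Made-up numbers purely to illustrate the comparison.
runs = [
    BenchmarkRun("scale-out: many small nodes", 20, 0.10, 2.0),
    BenchmarkRun("scale-up: few large nodes", 5, 0.40, 1.5),
]

for run in runs:
    print(f"{run.label}: ${job_cost(run):.2f}")
print("cheapest:", cheapest(runs).label)
```

With real benchmark numbers filled in, the same comparison would show whether a job is cheaper on fewer, larger machines or on more, smaller ones.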

vitillo commented 7 years ago

We don't have a write-up. I can tell you, though, that the Longitudinal job, for example, will fail if machines with less disk space or less memory are chosen. That job really stresses the cluster and has been tuned to use the minimum number of c3.4xlarge machines necessary. I am convinced that many other jobs that have been tested with the current configuration would fail for one reason or another if fewer resources were available on each machine. That said, it should be fine to use different instance types for new jobs.

I suspect Databricks supports (or at least used to support) only a single instance type for the same reason.