qubole / sparklens

Qubole Sparklens tool for performance tuning Apache Spark
http://sparklens.qubole.com
Apache License 2.0
567 stars 138 forks

Scalability aware Autoscaling with Sparklens #22

Open beriaanirudh opened 6 years ago

beriaanirudh commented 6 years ago

For repetitive workloads (such as ETLs), Sparklens can use the resource requirements observed in previous runs of a Spark application to autoscale executors, so that each job gets the minimum number of executors needed to meet the same application latency. This holds provided all other configurations of the application remain the same.

This can be done as follows:

  1. On the first run of an app, the Sparklens-json will contain all the information needed for this. We will now show graphs comparing the actual executor scaling vs. the minimum executor autoscaling with which the same application latency can be achieved. This minimum number is computed on a per-job basis for the application.
  2. When the same app is run again, the user can pass the Sparklens-json from the previous run, along with another configuration, to let Sparklens dictate the autoscaling of executors for this run.
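
To make the per-job minimum concrete, here is a minimal sketch in Python of how it could be estimated from a previous run. It is not Sparklens code: the `JobStats` fields, the `min_executors` and `plan_executors` helpers, and the numbers are all hypothetical, and the model assumes a job's tasks can be spread perfectly across cores (no skew, no critical-path limit), which a real implementation would need to account for.

```python
import math
from dataclasses import dataclass


@dataclass
class JobStats:
    """Hypothetical per-job summary taken from a previous run."""
    job_id: int
    wall_clock_ms: int    # observed job latency in the previous run
    total_task_ms: int    # sum of all task run times in the job
    executors_used: int   # executors actually allocated in that run


def min_executors(job: JobStats, cores_per_executor: int) -> int:
    """Smallest executor count that still fits the job's work into its
    previously observed latency, assuming perfectly parallel tasks."""
    if job.wall_clock_ms <= 0:
        return job.executors_used
    # Cores needed to finish total_task_ms of work within wall_clock_ms.
    needed_cores = math.ceil(job.total_task_ms / job.wall_clock_ms)
    needed_executors = math.ceil(needed_cores / cores_per_executor)
    # Never go below one executor or above what the previous run used.
    return max(1, min(needed_executors, job.executors_used))


def plan_executors(jobs, cores_per_executor):
    """Map each job id to its estimated minimum executor count."""
    return {j.job_id: min_executors(j, cores_per_executor) for j in jobs}


# Illustrative numbers only, not measurements from a real application.
previous_run = [
    JobStats(job_id=0, wall_clock_ms=60_000, total_task_ms=480_000, executors_used=10),
    JobStats(job_id=1, wall_clock_ms=30_000, total_task_ms=30_000, executors_used=10),
]
plan = plan_executors(previous_run, cores_per_executor=4)
print(plan)
```

Under these assumptions, job 0 needs 8 cores (2 executors) and job 1 needs a single executor, even though the previous run held 10 executors for both. The gap between `executors_used` and the planned value is what the proposed graphs would visualize.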