jansel / opentuner

An extensible framework for program autotuning
http://opentuner.org/
MIT License

On configuration search, and cost #128

Open umayrh opened 5 years ago

umayrh commented 5 years ago

I've been working on using OpenTuner to tune Spark applications (https://github.com/umayrh/sketchy-polytopes/tree/master/python/sparktuner). These are notoriously hard to tune because they have a large number of configuration parameters and their performance depends on data, program logic, and infrastructure.

I came across issue https://github.com/jansel/opentuner/issues/82, which describes why an exhaustive search of the search space isn't yet implemented in OpenTuner. Spark applications, like other large data processing programs, can take a relatively long time to run (say, 0.5 to 1 hour) and consume a significant amount of cloud memory and CPU resources, so the cost of running such programs over a large configuration search space might be prohibitive. At the same time, it's possible to limit this search space based on past experience. For such cases, perhaps the total 'cost' of the search space, rather than its size, is the better measure. Any thoughts on how we could generalize the notion of a search space so that the limit we care about is how expensive the search is, rather than whether it is exhaustive?
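To make the idea a bit more concrete, here is a rough sketch in plain Python. It does not use OpenTuner's API. The parameter names, the cost model (`estimated_cost`), the prices, and the helper functions are all hypothetical and only illustrate measuring a hand-restricted search space by total estimated cost and stopping at a budget.

```python
import itertools

# Hypothetical, hand-restricted space: names and values are illustrative only.
EXAMPLE_SPACE = {
    "executor_instances": [2, 4],
    "executor_cores": [1, 2],
}


def enumerate_configs(space):
    """Yield every configuration (as a dict) in a small discrete space."""
    names = sorted(space)
    for values in itertools.product(*(space[name] for name in names)):
        yield dict(zip(names, values))


def estimated_cost(config, hours_per_run=0.75, dollars_per_core_hour=0.05):
    """Assumed cost model: cores used * run time * price per core-hour."""
    cores = config["executor_instances"] * config["executor_cores"]
    return cores * hours_per_run * dollars_per_core_hour


def total_cost(space):
    """Estimated cost of trying every configuration in the space once."""
    return sum(estimated_cost(c) for c in enumerate_configs(space))


def configs_within_budget(space, budget):
    """Yield configurations, cheapest first, while the running total fits the budget."""
    spent = 0.0
    for config in sorted(enumerate_configs(space), key=estimated_cost):
        cost = estimated_cost(config)
        if spent + cost > budget:
            break
        spent += cost
        yield config


if __name__ == "__main__":
    print("configurations:", len(list(enumerate_configs(EXAMPLE_SPACE))))
    print("estimated total cost:", total_cost(EXAMPLE_SPACE))
    for cfg in configs_within_budget(EXAMPLE_SPACE, budget=0.4):
        print("would run:", cfg)
```

With something like this, an exhaustive search would be acceptable whenever `total_cost` fits the budget, and otherwise the budget, not the number of configurations, decides how much of the space gets explored. Trying the cheapest configurations first is just one possible ordering; a real tuner would presumably order by expected benefit as well.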