microsoft / MLOS

MLOS is a project to enable autotuning for systems.
https://microsoft.github.io/MLOS
MIT License

mlos_bench: implement optional early abort logic for time based trials #542

Open bpkroth opened 1 year ago

bpkroth commented 1 year ago

Further generalizing this via async telemetry collection during the process might be nice too.

bpkroth commented 11 months ago

Some additional notes:

For throughput- or latency-based benchmarks, it's not totally clear how to detect whether a specific trial is worse than a previous one, since a trial could theoretically speed up later in its run.

But for raw time-based ones, we could track the worst value seen so far and abort any trial that exceeds it. To do that, we'd need some additional metadata indicating that the benchmark is in fact seeking to minimize wallclock time.
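A minimal sketch of that idea, assuming the trial is a single command whose wallclock time is the score. The function name `run_timed_trial` and the status strings are hypothetical and not part of mlos_bench; the only real API used is the standard library's `subprocess.run` with its `timeout` argument.

```python
import subprocess
import time
from typing import List, Optional, Tuple


def run_timed_trial(
    cmd: List[str], worst_so_far: Optional[float]
) -> Tuple[str, Optional[float]]:
    """Run cmd, aborting once it has run longer than worst_so_far seconds.

    Hypothetical helper: worst_so_far=None means no earlier trial has been
    timed yet, so the command runs without a time limit.
    """
    start = time.monotonic()
    try:
        # subprocess.run kills the child and raises TimeoutExpired
        # when the timeout elapses.
        subprocess.run(cmd, timeout=worst_so_far)
    except subprocess.TimeoutExpired:
        # No real score exists for an aborted trial.
        return ("ABORTED", None)
    return ("SUCCEEDED", time.monotonic() - start)
```

The caller would be responsible for updating `worst_so_far` from completed trials only, and for applying the limit only when the benchmark is known to minimize wallclock time.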

What's tricky is how we incorporate metrics from aborted trials. Imagine, for instance, that you wanted to explain why some params/trials were bad: by aborting those trials, you give up on gathering that data.

Moreover, we can't actually store a real time value for that trial, since we abort it early. Instead, we need to store it in the DB as "ABORTED" or some such, and then fabricate a value for it each time we train the optimizer. Likely $W+\epsilon$, where $W$ is the worst value seen up until that point (i.e., serially examining historical trial data).
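The serial imputation could look roughly like the sketch below. Everything here is an assumption for illustration: the name `impute_aborted`, the use of `None` to stand for an "ABORTED" row, and the choice that only real measurements update $W$ (fabricated values do not).

```python
from typing import List, Optional


def impute_aborted(
    scores: List[Optional[float]], epsilon: float = 1.0
) -> List[float]:
    """Replace aborted trials (None) with W + epsilon, in trial order.

    W is the worst (largest) real wallclock time seen before that trial.
    Assumption: fabricated values never update W themselves.
    """
    worst: Optional[float] = None
    imputed: List[float] = []
    for score in scores:
        if score is None:
            if worst is None:
                # Nothing to compare against, so no abort could have occurred.
                raise ValueError("aborted trial with no earlier completed trial")
            imputed.append(worst + epsilon)
        else:
            worst = score if worst is None else max(worst, score)
            imputed.append(score)
    return imputed


# Trials 2 and 4 were aborted; each gets the worst real time so far plus 0.5.
print(impute_aborted([10.0, None, 12.0, None], epsilon=0.5))
# → [10.0, 10.5, 12.0, 12.5]
```

Because the values are recomputed from the stored history each time, the DB only ever holds the "ABORTED" marker, never a fabricated number.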

bpkroth commented 9 months ago

Per discussions, we need: