bpkroth opened 1 year ago
Some additional notes:
In the case of throughput- or latency-based benchmarks, it's not totally clear how to detect that a specific trial is doing worse than a previous one, since a trial that starts out slowly could theoretically speed up later in its run.
But for raw time-based benchmarks, what we could do is track the worst value seen so far and abort any trial that exceeds it. To do that, we'd need some additional metadata indicating that the benchmark is in fact seeking to minimize wallclock time.
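A minimal sketch of that idea (class and method names here are hypothetical, not actual mlos_bench APIs):

```python
import time


class WallclockAbortMonitor:
    """Hypothetical sketch: abort a running trial once its elapsed
    wallclock time exceeds the worst completed trial seen so far."""

    def __init__(self) -> None:
        self._worst_seen: float | None = None  # worst completed wallclock time

    def record_completed(self, wallclock_secs: float) -> None:
        """Record the wallclock time of a successfully completed trial."""
        if self._worst_seen is None or wallclock_secs > self._worst_seen:
            self._worst_seen = wallclock_secs

    def should_abort(self, trial_start: float) -> bool:
        """Poll periodically: abort once the running trial is already worse."""
        if self._worst_seen is None:
            return False  # no completed baseline to compare against yet
        return (time.time() - trial_start) > self._worst_seen
```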
What's tricky is how we incorporate metrics from aborted trials. Imagine, for instance, that you wanted to explain why some params/trials were bad: by aborting those trials early, you give up on gathering that data.
Moreover, we can't actually store a real time value for that trial, since we abort it early. Instead, we need to store it in the DB as "ABORTED" or some such and then fabricate a value for it each time we train the optimizer, likely $W+\epsilon$, where $W$ is the worst value seen up until that point (i.e., found by serially examining the historical trial data).
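Here's a minimal sketch of that imputation pass (the column name, the `EPSILON` value, and the use of pandas are all assumptions, not the actual mlos_bench schema):

```python
import pandas as pd

EPSILON = 1.0  # hypothetical penalty margin beyond the worst value seen


def impute_aborted(trials: pd.DataFrame) -> pd.DataFrame:
    """Replace "ABORTED" scores with W + EPSILON, where W is the worst
    (largest) wallclock time among the trials completed before that point,
    found by a serial pass over the historical trial data."""
    worst_so_far = None
    scores = []
    for value in trials["score"]:  # assumes rows are in submission order
        if value == "ABORTED":
            # No trial should abort before a completed baseline exists,
            # but guard against it anyway.
            scores.append(float("nan") if worst_so_far is None
                          else worst_so_far + EPSILON)
        else:
            value = float(value)
            worst_so_far = value if worst_so_far is None else max(worst_so_far, value)
            scores.append(value)
    result = trials.copy()
    result["score"] = scores
    return result
```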
Per discussions, we need:

- an `abort` command for the `run` phase
- a `status` or `telemetry` phase that includes commands used to asynchronously poll the status of a `run` phase (or should it also support other phases?) in order to feed in-progress metrics back into the system and allow specifying one of those metrics as the abort criterion (maybe just an implicit elapsed time, but probably not, since sometimes the db needs to be reloaded, for instance, and other times it doesn't, so the `run` phase overall may take longer on occasion even when the actual benchmark portion doesn't); see the sketch after this list
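As a rough sketch of how those phases might fit together (a hypothetical interface, not the actual mlos_bench `Environment` API):

```python
from abc import ABC, abstractmethod
from typing import Any


class TrialEnvironment(ABC):
    """Hypothetical phase interface sketch -- not an actual mlos_bench API."""

    @abstractmethod
    def run(self, params: dict[str, Any]) -> None:
        """Start the benchmark for the given params (possibly long-running)."""

    @abstractmethod
    def status(self) -> dict[str, float]:
        """Asynchronously poll in-progress metrics (e.g., elapsed time,
        current throughput) without waiting for run() to finish."""

    @abstractmethod
    def abort(self) -> None:
        """Stop the in-flight run; the trial gets recorded as ABORTED."""
```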
Further generalizing this via async `telemetry` collection during the process might be nice too.
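For instance, an async polling loop along these lines (reusing the hypothetical `TrialEnvironment` and `WallclockAbortMonitor` sketches above; the polling interval is made up) could log in-progress telemetry and trigger the abort check:

```python
import asyncio
import time


async def poll_telemetry(env: TrialEnvironment,
                         monitor: WallclockAbortMonitor,
                         telemetry_log: list[dict[str, float]],
                         interval_secs: float = 5.0) -> None:
    """Hypothetical async loop: poll status, log telemetry, maybe abort."""
    trial_start = time.time()
    while True:
        # Pull in-progress metrics from the status/telemetry phase so they
        # survive for later analysis even if the trial is aborted.
        telemetry_log.append(env.status())
        if monitor.should_abort(trial_start):
            env.abort()  # trial gets recorded as ABORTED in the DB
            return
        await asyncio.sleep(interval_secs)
```

In a real scheduler, this task would presumably be cancelled when `run()` completes normally, so the loop only needs to handle the abort path itself.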