microsoft / MLOS

MLOS is a project to enable autotuning for systems.
https://microsoft.github.io/MLOS
MIT License

mlos_bench: improved failure handling #464

Open bpkroth opened 1 year ago

bpkroth commented 1 year ago

Right now, if a trial fails, we simply continue with the next one.

That seems reasonable for failures in the benchmark environment itself. However, for lower-level failures (e.g., VM, OS) in the environments that the leaf environments rely upon, continuing can hide config errors that are only visible at runtime and that should be addressed.

For instance, an ARM template error will simply loop, returning 400 errors, until the max_iterations count is reached, which isn't helpful.
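
To make the distinction concrete, here is a minimal sketch of a trial loop that keeps going after benchmark failures but aborts on infrastructure failures. It is purely illustrative: `FailureKind`, `TrialFailure`, `run_trials`, and `max_infra_failures` are hypothetical names made up for this example and are not part of mlos_bench.

```python
from enum import Enum
from typing import Callable, Iterable, List


class FailureKind(Enum):
    """Hypothetical failure classification (not an mlos_bench type)."""

    BENCHMARK = "benchmark"            # the workload failed; other configs may still work
    INFRASTRUCTURE = "infrastructure"  # VM/OS/deployment failed; likely a config error


class TrialFailure(Exception):
    """Hypothetical exception that carries the kind of failure."""

    def __init__(self, kind: FailureKind, message: str) -> None:
        super().__init__(message)
        self.kind = kind


def run_trials(
    trials: Iterable[Callable[[], float]],
    max_infra_failures: int = 1,
) -> List[float]:
    """Run trials in order and return the scores of the ones that succeeded.

    Benchmark failures are skipped. Once `max_infra_failures` consecutive
    infrastructure failures occur, the loop aborts instead of burning
    through the remaining iterations.
    """
    scores: List[float] = []
    infra_failures = 0
    for trial in trials:
        try:
            scores.append(trial())
            infra_failures = 0  # a success resets the consecutive count
        except TrialFailure as exc:
            if exc.kind is FailureKind.INFRASTRUCTURE:
                infra_failures += 1
                if infra_failures >= max_infra_failures:
                    raise RuntimeError(
                        f"Aborting after {infra_failures} infrastructure failure(s): {exc}"
                    ) from exc
            # Benchmark failures (and tolerated infrastructure failures)
            # fall through to the next trial.
    return scores
```

With something like this, a deployment that keeps returning 400 errors would stop the run on the first (or Nth) occurrence, while a crashing benchmark would only cost that one trial.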

A couple of thoughts:

bpkroth commented 11 months ago

See also #523