Extend leaderboard to account for design robustness

Thanks for suggesting this! Yes, in practice it is important that the whole distribution of devices resulting from fabrication performs well, e.g. the nominal design as well as its eroded and dilated versions.

It is possible to set up robust optimization with the challenges as they are. A user would simply need to provide the eroded/nominal/dilated designs as a batch of params, and then construct a new loss function which rewards robustness (or use minimax optimization or some other scheme). Although it would be possible to set up a challenge which internally generates e.g. an eroded design from params, this would be less flexible and would make it difficult to experiment with different versions of the design-perturbing function. For this reason, I'm currently inclined to leave the challenges themselves unchanged.
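To make the "new loss function" idea concrete, here is a minimal sketch of how per-design losses for an eroded/nominal/dilated batch might be combined. Everything in it is an assumption for illustration: `toy_loss` is a hypothetical stand-in for a challenge loss (lower is better), `robust_loss` is not part of the gym, and the batch is just three constant arrays. It uses plain NumPy, so it only shows the aggregation step; in an actual gradient-based optimization the same aggregation would need to be written in a differentiable framework.

```python
import numpy as np


def toy_loss(design):
    """Hypothetical stand-in for a challenge loss; lower is better."""
    return float((design.mean() - 0.5) ** 2)


def robust_loss(designs, loss_fn, mode="worst"):
    """Combine per-design losses over a batch of perturbed designs.

    designs: array of shape (batch, height, width), e.g. the
        eroded, nominal, and dilated versions of one design.
    mode: "worst" takes the largest loss (a minimax-style objective),
        "mean" averages the losses across the batch.
    """
    losses = np.array([loss_fn(d) for d in designs])
    if mode == "worst":
        return float(losses.max())
    if mode == "mean":
        return float(losses.mean())
    raise ValueError(f"unknown mode: {mode}")


# Toy batch standing in for eroded, nominal, and dilated designs.
batch = np.stack([np.full((4, 4), v) for v in (0.3, 0.5, 0.8)])
worst = robust_loss(batch, toy_loss, mode="worst")
mean = robust_loss(batch, toy_loss, mode="mean")
```

The "worst" mode penalizes only the weakest variant, while "mean" trades off performance across all three; which one is preferable likely depends on the challenge.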

What would be useful is a demonstration of robust optimization. In general, perhaps a re-do of the gym documentation is warranted. Currently, there are notebooks which both introduce challenges and carry out optimization; instead, maybe there should be notebooks which introduce challenges only, and then notebooks which demonstrate optimization. There could be simple and advanced versions of these, with robust optimization being an advanced use case.

Turning to the leaderboard: I think robustness would be a great addition. In the validation setting, we don't care about differentiability of the design-perturbing function, and so it should be easier to settle on an implementation. I am imagining that the robust_eval_metric is just the worst-case eval_metric across the nominal, eroded, and dilated versions. Instead of points on the leaderboard plots, we could then have lines which connect the worst-case and nominal metrics.
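As a rough sketch of what such a validation-time metric could look like, the snippet below applies a non-differentiable one-pixel erosion and dilation to a binary design and reports the nominal and worst-case values. The assumptions are mine, not settled choices: designs are 2D boolean arrays, the perturbation is a 3x3 neighborhood with edge padding, a higher eval metric is better (so worst-case means the minimum), and `toy_eval_metric` is a hypothetical placeholder for a real challenge metric.

```python
import numpy as np


def dilate(design):
    """Binary dilation with a 3x3 neighborhood (one-pixel growth)."""
    design = np.asarray(design, dtype=bool)
    padded = np.pad(design, 1, mode="edge")
    h, w = design.shape
    out = np.zeros_like(design)
    # OR together all nine one-pixel shifts of the design.
    for di in range(3):
        for dj in range(3):
            out |= padded[di:di + h, dj:dj + w]
    return out


def erode(design):
    """Binary erosion, computed as dilation of the complement."""
    return ~dilate(~np.asarray(design, dtype=bool))


def robust_eval_metric(design, eval_metric_fn):
    """Return (nominal, worst-case) metric; assumes higher is better."""
    variants = (erode(design), np.asarray(design, dtype=bool), dilate(design))
    values = [eval_metric_fn(v) for v in variants]
    return values[1], min(values)


def toy_eval_metric(design):
    """Hypothetical metric: best when 25% of the pixels are solid."""
    return 1.0 - abs(float(design.mean()) - 0.25)


# An 8x8 design with a solid 4x4 square in the center.
design = np.zeros((8, 8), dtype=bool)
design[2:6, 2:6] = True
nominal, worst = robust_eval_metric(design, toy_eval_metric)
```

The returned pair gives the two endpoints of the line that would be drawn for a design on the leaderboard plot.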

@aadityacs let me know if you are interested in any of these. It would definitely be great to have an example exercising robust optimization, and to have robust devices rightfully elevated on the leaderboards.

invrs-io / leaderboard

Extend leaderboard to account for design robustness #26