Benchmark results - Githubissues

Thanks for your interest in our benchmarking results and our paper!

In our benchmark, it is almost always the case that, within one setting, the evaluated approach either terminates on every instance or times out on every instance (one setting is one network + training method + epsilon combination). To distinguish these two outcomes, you can look at our page here: https://sokcertifiedrobustness.github.io/benchmark/. For Deterministic Verification Approaches (Probabilistic ones do not time out), the "Full Results" tab records both the certified accuracy/radius and the average running time. Since the time limit is exactly 60s, an approach whose average running time is "60.00s" always times out in that setting. Otherwise, it almost never times out. This gives a qualitative estimate of the efficiency and tightness of different approaches.
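As a rough illustration of this rule, here is a minimal Python sketch that labels a setting from its average running time. The function name, the tolerance, and the example rows are hypothetical and are not taken from the benchmark code or data; only the 60s time limit comes from the description above.

```python
TIME_LIMIT_S = 60.0  # per-instance time limit stated above


def classify_setting(avg_time_s, time_limit_s=TIME_LIMIT_S, tol=0.005):
    """Label one (approach, setting) pair from its average running time.

    An average that displays as the time limit (60.00s) means every
    instance hit the limit. The tolerance is an assumption that accounts
    for the two-decimal rounding shown on the results page.
    """
    if avg_time_s >= time_limit_s - tol:
        return "timeout"
    return "terminates"


# Hypothetical rows: (approach, setting, average running time in seconds).
rows = [
    ("approach_A", "setting_1", 60.00),
    ("approach_B", "setting_1", 3.27),
]

labels = {(a, s): classify_setting(t) for a, s, t in rows}
for key, label in labels.items():
    print(key, label)
```

Because an approach almost always either terminates or times out on every instance of a setting, the average is expected to be a reliable indicator; a mixed setting with an average slightly below 60s would be labeled "terminates" by this sketch.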

To regenerate the tables with a clear distinction between "timeout" and "loose bound", you can also write a script modeled on https://github.com/sokcertifiedrobustness/certified-robustness-benchmark/blob/master/experiments/data_analyzer.py to analyze the raw evaluation data stored at https://github.com/sokcertifiedrobustness/certified-robustness-benchmark/tree/master/experiments/data_old. Thanks!

Best, Linyi

sokcertifiedrobustness / VeriGauge-deprecated

Benchmark results #1