twosixlabs / armory

ARMORY Adversarial Robustness Evaluation Test Bed

Carla metrics bug fix #1883

Closed · swsuggs closed this 1 year ago

swsuggs commented 1 year ago

Mike's description of the error from Slack:

It seems that per-example metrics are always collected for OD scenarios even if metric -> record_metric_per_sample is False. In addition, with the exception of AP_per_class, the other metrics no longer report the mean statistic even if metric -> mean is True.

This is essentially because the GlobalMeter class, which is used for @populationwise metric functions, assumes that those functions already return the mean over all samples. That is true for AP_per_class, but not for the others.

Easy fix: just have the metric functions return the mean over all samples. This seems to be what the writer of this code (David) expected.
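To make the mismatch and the fix concrete, here is a rough, self-contained sketch; the function names and bodies are illustrative stand-ins, not Armory's actual metric implementations:

```python
import numpy as np

def ap_like_metric(y_list, y_pred_list):
    # Already reduces over the whole population to a single value, so an
    # aggregator (like GlobalMeter) that records the return value as "the
    # mean" behaves correctly.
    per_image = [abs(len(y) - len(yp)) for y, yp in zip(y_list, y_pred_list)]
    return float(np.mean(per_image))

def per_image_style_metric(y_list, y_pred_list):
    # Returns one value per image; recorded verbatim, this produces
    # per-sample output regardless of the config and never a single mean.
    return [abs(len(y) - len(yp)) for y, yp in zip(y_list, y_pred_list)]

def per_image_style_metric_fixed(y_list, y_pred_list):
    # The "easy fix": reduce to the population mean before returning.
    return float(np.mean(per_image_style_metric(y_list, y_pred_list)))

y_true = [[1, 2], [3], [4, 5, 6]]
y_pred = [[1], [3], [4, 5]]
print(ap_like_metric(y_true, y_pred))                # 0.666... (single summary)
print(per_image_style_metric(y_true, y_pred))        # [1, 0, 1] (per-image values)
print(per_image_style_metric_fixed(y_true, y_pred))  # 0.666...
```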

But that leaves open questions: why are the CARLA OD functions marked @populationwise in the first place, and would a user ever want to see per-image results? If so, should this be controllable from the config, or should we just return both?

swsuggs commented 1 year ago

The present change simply returns both the mean and the per-image results. @lcadalzo Please weigh in, if you'd like, on what the behavior here should be.
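For concreteness, returning both could take a shape like the following; this is a hypothetical sketch of the idea, not necessarily the exact structure used in this PR:

```python
import numpy as np

def carla_od_metric_returning_both(y_list, y_pred_list):
    # Hypothetical return shape: keep the per-image values and the
    # population mean together so reporting can surface either or both.
    per_image = [abs(len(y) - len(yp)) for y, yp in zip(y_list, y_pred_list)]
    return {"mean": float(np.mean(per_image)), "per_image": per_image}
```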

lcadalzo commented 1 year ago

I believe the only OD metrics that need to be populationwise are the AP ones, since mAP must receive all labels/predictions as input at once at the end of the scenario. Adding a new set of mean and per_image params as metric kwargs seems like a very messy and confusing workaround.
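The reason the AP metrics cannot be computed batchwise is that AP depends on pooling and ranking all detections; the mean of per-batch values is generally a different number. A toy illustration with pooled precision (not the real AP computation):

```python
# Toy example: a statistic over pooled detections is generally not the mean
# of the same statistic computed per batch, which is why AP-style metrics
# need every label/prediction at once.
batch1_tp, batch1_detections = 1, 1    # per-batch precision 1.0
batch2_tp, batch2_detections = 1, 9    # per-batch precision ~0.11

mean_of_per_batch = (batch1_tp / batch1_detections
                     + batch2_tp / batch2_detections) / 2
pooled = (batch1_tp + batch2_tp) / (batch1_detections + batch2_detections)

print(mean_of_per_batch)  # ~0.556
print(pooled)             # 0.2
```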

The cleanest solution, unless there's something here I'm missing, would be to treat the non-AP metrics like other "normal" metrics (e.g. those for image classification), i.e. decorate them as @batchwise, and let the record_metric_per_sample and means params in the metric config determine what gets output, as was designed and as occurs for other metrics.
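Under that design, the metric config alone would decide what gets recorded for the non-AP metrics. A minimal sketch of the intended behavior, assuming the config keys named above; the metric names and result-key names below are illustrative rather than Armory's exact output schema:

```python
# Sketch of config-driven reporting for a batchwise metric.
metric_config = {
    "means": True,
    "record_metric_per_sample": False,
    "task": ["carla_od_disappearance_rate", "object_detection_AP_per_class"],
}

# Per-sample values accumulated across batches for one task metric.
per_image_values = [0.0, 1.0, 0.5]

results = {}
if metric_config["record_metric_per_sample"]:
    results["carla_od_disappearance_rate_per_sample"] = per_image_values
if metric_config["means"]:
    results["mean_carla_od_disappearance_rate"] = sum(per_image_values) / len(per_image_values)

print(results)  # with this config, only the mean is reported
```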