Calculate GNN's situation-level test accuracy

To make it easier to relate ADAM accuracy to GNN accuracy, we should calculate the GNN's situation-level test accuracy rather than only its object-level test accuracy. It would be good to have those numbers for the report, especially to explain the one anomalous color segmentation result, where ADAM's test accuracy is higher than the GNN's object-level test accuracy but lower than the GNN's situation-level test accuracy.
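As a rough sketch of what the calculation could look like: the snippet below groups object-level predictions by situation and scores each situation once. It is not taken from the ADAM or GNN code. The record layout `(situation_id, predicted_label, gold_label)`, both function names, and the rule that a situation counts as correct only when all of its objects are classified correctly are assumptions made for illustration. The aggregation rule the report should actually use may well differ, so it is left as a parameter.

```python
from collections import defaultdict
from typing import Callable, Hashable, Iterable, List, Tuple

# Hypothetical record layout: (situation_id, predicted_label, gold_label).
Record = Tuple[Hashable, str, str]


def object_level_accuracy(records: Iterable[Record]) -> float:
    """Fraction of individual objects whose predicted label matches the gold label."""
    records = list(records)
    if not records:
        raise ValueError("no records to score")
    return sum(pred == gold for _, pred, gold in records) / len(records)


def situation_level_accuracy(
    records: Iterable[Record],
    is_situation_correct: Callable[[List[bool]], bool] = all,
) -> float:
    """Fraction of situations judged correct.

    ASSUMPTION: by default a situation is correct only if every object in it
    is correct. Pass a different `is_situation_correct` to use another rule.
    """
    by_situation = defaultdict(list)
    for situation_id, pred, gold in records:
        by_situation[situation_id].append(pred == gold)
    if not by_situation:
        raise ValueError("no records to score")
    n_correct = sum(bool(is_situation_correct(flags)) for flags in by_situation.values())
    return n_correct / len(by_situation)


# Made-up toy data: two single-object situations and one four-object situation.
toy_records = [
    ("s1", "red", "red"),
    ("s2", "blue", "blue"),
    ("s3", "green", "green"),
    ("s3", "red", "green"),
    ("s3", "red", "green"),
    ("s3", "red", "green"),
]
object_acc = object_level_accuracy(toy_records)        # 3 of 6 objects correct
situation_acc = situation_level_accuracy(toy_records)  # 2 of 3 situations correct
```

On this toy data the object-level accuracy is 0.5 while the situation-level accuracy is 2/3, which shows how the two numbers can diverge when errors are concentrated in a few situations with many objects.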

isi-vista / adam

Calculate GNN's situation-level test accuracy #1212