Metric.aggregate_stats and Metric.calc_metric_from_aggregate do not guarantee that each implementation returns an ndarray with the correct shape. These functions must return arrays with the following shapes:
aggregate_stats ... [num_stats] or [num_batches, num_stats]
calc_metric_from_aggregate ... [] or [num_batches]
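The shape contract above could be enforced with a small check. The following is only a sketch: the helper names `check_aggregate_stats_shape` and `check_metric_value_shape` and the `batched` flag are hypothetical and are not part of the existing Metric API.

```python
import numpy as np


def check_aggregate_stats_shape(agg: np.ndarray, batched: bool) -> None:
    """Hypothetical helper: aggregate_stats should return
    [num_stats] (non-batched) or [num_batches, num_stats] (batched)."""
    expected_ndim = 2 if batched else 1
    if agg.ndim != expected_ndim:
        raise ValueError(
            f"aggregate_stats: expected {expected_ndim} dims, got shape {agg.shape}"
        )


def check_metric_value_shape(value: np.ndarray, batched: bool) -> None:
    """Hypothetical helper: calc_metric_from_aggregate should return
    [] (a 0-d array, non-batched) or [num_batches] (batched)."""
    expected_ndim = 1 if batched else 0
    if value.ndim != expected_ndim:
        raise ValueError(
            f"calc_metric_from_aggregate: expected {expected_ndim} dims, "
            f"got shape {value.shape}"
        )
```

Calling these at the end of each implementation (or in a shared wrapper) would turn a silent shape mismatch into an immediate error.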
But several implementations return other shapes, including the default implementations in Metric.
This has several incorrect consequences. A serious one is that the bootstrapped CI never returns correct data, because the inner sort() does not operate along the correct axis (I observed this when adding a unit test for calc_confidence_interval).
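To illustrate how a wrong shape can defeat the sort, here is a minimal NumPy sketch. It is not the code of calc_confidence_interval: the `interval` function and the sample values are made up, and it assumes the stray dimension is a trailing axis of size 1, which this report does not specify.

```python
import numpy as np


def interval(values: np.ndarray) -> tuple:
    """Hypothetical stand-in for a bootstrap CI: sort, then take the extremes."""
    s = np.sort(values)  # np.sort defaults to axis=-1 (the last axis)
    return s[0].item(), s[-1].item()


# Made-up metric values, one per bootstrap batch.
correct = np.array([0.9, 0.1, 0.5, 0.3])  # shape [num_batches]
wrong = correct.reshape(-1, 1)            # shape [num_batches, 1]

# With shape [num_batches], the values are sorted across batches.
print(interval(correct))  # (0.1, 0.9)

# With shape [num_batches, 1], the last axis has length 1, so the sort
# reorders nothing and the "bounds" are just the first and last samples.
print(interval(wrong))    # (0.9, 0.3)
```

In the second case the lower bound exceeds the upper bound, which is the kind of silently wrong result a shape mismatch can produce.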