evaluate with superpops: how is the average calculated?

richelbilderbeek commented 2 years ago

Dear GenoCAE maintainers, hi @cnettel and @kausmees,

Now that you are back, I'd like to report something I found unexpected (described here from my point of view). If you did not expect it either, I'd happily create a minimal reproducible example.

When running evaluate with a superpops file, one of my runs produced the following output:

Population	num samples	f1_score_3	f1_score_5
C	333	0.0000	0.0000
B	334	0.2431	0.0000
A	333	0.4400	0.4996
avg (micro)	1000	0.3100	0.3330

What surprised me is the last line. Its label suggests an average, but it appears to compute something different per column. (For the first column, num samples, I understand that a sum is used there :-) )

I would expect the averages to be:

Population	num samples	f1_score_3	f1_score_5
C	333	0.0000	0.0000
B	334	0.2431	0.0000
A	333	0.4400	0.4996
avg (micro)	333	0.2277	0.1665

I checked: the reported values are not the harmonic or geometric mean either.
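The comparison above can be checked with a short sketch. The helper names are hypothetical, and the inputs are the rounded per-class scores copied from the table. Because each column contains a class with an F1 of 0, the geometric and harmonic means both collapse to 0 (taking the usual limit convention for the harmonic mean), while the arithmetic mean reproduces the expected 0.2277 and 0.1665. None of them gives the reported 0.3100 or 0.3330.

```python
import math

# Per-class F1 scores copied from the evaluate output (classes C, B, A).
f1_score_3 = [0.0000, 0.2431, 0.4400]
f1_score_5 = [0.0000, 0.0000, 0.4996]


def arithmetic_mean(xs):
    return sum(xs) / len(xs)


def geometric_mean(xs):
    # Any zero makes the product zero, so the geometric mean is zero.
    if any(x == 0 for x in xs):
        return 0.0
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


def harmonic_mean(xs):
    # A zero term sends 1/x to infinity; by convention the result is zero.
    if any(x == 0 for x in xs):
        return 0.0
    return len(xs) / sum(1 / x for x in xs)


for name, scores in [("f1_score_3", f1_score_3), ("f1_score_5", f1_score_5)]:
    print(name,
          round(arithmetic_mean(scores), 4),
          round(geometric_mean(scores), 4),
          round(harmonic_mean(scores), 4))
```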

What are those values?

If you find these values weird as well, I will happily create a reproducible example. Otherwise, I am happy to learn what these values are :-)

kausmees commented 2 years ago

Hello

The average reported there is the micro-averaged F1 score, calculated by this function: here.

It is calculated globally over the classes based on the total true positives, false negatives and false positives, and can't be derived from the numbers in this table alone. This is why we chose to print it there, whereas the macro-average and weighted average can be calculated from the per-class F1 scores in this table.
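To illustrate why the micro average cannot be recovered from the per-class columns, here is a minimal sketch of the three averaging schemes. It is not the GenoCAE function linked in the comment; `f1_averages` and the toy labels are hypothetical, and it assumes single-label classification (each sample receives exactly one predicted population). The two toy runs have identical per-class F1 scores and supports but different micro-averaged F1.

```python
from collections import Counter


def f1(tp, fp, fn):
    # F1 = 2*TP / (2*TP + FP + FN); defined here as 0.0 when the denominator is 0.
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def f1_averages(y_true, y_pred):
    """Return per-class F1 plus macro, weighted and micro averages (single-label)."""
    classes = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # wrongly predicted as p
            fn[t] += 1  # true class t was missed
    per_class = {c: f1(tp[c], fp[c], fn[c]) for c in classes}
    support = Counter(y_true)
    # Macro: unweighted mean of the per-class F1 scores.
    macro = sum(per_class.values()) / len(classes)
    # Weighted: per-class F1 weighted by the number of true samples per class.
    weighted = sum(per_class[c] * support[c] for c in classes) / len(y_true)
    # Micro: pool TP, FP and FN over all classes first, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return per_class, macro, weighted, micro


# Two hypothetical toy runs with the same true labels.
y_true = ["A", "A", "B", "B", "C", "C"]
pred_1 = ["A", "A", "A", "A", "A", "A"]
pred_2 = ["A", "C", "A", "C", "B", "B"]

for y_pred in (pred_1, pred_2):
    per_class, macro, weighted, micro = f1_averages(y_true, y_pred)
    print(per_class, round(macro, 4), round(weighted, 4), round(micro, 4))
# {'A': 0.5, 'B': 0.0, 'C': 0.0} 0.1667 0.1667 0.3333
# {'A': 0.5, 'B': 0.0, 'C': 0.0} 0.1667 0.1667 0.1667
```

In this single-label setting every wrong prediction adds exactly one false positive and one false negative, so the micro-averaged F1 reduces to plain accuracy (2/6 and 1/6 correct in the two toy runs).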

I'm not sure how useful this measure is for this particular application; in our paper, we reported the weighted-average F1 score over the classes.
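A minimal sketch of a weighted-average F1 computed from the table in the issue, assuming the num samples column is the per-class support. The helper `weighted_average` is hypothetical and not part of GenoCAE.

```python
# Supports and per-class F1 scores copied from the evaluate output (C, B, A).
support = [333, 334, 333]
f1_score_3 = [0.0000, 0.2431, 0.4400]
f1_score_5 = [0.0000, 0.0000, 0.4996]


def weighted_average(scores, weights):
    # Each class's F1 counts in proportion to its number of samples.
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


for name, scores in [("f1_score_3", f1_score_3), ("f1_score_5", f1_score_5)]:
    print(name, round(weighted_average(scores, support), 4))
```

Because the three classes are almost equal in size, the weighted averages (about 0.2277 and 0.1664) land very close to the unweighted means the issue author expected.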

I see how this is confusing, though; I will add an explanation to the README documenting this behaviour.

Thanks for bringing it up, K

richelbilderbeek commented 2 years ago

Thanks for clearing that up :+1:

kausmees / GenoCAE

evaluate with superpops: how is the average calculated? #31