Currently, our intercomparison and other metrics are computed from the confusion matrix expressed in terms of sample counts. However, Stehman & Foody (2019) recommend computing metrics from the confusion matrix expressed in terms of the map area proportion of each class (which is what we already do for area estimation in src/area_utils.py; see lines 563-567 in that file).
This PR adds a new function that computes the mapped area for each class and uses these totals to compute the "area matrix". There are a few issues to be addressed still:
[x] I count the number of pixels in each class by summing the binary values (e.g., if cropland = 1 and noncrop = 0, then the sum of the map is equal to the number of cropland pixels). However, the sum returned by the ee Reducer is a float, while I would expect an integer. [SOLVED] This is because the reduction is weighted by default, so pixels that are only partially inside the region (e.g., after the clip() operation) contribute fractional values to the sum. I changed this to use an unweighted reduction because this is how we do it in our area estimation pipeline too, and integers seem more transparent to me. The difference between the two is negligible anyway.
[x] Getting the number of pixels for the ensemble in a way that lets us be flexible about which map(s) we specify to use for the ensemble is tricky. I am still thinking about the best way to do this. [SOLVED]
[x] The F1 scores for the Rwanda test case do not look right. I am still investigating this. [SOLVED] This was because the F1 score (and some of the standard errors) was computed from the sample metrics in the sklearn classification report, while the other metrics were computed from the population error matrix.
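
For reviewers, here is a minimal sketch of the idea behind the "area matrix", not the code in this PR. It assumes a stratified sample with the map classes as strata, and a confusion matrix with rows = map class and columns = reference class (sklearn's confusion_matrix uses the opposite orientation, so it would need to be transposed). The function names `area_matrix` and `metrics_from_area_matrix` and the example numbers are hypothetical. Each cell is scaled as p_ij = W_i * n_ij / n_i+, where W_i is the mapped proportion of class i, and all metrics, including F1, are then derived from the same matrix.

```python
import numpy as np


def area_matrix(sample_cm, mapped_pixels):
    """Convert sample counts into estimated area proportions.

    sample_cm[i, j]: samples mapped as class i whose reference class is j
    mapped_pixels[i]: number of map pixels in class i (hypothetical input)
    """
    sample_cm = np.asarray(sample_cm, dtype=float)
    mapped_pixels = np.asarray(mapped_pixels, dtype=float)
    w = mapped_pixels / mapped_pixels.sum()  # W_i: mapped area proportion
    n_i = sample_cm.sum(axis=1)              # n_i+: samples per map class
    return w[:, None] * sample_cm / n_i[:, None]


def metrics_from_area_matrix(p, cls):
    """User's accuracy, producer's accuracy and F1 for one class."""
    users = p[cls, cls] / p[cls, :].sum()      # precision analogue
    producers = p[cls, cls] / p[:, cls].sum()  # recall analogue
    f1 = 2 * users * producers / (users + producers)
    return users, producers, f1


# Made-up example: class 0 = noncrop, class 1 = cropland
p = area_matrix([[40, 10], [5, 45]], mapped_pixels=[900, 100])
users, producers, f1 = metrics_from_area_matrix(p, cls=1)
print(p)
print(users, producers, f1)
```

In this made-up example the producer's accuracy for cropland drops from 45/55 when computed on sample counts to 1/3 when computed on area proportions, which is the kind of discrepancy that appears when some metrics come from the sample matrix and others from the population matrix.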