ivanzvonkov opened this issue 1 year ago
@cnakalembe based on our discussion, let me know if you have any suggestions / modifications.
Sounds like a good study! Not sure if you already discussed who would work on this, but it could be a good-first-issue.
I think it would be good to circle back on the work that has been done on generating dataset reports and how we can make these easily accessible/usable (including the intercomparison reports).
We have not discussed who would work on this yet.
> I think it would be good to circle back on the work that has been done on generating dataset reports and how we can make these easily accessible/usable (including the intercomparison reports).
Agreed, I think it would be helpful to narrow down the target audience for this. In my view, the purpose of this issue is to help us (cropland map producers) make better decisions about gathering future evaluation data, doing corrective labeling, and writing disclaimers for published cropland maps. The final deliverable of the potential solution is therefore a wandb metric associated with each model and accessible to us.
Who would you say is the target audience for dataset reports? @hannah-rae
Context
To create a cropland map: 1) a model is trained on a labeled training dataset and evaluated on a labeled evaluation set, and 2) the trained model then makes predictions across all the data for an area of interest (AOI).
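The two-step workflow above can be sketched as follows. This is a minimal illustration with synthetic features and a generic classifier, not the actual crop-mask pipeline (which uses pytorch-lightning models on satellite time series):

```python
# Minimal sketch of the train -> evaluate -> predict-over-AOI workflow.
# Data and model are placeholders, not the crop-mask pipeline itself.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# 1) Train on a labeled training set A, evaluate on a labeled evaluation set B
X_train, y_train = rng.normal(size=(200, 12)), rng.integers(0, 2, 200)
X_eval, y_eval = rng.normal(size=(50, 12)), rng.integers(0, 2, 50)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
eval_accuracy = model.score(X_eval, y_eval)

# 2) Predict across every coordinate in the area of interest (AOI) C
X_aoi = rng.normal(size=(1000, 12))
cropland_map = model.predict(X_aoi)  # one crop/non-crop prediction per point
```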
Is the cropland map good?
Issue 1: Understanding performance on the evaluation dataset
Currently we evaluate a trained model by measuring the F1 score over an evaluation dataset B. This metric helps us understand how well the model predicts crops overall; however, it does not tell us much about what sorts of errors the model may be making.
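To illustrate why a single F1 score is not enough, the small example below (with illustrative labels, 1 = crop, 0 = non-crop) shows how a confusion matrix separates commission errors from omission errors that the F1 score alone collapses into one number:

```python
# A single F1 score hides the *kind* of errors the model makes;
# the confusion matrix separates false positives from false negatives.
from sklearn.metrics import confusion_matrix, f1_score

y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]  # one omission (fn) and one commission (fp)

f1 = f1_score(y_true, y_pred)  # 0.75: summarizes crop-class skill in one number
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
# fp counts non-crop points mapped as crop (commission error);
# fn counts crop points mapped as non-crop (omission error).
```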
Issue 2: Evaluation performance translating to map quality
The score on the evaluation dataset B only matters if the distribution of B is similar to that of the area of interest C. Currently we:
These points help shed some light on the similarity between B and C, and thereby on how well the metric translates to map quality. However, is it possible to have more confidence that a good metric translates to a high-quality map?
Potential Solution:
We can use agro-ecological zones to 1) better understand performance on the evaluation dataset, and 2) better understand how performance translates to map quality, by measuring model performance within each agro-ecological zone represented in the evaluation dataset.
From FAO:
Understanding performance in each zone will be especially relevant for areas of interest that span many agro-ecological zones, such as Uganda (#254).
This additional understanding will help inform how we gather future evaluation data, corrective labeling, and disclaimers that we can add to published cropland maps.
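The proposed per-zone breakdown could look like the sketch below: group evaluation points by their agro-ecological zone and compute the F1 score within each zone. Column names and zone labels are hypothetical:

```python
# Sketch: per-agro-ecological-zone F1 scores over the evaluation set.
# Column names and zone labels are hypothetical placeholders.
import pandas as pd
from sklearn.metrics import f1_score

eval_df = pd.DataFrame({
    "aez": ["humid", "humid", "humid", "arid", "arid", "arid"],
    "y_true": [1, 1, 0, 1, 0, 0],
    "y_pred": [1, 1, 0, 0, 1, 0],
})

per_zone_f1 = {
    zone: f1_score(group["y_true"], group["y_pred"], zero_division=0)
    for zone, group in eval_df.groupby("aez")
}
# A strong overall score can mask a zone where the model fails entirely.
```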
Potential implementation
1. Record the agro-ecological zone of each evaluation point when available. This can be implemented by adding a dataset of agro-ecological zones for a particular region and using that dataset to determine the agro-ecological zone of every coordinate in a `LabeledDataset`, generating an additional `agro-ecological` column: https://github.com/nasaharvest/crop-mask/blob/0cf29ff00eeecfa3385eab826fb9d2ca7654c822/datasets.py#L65
2. Use the newly generated `agro-ecological` column to record the agro-ecological distribution of each dataset in `data/reports.txt`. This can be implemented by adding an additional line that computes `value_counts()` for the `agro-ecological` column here: https://github.com/nasaharvest/crop-mask/blob/0cf29ff00eeecfa3385eab826fb9d2ca7654c822/src/labeled_dataset_custom.py#L105
3. Log a new per-class agro-ecological accuracy to wandb as a confusion matrix to better understand how well each model does in each zone. This requires a little more nuance because `pytorch-lightning` takes responsibility for some of the metric recording, but the relevant lines of code are here: https://github.com/nasaharvest/crop-mask/blob/0cf29ff00eeecfa3385eab826fb9d2ca7654c822/src/pipeline_funcs.py#L100
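The three steps above can be sketched end-to-end with plain pandas. The real implementation would go through the `LabeledDataset` machinery and a spatial join against an actual AEZ dataset; `lookup_aez()` below is a hypothetical stand-in for that query, and the wandb call is shown only as a comment:

```python
# Hedged sketch of the three implementation steps. lookup_aez() is a
# hypothetical stand-in for a point-in-polygon query against an AEZ dataset.
import pandas as pd
from sklearn.metrics import confusion_matrix

def lookup_aez(lat: float, lon: float) -> str:
    # Stand-in for a spatial join against an agro-ecological zones layer
    return "humid" if lat < 1.0 else "arid"

df = pd.DataFrame({
    "lat": [0.5, 0.7, 1.5, 1.8],
    "lon": [32.1, 32.4, 33.0, 33.2],
    "y_true": [1, 0, 1, 0],
    "y_pred": [1, 0, 0, 0],
})

# Step 1: add an agro-ecological column for every coordinate
df["agro-ecological"] = [
    lookup_aez(lat, lon) for lat, lon in zip(df["lat"], df["lon"])
]

# Step 2: record the AEZ distribution for the dataset report
aez_distribution = df["agro-ecological"].value_counts().to_dict()

# Step 3: build a per-zone confusion matrix, which could then be logged, e.g.
# wandb.log({f"confusion_{zone}": ...}) inside the pytorch-lightning hooks
per_zone_cm = {
    zone: confusion_matrix(g["y_true"], g["y_pred"], labels=[0, 1])
    for zone, g in df.groupby("agro-ecological")
}
```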