nasaharvest / crop-mask

End-to-end workflow for generating high-resolution cropland maps
Apache License 2.0

Addressing label disagreement workflow #275

Open ivanzvonkov opened 1 year ago

ivanzvonkov commented 1 year ago

Context: The first step of creating a cropland map involves generating a validation and test set. To do this, two Collect Earth Online sets are created following this notebook. The two sets contain identical data points and are labeled by different labelers. When labeling is complete, a single dataset is created by combining the data points from both sets. The label (crop/non-crop) is represented as a float (1.0/0.0):

  i) If both labelers label the data point as crop (1.0), the final label is crop (1.0).
  ii) If both labelers label the data point as non-crop (0.0), the final label is non-crop (0.0).
  iii) If one labeler labels the data point as crop (1.0) and the other as non-crop (0.0), the final label is 0.5.

An example of this processing is shown here: https://github.com/nasaharvest/crop-mask/blob/0192198c4be59352ee8aa4293d65659273fcb5a4/datasets.py#L82
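For illustration, here is a minimal sketch of that combination step. The column names (`plotid`, `label`) and inline data are hypothetical stand-ins; the actual processing is in `datasets.py` at the link above.

```python
import pandas as pd

# Hypothetical stand-ins for the two Collect Earth Online exports
# (identical points, labeled independently by different labelers).
set_1 = pd.DataFrame({"plotid": [1, 2, 3], "label": [1.0, 0.0, 1.0]})
set_2 = pd.DataFrame({"plotid": [1, 2, 3], "label": [1.0, 0.0, 0.0]})

# Both sets contain the same points, so align them on a shared point id.
merged = set_1.merge(set_2, on="plotid", suffixes=("_1", "_2"))

# Averaging the two float labels yields 1.0 (both crop),
# 0.0 (both non-crop), or 0.5 (disagreement).
merged["label"] = (merged["label_1"] + merged["label_2"]) / 2
print(merged)  # plotid 3 ends up at 0.5, marking a labeler disagreement
```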

The data points labeled 0.5 represent disagreement points, where the labelers disagreed; these are reported in data/report.txt, e.g. https://github.com/nasaharvest/crop-mask/blob/0192198c4be59352ee8aa4293d65659273fcb5a4/data/report.txt#L38

Because the disagreement points are neither crop nor non-crop, they are ignored by the model when loading data. See https://github.com/nasaharvest/crop-mask/blob/0192198c4be59352ee8aa4293d65659273fcb5a4/src/models/model.py#L203
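The filtering amounts to something like the following sketch (not the exact code at the `model.py` link above):

```python
# Sketch of the filtering idea: keep only points with an unambiguous label.
labels = [1.0, 0.5, 0.0, 1.0, 0.5]
usable = [y for y in labels if y != 0.5]  # disagreement points are dropped
print(usable)  # [1.0, 0.0, 1.0]
```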

Problem: Disagreement points represent points that are difficult to label, and they should not be ignored.

Potential Solution: A workflow should be implemented to flag datasets with high disagreement and to address those disagreements. This can potentially be done by returning to the original Collect Earth Online labeling projects and engaging experts to resolve the disagreements. Documentation and automation of this workflow are important.
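One possible shape for the flagging step, as a hedged sketch: the threshold value, function names, and the `label` column are all assumptions, not existing code in this repo.

```python
import pandas as pd

DISAGREEMENT_THRESHOLD = 0.1  # assumed value; would need tuning per project


def disagreement_rate(df: pd.DataFrame) -> float:
    """Fraction of points where the two labelers disagreed (label == 0.5)."""
    return float((df["label"] == 0.5).mean())


def flag_high_disagreement(datasets: dict) -> list:
    """Return names of datasets whose disagreement rate exceeds the threshold."""
    return [
        name
        for name, df in datasets.items()
        if disagreement_rate(df) > DISAGREEMENT_THRESHOLD
    ]
```

Flagged datasets could then be routed back to their original CEO projects for expert review.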

bhyeh commented 1 year ago

@ivanzvonkov - looking at patterns in the disagreements may clue us in to how to resolve them more systematically, but what kinds of patterns might be useful? Brainstorming some thoughts here...

  1. Certain labelers involved in disagreements
     a) Labelers who are now known to have been labeling incorrectly $\rightarrow$ ignore their label in favor of the opposing label
  2. Analysis duration
     a) Short analysis times may be indicative of incorrectness, whereas longer durations may indicate more ambiguous points
     b) The above can be further investigated with the area estimation labeling projects (where a 'final' consensus label set is available) by checking whether, for disagreeing points, the labels with the shorter analysis durations were more often incorrect (see the sketch after this list)
  3. Label distributions
     a) ?
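A sketch of how 2b could be checked against a project with an expert consensus label. The column names and inline values are hypothetical; the idea is just to take, for each disagreeing point, the label given in the shorter analysis time and see how often it contradicts the consensus.

```python
import pandas as pd

# Hypothetical columns: one row per disagreeing point, with each labeler's
# label and analysis duration (seconds) plus the expert consensus label.
df = pd.DataFrame({
    "label_1": [1.0, 0.0], "duration_1": [4.2, 30.1],
    "label_2": [0.0, 1.0], "duration_2": [25.0, 6.3],
    "consensus": [0.0, 0.0],
})

# Pick the label that came from the shorter analysis time...
shorter_is_1 = df["duration_1"] < df["duration_2"]
shorter_label = df["label_1"].where(shorter_is_1, df["label_2"])

# ...and measure how often it disagrees with the consensus label.
print((shorter_label != df["consensus"]).mean())
```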
hannah-rae commented 1 year ago

Re: 1a, this could also identify individuals who might need additional training/advice from an expert.

Re: 3, it could be interesting to know whether there is more disagreement among crop points or non-crop points, but we can't know that unless we have a tie-breaker/correction. So that may be something to investigate with the crop area notebooks, where we do have the tie broken by group/expert review.
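Where a tie-broken consensus does exist, the split is a short groupby (a sketch with assumed column names and inline data):

```python
import pandas as pd

# Hypothetical frame: points with an expert consensus label plus a flag for
# whether the two original labelers disagreed on that point.
df = pd.DataFrame({
    "consensus": [1.0, 0.0, 1.0, 1.0, 0.0],
    "disagreed": [True, False, True, False, True],
})

# Disagreement rate among consensus-crop vs consensus-non-crop points.
print(df.groupby("consensus")["disagreed"].mean())
```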

ivanzvonkov commented 1 year ago

Re: 1a) this is useful. I envision the implementation as a separate notebook that reads in all the CEO files we have and does analysis across all sets at once to surface legitimate trends.
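Such a notebook could start from something like the sketch below. The directory layout, file naming, and columns (`plotid`, `email`, `label`) are assumptions for illustration, not the actual CEO export format.

```python
from pathlib import Path

import pandas as pd

# Assumed layout: paired exports data/ceo/<project>_set1.csv and
# data/ceo/<project>_set2.csv, each with "plotid", "email", "label" columns.
def disagreements(project: str) -> pd.DataFrame:
    s1 = pd.read_csv(f"data/ceo/{project}_set1.csv")
    s2 = pd.read_csv(f"data/ceo/{project}_set2.csv")
    merged = s1.merge(s2, on="plotid", suffixes=("_1", "_2"))
    return merged[merged["label_1"] != merged["label_2"]]

projects = [p.stem.replace("_set1", "")
            for p in Path("data/ceo").glob("*_set1.csv")]
all_disagreements = pd.concat([disagreements(p) for p in projects],
                              ignore_index=True)

# How often each labeler appears in a disagreement, across every set at once.
print(all_disagreements[["email_1", "email_2"]].stack().value_counts())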

Re: 2, I think this direction makes sense, and your initial work in the PR is a good first step.

hannah-rae commented 2 months ago

Flagging because Ben did a lot of work on this, and I need to review where that ended up to see what we have left to do / whether this has been addressed and we should propagate our findings to other projects. @ivanzvonkov maybe we can revisit this before the new labeler students start.

Also, this paper reminded me of this task because it addresses disagreement between labelers and, to some extent, how to resolve it: https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2021EA002085