Status: Open. Opened by ivanzvonkov 1 year ago.
@ivanzvonkov - looking at patterns of disagreement may clue us in on how to resolve them more systematically, but what kinds of patterns might be useful? Brainstorming some thoughts here...
Re: 1a, this could also identify individuals who might need additional training/advice from an expert.
Re: 3, it could be interesting to know whether there is more disagreement among crop points or non-crop points, but we can't know that unless we have a tie-breaker/correction. So that may be something to investigate with the crop area notebooks, where the tie is broken by group/expert review.
Re: 1a, this is useful. I envision the implementation as a separate notebook that reads in all the CEO files we have and analyzes all sets at once to surface legitimate trends.
Re: 2, I think this direction makes sense, and your initial work in the PR is a good first step.
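To make the per-labeler idea concrete, here is a minimal sketch of what such an analysis could compute. It assumes each doubly-labeled point can be reduced to a tuple of (labeler, label, labeler, label); the function name, the tuple layout, and the labeler names are hypothetical and not taken from the CEO files or the repository.

```python
from collections import defaultdict


def labeler_disagreement_rates(records):
    """Return the share of each labeler's points that ended in disagreement.

    `records` is a hypothetical iterable of
    (labeler_a, label_a, labeler_b, label_b) tuples, one per data point.
    """
    totals = defaultdict(int)
    disputes = defaultdict(int)
    for labeler_a, label_a, labeler_b, label_b in records:
        disagreed = label_a != label_b
        # A disagreement is counted against both labelers, because without
        # a tie-breaker we cannot tell which of the two was wrong.
        for labeler in (labeler_a, labeler_b):
            totals[labeler] += 1
            disputes[labeler] += int(disagreed)
    return {labeler: disputes[labeler] / totals[labeler] for labeler in totals}


# Made-up example data pooled across several sets.
records = [
    ("alice", 1.0, "bob", 1.0),
    ("alice", 0.0, "carol", 1.0),
    ("bob", 0.0, "carol", 0.0),
    ("alice", 1.0, "carol", 0.0),
]
rates = labeler_disagreement_rates(records)
```

A high rate only says a labeler is often involved in disagreements, not that they are the one mislabeling, so it would be a prompt for a closer look rather than a verdict.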
Flagging because Ben did lots of work on this and I need to review where that ended to see what we have left to do / whether this has been addressed and we should propagate our findings to other projects. @ivanzvonkov maybe we can revisit this before the new labeler students start.
Also, this paper reminded me of this task because it briefly addresses disagreement between labelers and how to resolve it: https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2021EA002085
Context: The first step of creating a crop land map involves generating a validation and test set. To do this, two Collect Earth Online sets are created following this notebook. The two sets contain identical data points and are labeled by different labelers. When labeling is complete, a single dataset is created by combining data points from both sets. The label (crop/non-crop) is represented as a float (1.0/0.0):
i) If both labelers label the data point as crop (1.0), the final label is crop (1.0).
ii) If both labelers label the data point as non-crop (0.0), the final label is non-crop (0.0).
iii) If one labeler labels the data point as crop (1.0) and the other as non-crop (0.0), the final label is 0.5.
An example of this processing is shown here: https://github.com/nasaharvest/crop-mask/blob/0192198c4be59352ee8aa4293d65659273fcb5a4/datasets.py#L82
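The three rules amount to averaging the two labels. The sketch below illustrates that with plain dictionaries; it is not the code in `datasets.py`, and the function name and point ids are hypothetical.

```python
def combine_labels(label_a: float, label_b: float) -> float:
    """Combine two crop (1.0) / non-crop (0.0) labels for one data point.

    Agreement keeps the shared label; disagreement yields 0.5.
    """
    for label in (label_a, label_b):
        if label not in (0.0, 1.0):
            raise ValueError(f"expected 0.0 or 1.0, got {label!r}")
    return (label_a + label_b) / 2


# Hypothetical labels from the two sets, keyed by a made-up point id.
set_1 = {"pt_1": 1.0, "pt_2": 0.0, "pt_3": 1.0}
set_2 = {"pt_1": 1.0, "pt_2": 0.0, "pt_3": 0.0}

combined = {pid: combine_labels(set_1[pid], set_2[pid]) for pid in set_1}
```

Here `pt_1` stays crop, `pt_2` stays non-crop, and `pt_3` becomes a 0.5 disagreement point.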
The 0.5-labeled data points are disagreement points, where the labelers disagreed. They are reported in `data/report.txt`, e.g. https://github.com/nasaharvest/crop-mask/blob/0192198c4be59352ee8aa4293d65659273fcb5a4/data/report.txt#L38. Because the disagreement points are neither crop nor non-crop, they are ignored by the model when loading data. See https://github.com/nasaharvest/crop-mask/blob/0192198c4be59352ee8aa4293d65659273fcb5a4/src/models/model.py#L203
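As a rough illustration of what gets reported and what gets dropped, the sketch below computes a disagreement rate and filters out the 0.5 points. It assumes the combined labels are available as a plain list of floats; the function names are hypothetical and do not mirror `report.txt` or `model.py`.

```python
from typing import List

DISAGREEMENT_LABEL = 0.5


def disagreement_rate(labels: List[float]) -> float:
    """Fraction of combined labels that are disagreement points."""
    if not labels:
        return 0.0
    disputed = sum(1 for label in labels if label == DISAGREEMENT_LABEL)
    return disputed / len(labels)


def drop_disagreements(labels: List[float]) -> List[float]:
    """Keep only the points with a definite crop / non-crop label."""
    return [label for label in labels if label != DISAGREEMENT_LABEL]


# Made-up combined labels for one dataset.
labels = [1.0, 0.0, 0.5, 1.0, 0.5, 0.0, 0.0, 1.0]
rate = disagreement_rate(labels)
usable = drop_disagreements(labels)
```

In this made-up example a quarter of the points never reach the model, which is the loss the issue is about.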
Problem: Disagreement points represent difficult points to label and should not be ignored.
Potential Solution: A workflow should be implemented to flag datasets with high disagreement and to address those disagreements. This can potentially be done by returning to the original Collect Earth Online labeling projects and engaging experts to resolve the disagreements. Documenting and automating this workflow is important.
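One possible shape for the flagging step is sketched below. It assumes combined labels are available per dataset as a mapping from point id to label; the 10% threshold, the dataset names, and the function name are placeholders for illustration, not values agreed on in this issue.

```python
from typing import Dict, List

# Hypothetical cutoff; the right value would need to be agreed on.
DISAGREEMENT_THRESHOLD = 0.10


def build_review_queue(
    datasets: Dict[str, Dict[str, float]],
    threshold: float = DISAGREEMENT_THRESHOLD,
) -> Dict[str, List[str]]:
    """Return, for each dataset above the threshold, the point ids to review."""
    queue = {}
    for name, labels in datasets.items():
        if not labels:
            continue
        disputed = sorted(pid for pid, label in labels.items() if label == 0.5)
        if len(disputed) / len(labels) > threshold:
            queue[name] = disputed
    return queue


# Made-up datasets: one with heavy disagreement, one with none.
datasets = {
    "region_a": {"pt_1": 1.0, "pt_2": 0.5, "pt_3": 0.5, "pt_4": 0.0},
    "region_b": {"pt_1": 1.0, "pt_2": 0.0, "pt_3": 0.0, "pt_4": 1.0},
}
queue = build_review_queue(datasets)
```

The resulting point ids could then be taken back to the corresponding Collect Earth Online project for expert review.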