Closed hannah-rae closed 5 months ago
I noticed a lot of points with NaN in place of the eo_data array (158 for Senegal, 168 for Tigray 2021, 173 for Tigray 2020). Is this typical?
When the two CEO sets disagree, the eo_data is never fetched. I checked Senegal and Tigray 2020, and the missing points are all just those disagreement points:
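For anyone who wants to repeat that check, here is a minimal sketch of one way to do it. It is not the code used in the notebook: the row layout and the `label_set_1` / `label_set_2` field names are hypothetical stand-ins for the two CEO label sets, and only `eo_data` comes from the discussion above.

```python
import math

def is_missing(eo_data):
    """True when eo_data is None or a float NaN placeholder instead of an array."""
    return eo_data is None or (isinstance(eo_data, float) and math.isnan(eo_data))

def missing_matches_disagreement(rows):
    """Compare the rows lacking eo_data with the rows where the two label sets disagree.

    Returns (sets_are_identical, number_of_missing_rows).
    The keys "label_set_1" and "label_set_2" are hypothetical names.
    """
    missing = {i for i, row in enumerate(rows) if is_missing(row["eo_data"])}
    disagreeing = {
        i for i, row in enumerate(rows)
        if row["label_set_1"] != row["label_set_2"]
    }
    return missing == disagreeing, len(missing)

# Toy rows for illustration only; these are not values from the real datasets.
rows = [
    {"eo_data": [[0.1, 0.2]], "label_set_1": 1, "label_set_2": 1},
    {"eo_data": float("nan"), "label_set_1": 1, "label_set_2": 0},
    {"eo_data": [[0.3, 0.4]], "label_set_1": 0, "label_set_2": 0},
]
same, n_missing = missing_matches_disagreement(rows)
print(same, n_missing)
```

If `same` comes back false on a real dataset, some NaN points have a cause other than labeler disagreement and would need a separate look.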
CropHarvest subsets from February to February (specifically from 6 February to 1 February).
I think it makes sense to be consistent with CropHarvest in the subsetting here?
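To make the suggested window concrete, here is a small sketch that builds a 6 February to 1 February range for a given start year. It assumes the window opens on 6 February of one year and closes on 1 February of the next; `february_window` and `in_window` are hypothetical helper names, not CropHarvest functions.

```python
from datetime import date

def february_window(start_year):
    """Return (start, end) dates: 6 February of start_year to 1 February of the next year.

    Assumption: the window spans two consecutive calendar years.
    """
    return date(start_year, 2, 6), date(start_year + 1, 2, 1)

def in_window(day, start_year):
    """True when `day` falls inside the window, with both endpoints included."""
    start, end = february_window(start_year)
    return start <= day <= end

start, end = february_window(2019)
window_days = (end - start).days  # length of the window in days
print(start, end, window_days)
```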
Otherwise looks good to me!
In terms of the specific questions:
Agreed about the balancing - although with sufficient positives this might be less of an issue? What has your experience been @ivanzvonkov ?
I think balancing makes sense. It depends on how the set is intended to be used. Adding crop points by sampling from existing maps could also be an option.
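As one illustration of what balancing could look like, here is a minimal sketch that downsamples every class to the size of the smallest one. This is only one of several options and not necessarily what the notebook does; the `is_crop` key and the function name are hypothetical.

```python
import random
from collections import defaultdict

def balance_by_downsampling(rows, label_key="is_crop", seed=0):
    """Return a shuffled subset of rows with the same number of rows per label.

    Every class is randomly downsampled, without replacement, to the size of
    the smallest class. `label_key` is a hypothetical field name.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[label_key]].append(row)
    if not groups:
        return []
    n_per_class = min(len(group) for group in groups.values())
    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    balanced = []
    for group in groups.values():
        balanced.extend(rng.sample(group, n_per_class))
    rng.shuffle(balanced)
    return balanced

# Toy data for illustration only: five crop points and two non-crop points.
points = [{"id": i, "is_crop": 1} for i in range(5)]
points += [{"id": i, "is_crop": 0} for i in range(5, 7)]
balanced = balance_by_downsampling(points)
print(len(balanced))
```

Downsampling discards labeled points, so the alternative mentioned above (adding crop points sampled from existing maps) may be preferable when positives are scarce.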
Thanks for your feedback @gabrieltseng and @ivanzvonkov. In the new commit:
FYI, Senegal has now been updated with points from CSE, which eliminates the disagreement: https://github.com/nasaharvest/crop-mask/pull/369/files
Also here's a faster way to format the datasets for future use:
Otherwise looks good
This notebook prepares the Senegal and Tigray datasets for the DataPerf challenge. Namely, we:
Questions to discuss @gabrieltseng: