nasaharvest / crop-mask

End-to-end workflow for generating high resolution cropland maps
Apache License 2.0

Notebook to format Senegal and Tigray sets for dataperf #365

Closed hannah-rae closed 5 months ago

hannah-rae commented 5 months ago

This notebook prepares the Senegal and Tigray datasets for the DataPerf challenge. Namely we:

Questions to discuss @gabrieltseng:

ivanzvonkov commented 5 months ago

I noticed a lot of points with NaN in place of the eo_data array (158 for Senegal, 168 for Tigray 2021, 173 for Tigray 2020). Is this typical?

When there is disagreement between the two CEO sets, the eo_data is never fetched. I checked Senegal and Tigray 2020, and all of the NaN points are those disagreement points:

image
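
A check like the one described above could be sketched as follows. This is only an illustration: the column names `eo_data` and `class_probability`, and the convention that a probability of 0.5 marks a disagreement between the two CEO sets, are assumptions and may not match the actual dataframe layout.

```python
import numpy as np
import pandas as pd


def _is_missing(value):
    """True when a cell holds None or a scalar NaN rather than an array."""
    return value is None or (isinstance(value, float) and np.isnan(value))


def summarize_missing_eo(df, eo_col="eo_data", prob_col="class_probability"):
    """Count rows without EO data and compare them with disagreement rows.

    Assumption: prob_col == 0.5 means the two labeler sets disagreed.
    """
    missing = df[eo_col].apply(_is_missing)
    disagree = df[prob_col] == 0.5
    return {
        "n_missing": int(missing.sum()),
        "n_disagree": int(disagree.sum()),
        # No row is missing EO data unless the labelers disagreed.
        "missing_only_on_disagreement": bool((missing & ~disagree).sum() == 0),
        # Every disagreement row is missing EO data.
        "all_disagreements_missing": bool((disagree & ~missing).sum() == 0),
    }
```

If both boolean flags come back `True` for a dataset, the NaN rows and the disagreement rows are the same set.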

gabrieltseng commented 5 months ago

CropHarvest subsets from February to February (specifically from 6 February to 1 February).

I think it makes sense to be consistent with CropHarvest in the subsetting here?
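
For illustration, a February-to-February subset of dated rows might look like the sketch below. It is not CropHarvest's own code: the `date` column, the helper names, and the half-open window that ends on 1 February of the following year are all assumptions. `start_day` is left as a parameter so the window start can be matched to whatever CropHarvest actually uses.

```python
from datetime import date

import pandas as pd


def february_window(start_year, start_day=1):
    """Return (start, end) dates for a window that opens in February of
    start_year and closes on 1 February of the next year (assumed)."""
    return date(start_year, 2, start_day), date(start_year + 1, 2, 1)


def subset_to_window(df, start_year, date_col="date", start_day=1):
    """Keep rows whose date falls in [start, end) for the given year."""
    start, end = february_window(start_year, start_day)
    dates = pd.to_datetime(df[date_col])
    keep = (dates >= pd.Timestamp(start)) & (dates < pd.Timestamp(end))
    return df.loc[keep]
```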

Otherwise looks good to me!

In terms of the specific questions:

  1. We probably want location data too? Since this might affect how we define the task bounding box.
  2. Agreed about the balancing - although with sufficient positives this might be less of an issue? What has your experience been @ivanzvonkov ?
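
On the first point, if per-point coordinates are kept, a task bounding box could be derived from them along these lines. This is a rough sketch: the `lat`/`lon` column names and the optional `margin` are hypothetical, and it assumes the points do not straddle the antimeridian.

```python
import pandas as pd


def bounding_box(df, lat_col="lat", lon_col="lon", margin=0.0):
    """Smallest lat/lon box containing all points, padded by margin degrees."""
    return {
        "min_lat": float(df[lat_col].min()) - margin,
        "max_lat": float(df[lat_col].max()) + margin,
        "min_lon": float(df[lon_col].min()) - margin,
        "max_lon": float(df[lon_col].max()) + margin,
    }
```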

ivanzvonkov commented 5 months ago

> Agreed about the balancing - although with sufficient positives this might be less of an issue? What has your experience been @ivanzvonkov ?

I think balancing makes sense. Depends how the set is intended to be used. Adding crop points by sampling from existing maps could also be an option.
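
One simple way to balance, shown here only as a sketch, is to downsample the majority class to the size of the minority class. The `is_crop` label column is an assumed name, and this does not cover the other option mentioned above of adding crop points sampled from existing maps.

```python
import pandas as pd


def balance_by_downsampling(df, label_col="is_crop", seed=0):
    """Randomly keep the same number of rows from every class."""
    n_per_class = int(df[label_col].value_counts().min())
    return df.groupby(label_col).sample(n=n_per_class, random_state=seed)
```

Downsampling discards data, so whether it is appropriate depends on how many positives remain and on how the set is meant to be used.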

hannah-rae commented 5 months ago

Thanks for your feedback @gabrieltseng and @ivanzvonkov. In the new commit:

ivanzvonkov commented 5 months ago

FYI Senegal has now been updated with points from CSE which eliminates disagreement: https://github.com/nasaharvest/crop-mask/pull/369/files

Also here's a faster way to format the datasets for future use:

image

Otherwise looks good