nasaharvest / crop-mask

End-to-end workflow for generating high resolution cropland maps
Apache License 2.0

Area Estimation for Multiple RoIs and Time Periods #411

Open · Gedeon-m-gedus opened this issue 3 weeks ago

Gedeon-m-gedus commented 3 weeks ago

Issue Description

The current area estimation notebook works well for individual cases, but it becomes cumbersome and error-prone when dealing with multiple regions of interest (RoIs) and different time periods. Each time you want to estimate the area for another RoI or period, you need to rerun the entire notebook. In addition, the results are only printed, so they must be copied manually to be stored, which increases the chance of errors and slows down the estimation process.

Suggestion

Develop a separate reusable script or notebook that uses a configuration file to handle multiple RoIs and time periods. This setup will streamline the estimation process by allowing batch processing of RoIs, automating the recording of results, and improving overall efficiency and accuracy.
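As a rough sketch of the configuration-driven idea, assuming a JSON config file (YAML would work the same way with a parser such as PyYAML), every RoI can be paired with every time period so that each pair becomes one estimation job. The config keys (`rois`, `periods`, `boundary`), the `Job` class and `load_jobs` are hypothetical illustrations, not existing crop-mask code:

```python
import json
from dataclasses import dataclass
from itertools import product

# Hypothetical config layout; the real notebook's parameters would go here too.
EXAMPLE_CONFIG = """
{
  "rois": [
    {"name": "roi_a", "boundary": "boundaries/roi_a.shp"},
    {"name": "roi_b", "boundary": "boundaries/roi_b.shp"}
  ],
  "periods": [
    {"label": "2022", "start": "2022-01-01", "end": "2022-12-31"},
    {"label": "2023", "start": "2023-01-01", "end": "2023-12-31"}
  ]
}
"""


@dataclass(frozen=True)
class Job:
    """One area estimation run: a single RoI over a single time period."""
    roi: str
    boundary: str
    period: str
    start: str
    end: str


def load_jobs(config_text: str) -> list[Job]:
    """Expand every (RoI, period) pair in the config into one job."""
    cfg = json.loads(config_text)
    return [
        Job(r["name"], r["boundary"], p["label"], p["start"], p["end"])
        for r, p in product(cfg["rois"], cfg["periods"])
    ]


if __name__ == "__main__":
    # Each job would be passed to the (hypothetical) area estimation step.
    for job in load_jobs(EXAMPLE_CONFIG):
        print(job.roi, job.period, job.start, job.end)
```

Expanding the cross product up front means adding a new RoI or year is a one-line config change rather than a full notebook rerun.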

TODOs

Gedeon-m-gedus commented 3 weeks ago

@adebowaledaniel requesting your review for this issue.

Gedeon-m-gedus commented 3 weeks ago

@hannah-rae and @ivanzvonkov requesting your review.

ivanzvonkov commented 3 weeks ago

Well, you've asked for a review, and a different view is what you'll get! Of course, it's entirely my view, and if you disagree or decide not to follow it, I don't mind at all!

Anyway, the issues you mention resemble the ones I had when I started running intercomparisons for new countries/regions. My initial reaction was also to keep making the intercomparison code more generalizable (more config, a more reusable notebook).

However, in the end, what made my life easier was a template notebook. I would duplicate the template notebook each time I needed an intercomparison and make minor adjustments to address issues that came up. I think a template area estimation notebook that you can duplicate for each RoI and year will address the issues you bring up. It will also double as a record of the results.

To throw a haymaker in here: if one of the biggest time sucks is actually downloading and organizing the GeoTIFFs, I would consider exploring area estimation in Google Earth Engine. It might be easier to bring the code to the data rather than bring the data to the code each time. I had started exploring this and have some code, but it's not fully there yet.

hannah-rae commented 3 weeks ago

@ivanzvonkov I agree that in the case of the intercomparison, a template notebook was useful because we usually want to run it for one country/year at a time for a given project, and there are very few details that need to be set for each one. Also, the main result we want from it is the code snippet and the plot of all scores at the end.

For annual area estimation, we have actually been using a template notebook already. The issue is that, even with the template, many parameters need to be set for each run, and copying the various results and logging the steps along the way is tedious and error-prone. I personally think the configurable/modular approach Gedeon is suggesting would be super valuable.
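To illustrate how the manual copying could be removed, here is a minimal sketch that appends each run's results to a CSV log instead of printing them. The column names and the `append_result` helper are hypothetical; the notebook's actual outputs (for example confidence intervals or accuracy metrics) would replace them:

```python
import csv
import os
from datetime import datetime, timezone

# Hypothetical columns; extend with whatever the notebook actually reports.
FIELDS = ["run_at", "roi", "period", "crop_area_ha", "notes"]


def append_result(path: str, row: dict) -> None:
    """Append one run's results to a CSV log, writing the header on first use."""
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if write_header:
            writer.writeheader()
        # Timestamp every run so repeated estimates for the same RoI stay traceable.
        writer.writerow({"run_at": datetime.now(timezone.utc).isoformat(), **row})
```

Because `csv.DictWriter` raises a `ValueError` for keys not listed in `fieldnames`, a misspelled result name fails loudly instead of silently dropping a value, while missing optional columns (such as `notes`) are written as empty strings.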

I think we could also look at GEE in the future, but for now the Python version is still what we use most, and having a configurable version of it would be helpful even if we decide to move to GEE later.