nasaharvest / crop-mask

End-to-end workflow for generating high resolution cropland maps
Apache License 2.0
95 stars 28 forks source link

Refactor area estimation #314

Closed bhyeh closed 11 months ago

bhyeh commented 1 year ago

Added:

  1. Refactored estimation code from area_utils.py and using Olofsson et al., 2014 "Good Practices...".
  2. Unittest for comparing area estimates using new refactored code against prior estimates computed outside of repo (using corrected Olofsson's excel tool) for Liaoning, China 2017 (crop area) and toy example in Olofsson et al., 2014 (change area).

Addressed/Fixed:

The following are fixed with the refactored code added:

  1. Fixed RuntimeWarning: overflow encountered in long_scalars that could happen previously in computing $V(P_i)$ with large pixel marginals by using NumPy and uint64 dtype.
  2. Fixed mistake from Oloffson excel tool in computing $V(P_i)$ where expr_2 was incorrectly calculated with the producer accuracy $P0$ for each class $c{i}$ regardless.
  3. Fixed mistake in compute_area_estimate from area_utils.py where expr_2 for $noncrop$ was also being calculated using $P{crop}$ instead of $P{noncrop}$.
  4. Fixed mistake from Oloffson excel tool in computing $V(P_i)$ where expr_1 for $c_{1}$ was incorrectly calculated using the marginal total of reference samples instead of mapped units.

Notes/Comments: Items addressed here in this PR that relates to #245.

  • [x] Adapt existing notebook to use util functions

The refactored estimation code should be easily adaptable into each notebook without needing any modification to the functions. But both notebooks were left untouched w/ this branch work.

  • [x] Add code for computing area estimates from confusion matrix (was not in notebook previously)

The refactored estimation code is pulled outside of notebook and works for both area and change area type confusion matrixes.

  • [x] Test by comparing change area estimates for Tigray computed using this notebook and prior analysis (outside repo) Liaoning, China 2017 and toy example in Olofsson et., al 2014.

Implemented unittests for area estimation comparison with prior work that was done outside of repo using the excel tool (after correcting for mistakes 2 and 4 in excel sheet).

  • [ ] Test by computing change area estimates for new ROI (e.g., Bure or Jimma)

Not addressed.

Remaining TODO Items:

bhyeh commented 1 year ago

3. Fixed mistake in compute_area_estimate from area_utils.py where expr_2 for noncrop was also being calculated using Pcrop instead of Pnoncrop.

https://github.com/nasaharvest/crop-mask/blob/f523fe21cf7709180810514626ade24740ee83d6/src/area_utils.py#L390-L397

adebowaledaniel commented 12 months ago

Thanks, @bhyeh, for catching and fixing the bug.

I just did a quick test with some data with me. LGTM.

Also, I am considering we include the test for other functions (load_ne,..., etc.) in the area_utils.py in the unittest_area, i.e. total removal of integration_test_area.py

bhyeh commented 12 months ago

Also, I am considering we include the test for other functions (load_ne,..., etc.) in the area_utils.py in the unittest_area, i.e. total removal of integration_test_area.py

I think it might be nice to keep separate the unittest for the specific estimation functions, and have the integration test that involves the entire end-to-end process of loading, binarizing, and etc.

bhyeh commented 12 months ago

running the unittests

  1. python -m unittest unittest_area_est.ChangeAreaTest
  2. python -m unittest unittest_area_est.CropAreaTest
review-notebook-app[bot] commented 12 months ago

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

bhyeh commented 12 months ago

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

@hannah-rae @adebowaledaniel I think this is nearly ready to be merged - just hoping for a quick lookover at the notebook changes. There, I've kept the cell outputs to give a preview of what the refactored code looks like and its outputs (the outputs are also fake data from the unit tests in case you were wondering, I'll clear them before merging).

adebowaledaniel commented 12 months ago

Thanks for the notebooks improvement (especially for the single year), @bhyeh 👍🏽. @hannah-rae give you the final comment on the change area notebook.

I will update the integration test later.

bhyeh commented 11 months ago

FYI acknowledgments/reasons for checks failing on PR:

  1. dataset-test - unrelated to anything done on this branch, but KenyaCEO2019 in datasets.py has less than 24 months of data.
  2. area-tests - use of seaborn module added to area_utils in a plotting function which is not in environment. this can be resolved when the area integration test gets updated by @adebowaledaniel and decision to add seaborn library to environment...or not.
  3. pre-commit - flake8 error because module level import is not at top of file, but the particular import has to come after altering module path.