nasaharvest / crop-mask

End-to-end workflow for generating high resolution cropland maps
Apache License 2.0
95 stars 28 forks source link

Correct bugs in previous version and speed up some functions #212

Closed hannah-rae closed 1 year ago

hannah-rae commented 1 year ago

I corrected a few mistakes that I apparently missed fixing/removing in the previous version that was approved in the PR. I also made a couple of changes to speed up the area counts and sampling.

review-notebook-app[bot] commented 1 year ago

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

ivanzvonkov commented 1 year ago

uh oh merge conflicts

hannah-rae commented 1 year ago

The merge conflict is on the entire notebook. 🙄 I will merge master into this branch and resolve the conflict locally then commit again.

hannah-rae commented 1 year ago

After investigating further, this is a huge mess because major changes were made to this notebook by @adebowaledaniel (#209) that were merged into master. I guess I was too slow to give inputs before it was merged. The notebooks are so different that it is impossible to resolve the merge conflicts.

I think the easiest solution will be to make a copy of my version as a separate file and overwrite my version with @adebowaledaniel 's version to resolve the merge conflicts. Then we can discuss again if any changes need to be made to the new version that works on Colab.

hannah-rae commented 1 year ago

@adebowaledaniel What map did you test this notebook with?

I tried to test with a map we have already done estimates for with my old notebook for Heilongjiang (China) 2016. I want to compare the estimates to make sure they are the same from both versions of the notebook. The map is only about 300 MB but the Colab runtime died with a memory error. Can you or @ivanzvonkov test that you can load that map?

adebowaledaniel commented 1 year ago

@adebowaledaniel What map did you test this notebook with?

I tried to test with a map we have already done estimates for with my old notebook for Heilongjiang (China) 2016. I want to compare the estimates to make sure they are the same from both versions of the notebook. The map is only about 300 MB but the Colab runtime died with a memory error. Can you or @ivanzvonkov test that you can load that map?

I tested the colab notebook with Rwanda map which is around 1.7GB, and it works fine. I will test with Heilongjiang (China) 2016 and let you have the feedback.

adebowaledaniel commented 1 year ago

After investigating further, this is a huge mess because major changes were made to this notebook by @adebowaledaniel (#209) that were merged into master. I guess I was too slow to give inputs before it was merged. The notebooks are so different that it is impossible to resolve the merge conflicts.

I think the easiest solution will be to make a copy of my version as a separate file and overwrite my version with @adebowaledaniel 's version to resolve the merge conflicts. Then we can discuss again if any changes need to be made to the new version that works on Colab.

I saw this coming; I would have named my version crop-area-estimation-colab and maintained your version, as it's also available for local runs.

adebowaledaniel commented 1 year ago

@hannah-rae, what do you think of the update?

hannah-rae commented 1 year ago

I think it looks good! Two things for us to think about for people running this locally:

IMHO, I think we should assume that if someone is running it locally, they already have the repo and environment set up (and their github credentials). In that case, we would just move the credentials and cloning the repo to be part of the Colab setup. @adebowaledaniel do you agree?

hannah-rae commented 1 year ago

@adebowaledaniel @ivanzvonkov I think the google cloud authentication should be part of general setup (local or colab) because in either case we will need to download the relevant map from Google Cloud. However, this would mean we need to either

  1. install the google auth python package in the notebook (it is already installed in the colab environment) or
  2. add it to the list of packages in the environment yaml.

I'm thinking option 2 is preferable because most people using crop-mask probably won't be using the area estimation notebook. What do you guys think?

adebowaledaniel commented 1 year ago

In the commits above, I have separated the Colab and Local setup. To make things easy with google auth and access to cloud storage while using locally, it's better to install gsutil. This would resolve the local google auth issue immediately after the installation and setup. @hannah-rae @ivanzvonkov

adebowaledaniel commented 1 year ago

Updated the following:

ivanzvonkov commented 1 year ago

@adebowaledaniel I want to merge this PR in but looks like the integration test failure is legitimate. What do you think is the cause of this?

image
adebowaledaniel commented 1 year ago

@ivanzvonkov, the error is a result of this update which avoids the no-data value being treated as crop pixels: https://github.com/nasaharvest/crop-mask/blob/68653d1bec6efca190458d634b358757d6245d0f/src/area_utils.py#L132-L135

In the new commits, include the following: