adebowaledaniel commented 1 year ago

Start year: 2019 Start month: November

[x] Labeling project created
[x] Labeling completed
[x] Data added to repository
[x] Model trained (https://github.com/nasaharvest/crop-mask/pull/248/files)
[x] Map made
[x] Ideas for improvement @adebowaledaniel
[x] Expert review @hannah-rae @cnakalembe (map on GEE)
[x] Expert check of GLAD map - is 30m map ok for use case of this map? Accuracy is equivalent to ensemble.
[x] Export best map

adebowaledaniel commented 1 year ago

@cnakalembe, can you confirm the start month (February or September)?

ivanzvonkov commented 1 year ago

Merging script: https://github.com/nasaharvest/openmapflow/blob/main/openmapflow/scripts/merging.sh

adebowaledaniel commented 1 year ago

@ivanzvonkov, there are missing predictions in the Zambia cropland mask (see here). Running the inference multiple times gave the same number of predictions each time (see the screenshot below).

ivanzvonkov commented 1 year ago

I think this is most likely due to NaN values in some of the tifs. You can go ahead and merge; the logs can be investigated to find out why this is happening.

adebowaledaniel commented 1 year ago

I already merged; see the link I provided above (https://code.earthengine.google.com/a9522bd391a18cd98268994b6bffe317?hideCode=true)

The error messages vary; one is a request timeout, and the other is not specific.

ivanzvonkov commented 1 year ago

Some of the errors I am seeing look like this

This means there's a NaN value in the tif. This was fixed in a newer version of OpenMapFlow (https://github.com/nasaharvest/openmapflow/pull/109)

The fix will be included if you install that version manually before deploying:

pip install openmapflow==0.2.1rc2
export OPENMAPFLOW_MODELS="..."
openmapflow deploy

ivanzvonkov commented 1 year ago

The new version can now be deployed, if it's on master, by manually running the GitHub Action here: https://github.com/nasaharvest/crop-mask/actions/workflows/deploy.yml

adebowaledaniel commented 1 year ago

EE link

ivanzvonkov commented 1 year ago

Look into the logs to see which bands have NaNs most often (logs)
Skip those bands when training the next model
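
A minimal sketch of ranking bands by how often they contain NaNs, as a complement to reading the logs. It assumes a tif has already been read into a NumPy array of shape (bands, height, width), for example with a raster library; the function name and the band names are hypothetical placeholders, not part of OpenMapFlow.

```python
import numpy as np


def rank_bands_by_nan(array, band_names):
    """Return (band, nan_fraction) pairs sorted from most to least NaN.

    `array` is assumed to have shape (bands, height, width).
    """
    # Fraction of NaN pixels in each band.
    nan_fraction = np.isnan(array).mean(axis=(1, 2))
    # Sort descending; a stable sort keeps the band order for ties.
    order = np.argsort(-nan_fraction, kind="stable")
    return [(band_names[i], float(nan_fraction[i])) for i in order]


# Toy 3-band, 2x2 example; the band names are placeholders.
demo = np.zeros((3, 2, 2), dtype=float)
demo[1, 0, 0] = np.nan  # one NaN pixel in the second band
demo[2, :, :] = np.nan  # third band entirely NaN
ranking = rank_bands_by_nan(demo, ["B2", "B3", "VV"])
print(ranking)
```

Summing these fractions over many tifs would show which bands are the best candidates to drop.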

ivanzvonkov commented 1 year ago

Blocked by #230

ivanzvonkov commented 1 year ago

The above is merged; you can now retrain with missing values in the training data.

adebowaledaniel commented 1 year ago

Here is the error I got while trying to retrain the model @ivanzvonkov.

adebowaledaniel commented 1 year ago

Nit: I think this warning might be of concern later.

ivanzvonkov commented 1 year ago

@adebowaledaniel re: the arrow error, it happens because of this line: https://github.com/nasaharvest/crop-mask/blob/7f5f809149d49aea9458e13d5ee88d3ad3b3484b/src/models/data.py#L95

Do you see the bug? 🐛

ivanzvonkov commented 1 year ago

248 will be merged soon

ivanzvonkov commented 1 year ago

Consider creating small map and checking visually

adebowaledaniel commented 1 year ago

Still the same problem @ivanzvonkov

hannah-rae commented 1 year ago

Map quality is poor due to large- and small-scale blockiness and blatantly wrong predictions

@adebowaledaniel to investigate a few things to debug:

check the GEE logs to see if there are any errors being thrown to explain why there is missing data
run error analysis notebook to plot errors in evaluation set to see if there is geographical pattern to errors (may need to update notebook)

adebowaledaniel commented 1 year ago

The three error codes in the logs:

500: The request was aborted because there was no available instance. This can result from (i) a sudden increase in traffic or (ii) long request processing time.
503: The request failed because either the HTTP response was malformed or connection to the instance had an error. The container instance was found to be using too much memory and was terminated. This is likely to cause a new container instance to be used for the next request to this revision.
504: The request has been terminated because it has reached the maximum request timeout.

Potential solution: increase the memory limit and reduce the number of requests per container.
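
A small sketch of tallying the failures by status code, which can help decide whether memory (503) or timeouts (504) should be tackled first. The record structure below is hypothetical; the real log schema may differ.

```python
from collections import Counter


def tally_status_codes(records):
    """Count failed requests (status >= 500) per HTTP status code."""
    return Counter(r["status"] for r in records if r["status"] >= 500)


# Hypothetical log records, for illustration only.
records = [
    {"status": 200},
    {"status": 500},
    {"status": 503},
    {"status": 504},
    {"status": 503},
]
counts = tally_status_codes(records)
print(counts.most_common())
```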

adebowaledaniel commented 1 year ago

EE link: https://code.earthengine.google.com/5b8618b5e74bf921ddb21d355b217c08

ivanzvonkov commented 1 year ago

error analysis notebook updates
ERA5 grid overlay on map
If possible export grid overlay on map

ivanzvonkov commented 1 year ago

Ivan to send Zambia data if he has it
Adebowale to create PR for error analysis notebook update

adebowaledaniel commented 1 year ago

From the error analysis, many of the errors are shrubs, heavily vegetated points, and fallow fields predicted as cropland. A point on a rooftop is also predicted as a cropland pixel; checking the prediction map, I noticed that this pattern causes almost the entire city of Lusaka (an urban settlement) to be predicted as cropland.

ivanzvonkov commented 1 year ago

NDVI vs other indices exploration, potentially training on higher quality data

adebowaledaniel commented 1 year ago

Training on higher-quality data gave no improvement in the new model @hannah-rae https://github.com/nasaharvest/crop-mask/blob/853344e17a9e16054a9531840e9752b9b4a1ca00/data/models.json#L310-L312 previous model: https://github.com/nasaharvest/crop-mask/blob/9171bd68b8d86c7acfb2c245b864539b419639fd/data/models.json#L242-L244

hannah-rae commented 1 year ago

Next step for week of April 17 @adebowaledaniel : evaluate quality of CEO Zambia data ([using rubric](https://docs.google.com/spreadsheets/d/1BYOrMkmryjngGApKIYZ0YXzJkFzyc2DM3hydYVT_vh0/edit#gid=0))

hannah-rae commented 12 months ago

Next step for week of May 22 @adebowaledaniel : apply post-classification NDVI filtering (using method designed by @bhyeh ) and evaluate error rates and types
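
For context, a generic sketch of what post-classification NDVI filtering can look like. This is not @bhyeh's method, whose details are not given in this thread; it assumes a simple rule that resets cropland pixels to non-crop when their NDVI falls below a threshold, and the 0.3 threshold is an arbitrary placeholder.

```python
import numpy as np


def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red), with 0 where the denominator is 0."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    denom = nir + red
    out = np.zeros_like(denom)
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out


def filter_cropland(pred, ndvi_values, threshold=0.3):
    """Reset cropland pixels (1) to non-crop (0) where NDVI is below threshold.

    The threshold is a placeholder assumption, not a tuned value.
    """
    pred = np.asarray(pred)
    return np.where((pred == 1) & (ndvi_values < threshold), 0, pred)


# Toy reflectance values and predictions, for illustration only.
nir = np.array([[0.5, 0.3], [0.4, 0.2]])
red = np.array([[0.1, 0.25], [0.1, 0.2]])
pred = np.array([[1, 1], [0, 1]])
filtered = filter_cropland(pred, ndvi(nir, red))
print(filtered)
```

A rule of this kind targets low-vegetation false positives such as rooftops; it would not by itself remove shrubs or fallow fields, which can have high NDVI.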

hannah-rae commented 11 months ago

During the operational meeting @bhyeh mentioned that @adebowaledaniel noted that since the Zambia_CEO_2019 dataset had a 0/0.5/0.5 train/val/test split, the model may not have been trained with any points from Zambia. We usually set the CEO datasets with this ratio when we assume/know there are local samples in the other datasets (e.g. a ground-based dataset independent from the CEO dataset), but in the case of Zambia it's very possible there were few or no points in the other datasets. (One could check in GeoWiki how many are in Zambia, but this is probably the only dataset that has Zambia points). @adebowaledaniel can you post your updated results/plan based on your 0.6/0.2/0.2 split here?
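
To illustrate why the split ratio matters, here is a minimal sketch that assigns points to subsets by a random draw against cumulative ratios. It is an assumption about how such a split can work, not the actual crop-mask or OpenMapFlow implementation; the function name and the point count are hypothetical.

```python
import random


def assign_subsets(n_points, train, val, test, seed=0):
    """Assign each point to a subset by a random draw against cumulative ratios."""
    if abs(train + val + test - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    rng = random.Random(seed)
    subsets = []
    for _ in range(n_points):
        r = rng.random()  # uniform in [0, 1)
        if r < train:
            subsets.append("training")
        elif r < train + val:
            subsets.append("validation")
        else:
            subsets.append("testing")
    return subsets


old = assign_subsets(1000, 0.0, 0.5, 0.5)
new = assign_subsets(1000, 0.6, 0.2, 0.2)
print(old.count("training"), new.count("training"))
```

With a training ratio of 0, no draw can fall below the threshold, so the dataset contributes nothing to training.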

adebowaledaniel commented 11 months ago

Thank you, @hannah-rae. As you mentioned, GeoWiki is the only dataset with Zambia data, with a training subset of 336 sample points (positive class: 5.6%). Here is the result for the 0.6/0.2/0.2 split: https://github.com/nasaharvest/crop-mask/blob/1785602602d1260edb53a13a608e9ee84c5d6f8d/data/models.json#L325-L341

I applied the post-classification NDVI filtering method by Ben on a subset produced by the model; the output is here.

adebowaledaniel commented 11 months ago

June 5 - Check for cloud presence in the tif files

adebowaledaniel commented 11 months ago

@hannah-rae, Contrary to our expectations of cloud presence, the Sentinel-1 bands were absent in those oddly-shaped regions on the map. I shared my observations in this slide and also included a notebook (link in the slide) in case you want to reproduce what I did.

hannah-rae commented 11 months ago

Very interesting... was that not captured in the logs at all? Maybe we should add a test when the data are exported to check that none of the bands are missing data.
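
A minimal sketch of such an export-time check, assuming the exported data are available as a NumPy array of shape (bands, height, width) and that missing data are encoded as NaN (a nodata value would need a different comparison). The function and band names are hypothetical.

```python
import numpy as np


def find_empty_bands(array, band_names):
    """Return names of bands that contain no finite values at all."""
    has_data = np.isfinite(array).any(axis=(1, 2))
    return [name for name, ok in zip(band_names, has_data) if not ok]


def check_export(array, band_names):
    """Raise if any band is entirely missing."""
    empty = find_empty_bands(array, band_names)
    if empty:
        raise ValueError(f"Bands with no valid data: {empty}")


# Toy tile with one band set entirely to NaN to simulate a missing S1 band.
tile = np.ones((3, 2, 2))
tile[0] = np.nan
print(find_empty_bands(tile, ["VV", "VH", "B2"]))
```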

For now, perhaps it makes sense to train a new model without S1?

adebowaledaniel commented 11 months ago

Here are the outputs of the new model trained without S1: map (as expected, it's without the weird features) and metrics. Let me know your observations on the subset map generated. Should we continue with this model for the entire country?

I will create an issue regarding the missing S1 bands and check the EO export log for any clues.

adebowaledaniel commented 10 months ago

@hannah-rae Crop Mask + Postclassification processing: here

hannah-rae commented 10 months ago

@adebowaledaniel can you make the assets public?

adebowaledaniel commented 10 months ago

Done @hannah-rae

hannah-rae commented 10 months ago

Loading is crashing for me. @cnakalembe to try loading and will do expert sign-off

cnakalembe commented 8 months ago

I reviewed the map. I think the next step is manual cleanup to remove obvious features like roads; I've seen some mines too. We could develop some clear guidance for this, and I think Diana can do it in QGIS/ArcGIS.

hannah-rae commented 8 months ago

@hannah-rae will make GEE script in repo to export ensemble map for Zambia (and other future countries)

update: should be addressed by notebook/GEE app created by @ivanzvonkov in #315

hannah-rae commented 8 months ago

@ivanzvonkov will make this map and update intercomparison re #346

ivanzvonkov commented 8 months ago

After running the intercomparison on Zambia with the full evaluation set (validation and test), the ensemble ties the GLAD map.
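
As a rough illustration of how two maps can tie on the same evaluation points, here is a sketch that scores binary predictions with accuracy and F1. The labels below are toy values, not the Zambia evaluation data, and the thread does not say which metrics the intercomparison actually reports.

```python
def score(y_true, y_pred):
    """Return accuracy and F1 for binary labels (1 = crop, 0 = non-crop)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    # F1 = 2TP / (2TP + FP + FN); defined as 0 when there are no positives.
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return {"accuracy": accuracy, "f1": f1}


# Toy evaluation labels and map predictions, for illustration only.
y_true = [1, 1, 0, 0, 1, 0]
ensemble = [1, 0, 0, 1, 1, 0]
glad = [1, 1, 0, 1, 0, 0]
print(score(y_true, ensemble), score(y_true, glad))
```

The two toy maps make different mistakes yet score the same, which is why a tie on a small evaluation set says little about which map is better in a given area.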

There are also not that many points to begin with because many of them were sampled outside of Zambia boundaries (old CEO project). Should we proceed with just exporting GLAD map? @hannah-rae

hannah-rae commented 8 months ago

Next step for @cnakalembe to check if the GLAD map looks ok and is ok for use case, or if there is some reason to export the ensemble map instead.

cnakalembe commented 8 months ago

GLAD map is okay for the use case!

hannah-rae commented 7 months ago

Next step: @ivanzvonkov run the export code for GLAD map

ivanzvonkov commented 7 months ago

Shared exported map on slack

nasaharvest / crop-mask

Cropland: Zambia 2019 #221
