Cropland: Mali-National Level 2019

cnakalembe commented 1 year ago

Month: Feb Year: 2019

[x] Labeling project created
[x] Set 1 Labeling: Ben, Diana, Aditya, Mirali
[x] Set 2 Labeling: Abena, Isha, Bhanu, Taryn
[x] Data added to the repository
[x] LULC stratified labeling created
[ ] Set 1
[ ] Set 2
[ ] Model trained
[ ] Map made

ivanzvonkov commented 1 year ago

@cnakalembe what year?

cnakalembe commented 1 year ago

Preference is 2022 since the season is done

On Nov 12, 2022, at 10:45 AM, Ivan Zvonkov @.***> wrote:

@cnakalembe what year?

ivanzvonkov commented 1 year ago

Context: When we randomly sample points for evaluation on CEO, 1% or less of the sample may be crop

Issue: Low crop sample size makes evaluation difficult

Potential solution:

[ ] Consider stratification to have a representative sample that has more crops
[ ] List possible stratification strategies (e.g. by land cover map, by NDVI, by agroecological zone; research other methods in the literature)
[ ] Visualize stratification strategies
[ ] Pick stratification strategy by democratic vote
[ ] Consider evaluation adjustments to match sample inclusion
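
One possible form of the evaluation adjustment in the last item, as a minimal Python sketch: if points are sampled per stratum rather than uniformly, overall accuracy can be estimated by weighting each stratum's sample accuracy by that stratum's share of the map area. The function name, the two strata, and all numbers below are hypothetical and are not results from this project.

```python
def stratified_overall_accuracy(area_weights, n_correct, n_sampled):
    """Estimate overall accuracy from a stratified random sample.

    area_weights: fraction of the map area in each stratum (must sum to 1).
    n_correct / n_sampled: correctly classified and total sample points
    per stratum.
    """
    if abs(sum(area_weights) - 1.0) > 1e-9:
        raise ValueError("area weights must sum to 1")
    return sum(
        w * c / n for w, c, n in zip(area_weights, n_correct, n_sampled)
    )


# Hypothetical example: a small "likely crop" stratum is oversampled.
weights = [0.05, 0.95]  # likely crop, likely non-crop (assumed area shares)
correct = [160, 190]    # correctly labeled points per stratum (made up)
sampled = [200, 200]    # points sampled per stratum (made up)

adjusted = stratified_overall_accuracy(weights, correct, sampled)
naive = sum(correct) / sum(sampled)  # ignores unequal inclusion probabilities
print(f"area-weighted: {adjusted:.4f}, unweighted: {naive:.4f}")
```

The unweighted figure treats every point as equally representative, which no longer holds once the crop stratum is sampled more densely than its area share.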

MsPixels commented 1 year ago

From the literature review on stratification techniques, I summarized some of the methods used in the papers listed in the Google Doc -- https://docs.google.com/document/d/1QfBemFjJtRUJ3C8Z70tszGlNZo5iB50knRK0y6iusqk/edit

Happy to hear your comments!

ivanzvonkov commented 1 year ago

Read through the report in some more detail, great summaries!

MsPixels commented 1 year ago

Context: Stratification by NDVI is the easiest and most intuitive way of sampling random points

Issue: Low crop sample size makes evaluation difficult

Potential solution:

[x] Research into NDVI by quartiles
[x] Decide on NDVI metrics (median, mean, time period)
[x] Write code in GEE or Colab that performs stratification by NDVI using MODIS
[ ] Compare results to the random sampling results generated by CEO

ivanzvonkov commented 1 year ago

Changing year to 2019

MsPixels commented 1 year ago

@ivanzvonkov, @hannah-rae, I tried the NDVI by quartiles. What do you think?

Also adding the GEE code - https://code.earthengine.google.com/0d6758d51ef68a4bbdb881d11edd1eb3

ivanzvonkov commented 1 year ago

@MsPixels okay, took a look at this in some more detail. A couple of questions:

How come the percentiles are [10, 25, 50, 75, 90] in ee.Reducer.percentile? Shouldn't they be [25, 50, 75]?
How come the fourth region uses percentiles.get("NDVI_p90")? Shouldn't it be p100 or something like that?

After clarification of these questions the next step is to figure out how many points to sample. The Olofsson paper: https://www.sciencedirect.com/science/article/abs/pii/S0034425714000704 is one resource for this. Once you have that number we can figure out how to create a Collect Earth Online labeling set from these points.
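
As a rough illustration of the sample-size step, here is a Python sketch of the large-population approximation commonly attributed to the Olofsson paper, n ≈ (Σ W_i S_i / S(Ô))², where W_i is the area share of stratum i, S_i = sqrt(U_i (1 − U_i)) uses an anticipated user's accuracy U_i, and S(Ô) is the target standard error of overall accuracy. The function name and every input value are assumptions chosen for illustration, not values for Mali.

```python
import math


def olofsson_sample_size(area_weights, user_accuracies, target_se):
    """Approximate total sample size n = (sum(W_i * S_i) / S(O))^2.

    S_i = sqrt(U_i * (1 - U_i)) is the standard deviation of stratum i,
    with U_i the anticipated user's accuracy; target_se is S(O).
    """
    weighted_sd = sum(
        w * math.sqrt(u * (1.0 - u))
        for w, u in zip(area_weights, user_accuracies)
    )
    return math.ceil((weighted_sd / target_se) ** 2)


# Hypothetical two-stratum map: 10% likely crop, 90% likely non-crop.
weights = [0.1, 0.9]
accuracies = [0.8, 0.95]  # anticipated user's accuracies (assumed)

# Tighter targets for the standard error require more points.
for se in (0.01, 0.015, 0.02):
    print(se, olofsson_sample_size(weights, accuracies, se))
```

The total n still has to be allocated among strata; that choice is not covered by this sketch.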

MsPixels commented 1 year ago

@ivanzvonkov, I was curious about the difference between the 10th and 25th percentiles, which is why I calculated both. However, the 10th percentile isn't reflected in the strata map. Also, the 25th, 50th, and 75th percentiles only exclude the woody vegetation zone of Mali.
Again, changing the 90th percentile to the 100th generalizes the map.

ivanzvonkov commented 1 year ago

Okay I see. So given:

NDVI_p25: 0.09578940770143446
NDVI_p50: 0.1190916155868571
NDVI_p75: 0.19721850558714377
NDVI_p90: 0.2832040283154737

As I understand it, your suggestion is:

strata 1: NDVI p0 - p50
strata 2: NDVI p50 - p75
strata 3: NDVI p75 - p90
strata 4: NDVI p90 - p100

But these are not true quartiles, right? Wouldn't quartiles be:

strata 1: NDVI p0 - p25
strata 2: NDVI p25 - p50
strata 3: NDVI p50 - p75
strata 4: NDVI p75 - p100

This can be plotted with:

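// Assign each pixel a stratum id (1-4) from the NDVI quartile thresholds.
// `clip` (the NDVI image) and `percentiles` (the reducer output) are
// assumed to be defined earlier, as in the linked GEE script.
var NDVI_threshold = ee.Image(1)
      .where(clip.gte(ee.Number(percentiles.get("NDVI_p25"))), 2)
      .where(clip.gte(ee.Number(percentiles.get("NDVI_p50"))), 3)
      .where(clip.gte(ee.Number(percentiles.get("NDVI_p75"))), 4);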

Would this make more sense as a stratification? What are the pros and cons of your suggested ranges?
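
To make the comparison concrete outside GEE, here is a small NumPy sketch that applies both sets of thresholds to a few made-up NDVI values. It mirrors the `.gte(...)` chain above with `np.digitize`; the function name and the sample pixel values are assumptions, and only the four threshold constants come from this thread.

```python
import numpy as np

# Thresholds quoted earlier in this comment.
P25, P50, P75, P90 = (
    0.09578940770143446,
    0.1190916155868571,
    0.19721850558714377,
    0.2832040283154737,
)


def assign_strata(ndvi, thresholds):
    """Return stratum ids 1..len(thresholds)+1, using >= at each threshold."""
    return np.digitize(ndvi, thresholds) + 1


ndvi = np.array([0.05, 0.10, 0.15, 0.25, 0.30])  # made-up pixel values

quartile_strata = assign_strata(ndvi, [P25, P50, P75])   # true quartiles
suggested_strata = assign_strata(ndvi, [P50, P75, P90])  # p50/p75/p90 scheme
print(quartile_strata)
print(suggested_strata)
```

By definition of percentiles, the quartile scheme puts roughly 25% of pixels in each stratum, while the p50/p75/p90 scheme puts roughly 50%, 25%, 15%, and 10% in strata 1 to 4, so it splits the high-NDVI end more finely at the cost of one very large low-NDVI stratum.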

MsPixels commented 1 year ago

I chose the 25th, 50th, 75th, and 90th percentiles based on an article I found, "Long-Term Land Use/Land Cover Change Assessment of the Kilombero Catchment in Tanzania". I will go ahead and use the quartiles you suggested.

MsPixels commented 1 year ago

Also, after going through Olofsson's paper, I came up with a sampling design for three scenarios, each based on a target standard error of the overall accuracy: Sampling Design for Mali. I'll be on standby for your comments.

MsPixels commented 10 months ago

Stratification by LULC. This code combines 11 LULC layers to get a majority vote of crop and noncrop zones. Based on the strata, I sampled the crop and noncrop points.
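
The majority-vote idea can be sketched in NumPy as follows. This is an illustration only and not the code referred to above: the function name, the binary 1 = crop / 0 = non-crop encoding, and the three tiny masks standing in for the 11 LULC layers are all assumptions.

```python
import numpy as np


def majority_vote(layers):
    """Combine binary crop masks (1 = crop, 0 = non-crop) per pixel.

    layers: array of shape (n_layers, height, width). A pixel is crop when
    more than half of the layers say crop; with an even number of layers,
    ties go to non-crop.
    """
    layers = np.asarray(layers)
    votes = layers.sum(axis=0)  # number of layers voting crop at each pixel
    return (votes * 2 > layers.shape[0]).astype(np.uint8)


# Three made-up 2x2 masks standing in for the 11 LULC layers.
masks = np.array([
    [[1, 0], [1, 0]],
    [[1, 0], [0, 0]],
    [[1, 1], [0, 0]],
])
crop_stratum = majority_vote(masks)
print(crop_stratum)
```

With an odd number of layers, such as 11, ties cannot occur, so every pixel falls cleanly into the crop or non-crop stratum.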

nasaharvest / crop-mask

Cropland: Mali-National Level 2019 #239