openkfw / mapme.protectedareas

Reproducible workflows in R for processing open geodata to create knowledge about KfW supported protected areas and conservation effectiveness.
GNU General Public License v3.0

Check database integrity #84

Closed yotaae closed 2 years ago

yotaae commented 2 years ago

Hi @Jo-Schie @melvinhlwong,

we have several problems related to treatment assignment in the matching frames.

To illustrate the first two issues, I have attached a map of the cells included in matching frame 2006: (1) Blue cells (treatment) lie outside of the PAs that were supported in 2006 (yellow polygons). (2) Some of the supported PAs contain no blue cells. All yellow polygons (at least the larger ones) should include at least some treatment cells, correct?

Matching_frame_2006_map
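The two checks described above could be automated roughly as follows. This is a minimal sketch in Python rather than the project's R, and the record layout (`cell_id`, `pa_id`, `treatment`) and the `supported_pas` set are hypothetical stand-ins for the real matching frame and PA list.

```python
from collections import defaultdict

# Hypothetical matching-frame rows for one year. pa_id is None for cells
# outside any PA; treatment is a 0/1 flag.
cells = [
    {"cell_id": 1, "pa_id": "PA_A", "treatment": 1},
    {"cell_id": 2, "pa_id": "PA_A", "treatment": 0},
    {"cell_id": 3, "pa_id": None, "treatment": 1},    # treated, but outside any PA
    {"cell_id": 4, "pa_id": "PA_B", "treatment": 0},  # supported PA without treated cells
    {"cell_id": 5, "pa_id": "PA_C", "treatment": 1},  # treated, but PA not supported this year
]
supported_pas = {"PA_A", "PA_B"}  # assumed: PAs supported in the frame's year


def check_treatment_assignment(cells, supported_pas):
    """Return (treated cells outside supported PAs, supported PAs without treated cells)."""
    treated_per_pa = defaultdict(int)
    misplaced = []
    for cell in cells:
        if cell["treatment"] != 1:
            continue
        if cell["pa_id"] in supported_pas:
            treated_per_pa[cell["pa_id"]] += 1
        else:
            misplaced.append(cell["cell_id"])
    empty_pas = sorted(pa for pa in supported_pas if treated_per_pa[pa] == 0)
    return misplaced, empty_pas


misplaced_cells, pas_without_treatment = check_treatment_assignment(cells, supported_pas)
print("Treated cells outside supported PAs:", misplaced_cells)
print("Supported PAs without treated cells:", pas_without_treatment)
```

Both lists should be empty for a consistent frame; small PAs may legitimately contain no cell, so the second list needs a manual look.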

Third problem:

year2005_issues

melvinhlwong commented 2 years ago

@Jo-Schie

yotaae commented 2 years ago

The problems related to treatment assignment and missing cells in treated PAs seem to be solved, at least for the year 2006. See the screenshot below: image

yotaae commented 2 years ago

@Jo-Schie @melvinhlwong

I've just checked the treatment cells in some other years - looks fine in most years.

One exception is the year 2012 (see screenshot below): Some treated PAs do not have treatment cells in them. Is this related to the conservation category being "UNESCO-MAB Biosphere Reserve"? All of the respective PAs belong to this category. I remember an issue by @Jo-Schie saying something about this conservation category but can't find the issue.

image

yotaae commented 2 years ago

NA problem to remember and check after adapting the treatment assignment in the matching frames: there are NA values for covariates in the matching frames. This is related to filtering PAs (e.g. marine areas without tree cover), so filtering may solve the problem. See the screenshot of NA values below:

image
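To see whether filtering would actually remove the NA values, the counts can be compared before and after the filter. A minimal sketch in Python (not the project's R), assuming a hypothetical `marine` flag per PA and using `None` in place of NA:

```python
# Hypothetical rows: None stands in for NA; `marine` is an assumed PA attribute.
rows = [
    {"pa_id": "PA_A", "marine": False, "treecover": 80.0, "travel_time": 120.0},
    {"pa_id": "PA_A", "marine": False, "treecover": None, "travel_time": 95.0},
    {"pa_id": "PA_M", "marine": True, "treecover": None, "travel_time": None},
    {"pa_id": "PA_M", "marine": True, "treecover": None, "travel_time": 300.0},
]
covariates = ["treecover", "travel_time"]


def count_missing(rows, covariates):
    """Count None values per covariate."""
    return {cov: sum(1 for row in rows if row[cov] is None) for cov in covariates}


before = count_missing(rows, covariates)
after = count_missing([row for row in rows if not row["marine"]], covariates)
print("before filtering:", before)
print("after filtering: ", after)
```

Any NA values that remain after the filter point to a cause other than the filtered PAs.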

Jo-Schie commented 2 years ago

@Jo-Schie

  • [x] Describe how treatment variable is created in matching frame (write down here)
  • [x] Describe how matching frames ensure that treatment cell in one year can not be control cell in another year
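The second point can also be verified after the fact. The sketch below (Python, not the project's R) does not show how the matching frames enforce the rule; it only looks for cells that are treated in one year's frame and control in another's. The `frames` structure is hypothetical.

```python
# Hypothetical frames: year -> {cell_id: treatment flag (1 = treated, 0 = control)}.
frames = {
    2006: {1: 1, 2: 0, 3: 0},
    2012: {2: 0, 3: 1, 4: 1},
}


def find_role_conflicts(frames):
    """Map each conflicting cell to (years as treated, years as control)."""
    treated_years = {}
    control_years = {}
    for year, assignment in frames.items():
        for cell_id, treatment in assignment.items():
            target = treated_years if treatment == 1 else control_years
            target.setdefault(cell_id, []).append(year)
    conflicting = treated_years.keys() & control_years.keys()
    return {
        cell_id: (sorted(treated_years[cell_id]), sorted(control_years[cell_id]))
        for cell_id in conflicting
    }


conflicts = find_role_conflicts(frames)
print(conflicts)
```

An empty result means no cell switches roles across the frames that were checked.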

@melvinhlwong and @yotaae :

Please see issue #88 for maps with results.

Jo-Schie commented 2 years ago

It would be great to have a map or more information on why we face these missing values.

yotaae commented 2 years ago

I have added a map to the data check file. More specifically, it plots the treated cells with missing values for treecover, country, terrain ruggedness and travel time. The patterns of missing values for treecover loss and emissions seem to be identical to the pattern for treecover. The treecover missings seem to follow a strange pattern (see screenshot below), whereas the missings of the other three variables seem to be related to water surfaces and country borders.

The cells are a little difficult to see (at least in the case of the latter three variables) but I wanted to give an overview of the affected areas. More details can be seen in the interactive map.
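Whether the missing-value patterns really are identical across variables can be checked by comparing the sets of affected cells. A minimal sketch in Python (not the project's R); the column names and values are hypothetical.

```python
# Hypothetical treated cells; None stands in for NA.
cells = [
    {"cell_id": 1, "treecover": None, "loss": None, "emissions": None, "travel_time": 10.0},
    {"cell_id": 2, "treecover": 55.0, "loss": 0.2, "emissions": 1.5, "travel_time": None},
    {"cell_id": 3, "treecover": None, "loss": None, "emissions": None, "travel_time": 42.0},
]


def missing_ids(cells, variable):
    """Set of cell ids for which the variable is missing."""
    return frozenset(c["cell_id"] for c in cells if c[variable] is None)


def group_by_missing_pattern(cells, variables):
    """Group variables that are missing in exactly the same cells."""
    groups = {}
    for variable in variables:
        groups.setdefault(missing_ids(cells, variable), []).append(variable)
    return list(groups.values())


patterns = group_by_missing_pattern(cells, ["treecover", "loss", "emissions", "travel_time"])
print(patterns)
```

Variables that land in the same group share one missingness pattern, which would suggest a common upstream cause.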

Map for treecover: image

Map for country missings: image

Map for TRI missings: image

Map for travel time missings: image

fyi @Jo-Schie @melvinhlwong

yotaae commented 2 years ago

New plot with missing percentages:

Treatment group: newplot

Control group: newplot (1)
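The numbers behind plots like these are the share of missing values per year and group. A minimal sketch in Python (not the project's R) of how such percentages could be computed, with hypothetical rows and column names:

```python
from collections import defaultdict

# Hypothetical rows; None stands in for NA, treatment 1 = treated, 0 = control.
rows = [
    {"year": 2008, "treatment": 1, "treecover": 70.0},
    {"year": 2008, "treatment": 1, "treecover": 65.0},
    {"year": 2009, "treatment": 1, "treecover": None},
    {"year": 2009, "treatment": 1, "treecover": 40.0},
    {"year": 2009, "treatment": 0, "treecover": 90.0},
]


def missing_share(rows, variable):
    """Percentage of missing values per (year, treatment) group."""
    totals = defaultdict(int)
    missing = defaultdict(int)
    for row in rows:
        key = (row["year"], row["treatment"])
        totals[key] += 1
        if row[variable] is None:
            missing[key] += 1
    return {key: 100.0 * missing[key] / totals[key] for key in totals}


shares = missing_share(rows, "treecover")
for (year, treatment), share in sorted(shares.items()):
    print(year, "treated" if treatment == 1 else "control", f"{share:.1f}%")
```

Printing the table alongside the plot makes overlapping lines (variables with identical shares) easier to spot.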

Here I do not see missing values for treecover. How does that fit with the map that shows a lot of missing values, @yotaae?

Jo-Schie commented 2 years ago

The treecover problem is definitely processing related. Darius encountered a similar issue. @goergen95, could you please have a quick look or comment? Maybe you can provide us with a quick fix...

yotaae commented 2 years ago

@Jo-Schie, you can see missings for treecover in the treatment group, e.g. the spike in 2009. It is just a little difficult to see because the pattern of missings is the same for treecover, loss, loss_t3 and emissions, so the colors of the lines are mixed. Does that answer the question?

Jo-Schie commented 2 years ago

I can confirm that the missing values issue in the forest cover dataset is related to the input data.

yotaae commented 2 years ago

Update on missing data for treecover: We have far fewer missing values now: ~150k before, 5k now. See the screenshot below for the remaining cells with missing treecover data:

image

Jo-Schie commented 2 years ago

solved via email.