openkfw / mapme.protectedareas

Reproducible workflows in R for processing open geodata to create knowledge about KfW supported protected areas and conservation effectiveness.
GNU General Public License v3.0

Create Variables for heterogeneous treatment effects #97

Closed Jo-Schie closed 2 years ago

Jo-Schie commented 2 years ago

WDPA based

-> @yotaae: Please find out how many missing values we have in the different WDPA categories

Internal documentation based

-> @yotaae Please prepare the data for import into the database

yotaae commented 2 years ago

I have created a markdown file to check the different categories in the WDPA database (see first point in comment above).

As expected, some of the variables (e.g. OWN_TYPE) have "not reported" or "not applicable" in many cases, which is why they are not really useful for an analysis of heterogeneous treatment effects.

The variables GOV_TYPE, IUCN_CAT (which also has some "not reported" values), ecosystem_kfw, and DESIG_TYPE (not sure what this refers to) look more promising.

Other variables like type_kfw or DESIG_ENG have (in my opinion) too many categories.

See file here: /datadrive/yota/development/WDPA_heterogeneity_vars.html
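The kind of check described above can be sketched roughly as follows. This is a minimal Python illustration (the project itself works in R), assuming each protected area is a record whose WDPA fields are plain strings. The sample rows and the helpers `category_counts` and `uninformative_share` are hypothetical and not taken from the actual markdown file.

```python
from collections import Counter

# Labels that carry no usable information for a heterogeneity analysis.
UNINFORMATIVE = {"not reported", "not applicable", "not assigned"}

# Hypothetical sample rows; real values would come from the WDPA attribute table.
rows = [
    {"OWN_TYPE": "Not Reported", "GOV_TYPE": "Federal or national ministry or agency", "IUCN_CAT": "II"},
    {"OWN_TYPE": "Not Reported", "GOV_TYPE": "Sub-national ministry or agency", "IUCN_CAT": "Not Reported"},
    {"OWN_TYPE": "State", "GOV_TYPE": "Federal or national ministry or agency", "IUCN_CAT": "VI"},
    {"OWN_TYPE": "Not Applicable", "GOV_TYPE": "Collaborative governance", "IUCN_CAT": "II"},
]


def category_counts(rows, column):
    """Count how often each category occurs in one column."""
    return Counter(row[column] for row in rows)


def uninformative_share(rows, column):
    """Fraction of rows whose value is one of the uninformative labels."""
    n_bad = sum(1 for row in rows if row[column].strip().lower() in UNINFORMATIVE)
    return n_bad / len(rows)


for column in ("OWN_TYPE", "GOV_TYPE", "IUCN_CAT"):
    counts = category_counts(rows, column)
    share = uninformative_share(rows, column)
    print(f"{column}: {len(counts)} categories, {share:.0%} uninformative")
```

A variable with a high uninformative share (like OWN_TYPE in this toy data) or a very large number of distinct categories would be a weak candidate for subgroup analysis.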

Jo-Schie commented 2 years ago

Cool thing, @yotaae. Just a small question / suggestion: why not render the file into the /docs folder and push it to main? That way we could have a look at it online...

IUCN_CAT would indeed be my preferred option as is, since it shows usage categories. How many missing values do we have here? We could probably just interpret the NAs as "not clear", i.e. probably not yet defined or miscellaneous. I think that would be okay, because I'd guess there is no other good reason for a category being undefined. The IUCN category is one of the main things checked in submissions to the WDPA, so I'd guess no values are missing simply because someone forgot to fill them in.

The other variable of interest would be the internal one. Did you make progress on that?

Jo-Schie commented 2 years ago

As for the internal documentation: I suggested this methodology internally: https://conbio.onlinelibrary.wiley.com/doi/epdf/10.1111/j.1523-1739.2008.00937.x

Please have a look at it and let's discuss.

yotaae commented 2 years ago

@Jo-Schie Good point, I'll add the files to main. Had some issues with pushing the files this morning but I'll update you here as soon as I have worked it out.

Regarding IUCN categories: we do not have missing values for IUCN but we have "Not reported" (n=28), "not applicable" (n=3) and "not assigned" (n=3) in some cases. Total number of PAs should be 416 if I haven't made any mistakes. Can we work with the above categories?
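For scale, the counts quoted above add up as follows. This is a quick Python check using only the numbers given in this comment; the total of 416 is the tentative figure stated there.

```python
# Counts of non-informative IUCN labels as quoted in the comment.
non_informative = {"Not reported": 28, "not applicable": 3, "not assigned": 3}
total_pas = 416  # tentative total number of PAs from the comment

n_non_informative = sum(non_informative.values())
share = n_non_informative / total_pas

print(f"{n_non_informative} of {total_pas} PAs ({share:.1%}) have no informative IUCN category")
```

So roughly one in twelve PAs falls into one of these three labels, and the remaining 382 carry a regular IUCN category.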

Regarding internal data: I have merged our assetids with bmz numbers and our project data. I still have to check the data though, since some assetids seem to be assigned to multiple WDPA IDs. Can't say if that's a problem for now - will update here.
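One way to see how widespread the multiple-assignment issue is would be to group the ID pairs in both directions. The following is a minimal Python sketch with made-up IDs; the pair list and the helper `multi_mapped` are hypothetical, and the real check would run on the merged project data.

```python
from collections import defaultdict

# Hypothetical (assetid, WDPA ID) pairs, including one exact duplicate row.
pairs = [(101, 5001), (101, 5002), (102, 5003), (103, 5004), (103, 5004), (104, 5003)]


def multi_mapped(pairs):
    """Return keys that map to more than one distinct partner ID."""
    groups = defaultdict(set)
    for key, partner in pairs:
        groups[key].add(partner)
    return {key: sorted(partners) for key, partners in groups.items() if len(partners) > 1}


# assetids assigned to several WDPA IDs
ambiguous_assets = multi_mapped(pairs)
# WDPA IDs shared by several assetids (the reverse direction)
ambiguous_wdpa = multi_mapped([(wdpa_id, asset_id) for asset_id, wdpa_id in pairs])

print("assetids with multiple WDPA IDs:", ambiguous_assets)
print("WDPA IDs with multiple assetids:", ambiguous_wdpa)
```

Using sets means exact duplicate rows are not flagged; only genuinely ambiguous assignments show up, which are the ones that prevent a 1:1 merge.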

Jo-Schie commented 2 years ago

For me that seems to be a good variable. We should use it and quantify results. I would use "not reported" and similar as categories rather than missing values, @yotaae and @melvinhlwong
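Keeping these labels as categories could look roughly like this. It is a Python sketch under the assumption that the IUCN values arrive as strings with inconsistent capitalisation; the function `iucn_group` and the sample values are hypothetical.

```python
from collections import Counter

# Canonical spelling for the non-informative labels, kept as real categories.
LABELS = {
    "not reported": "Not reported",
    "not applicable": "Not applicable",
    "not assigned": "Not assigned",
}


def iucn_group(value):
    """Map a raw IUCN_CAT value to an analysis category.

    Non-informative labels stay as their own categories; only truly
    empty values are marked as "Missing".
    """
    if value is None or not value.strip():
        return "Missing"
    cleaned = value.strip()
    return LABELS.get(cleaned.lower(), cleaned)


# Hypothetical raw values with mixed capitalisation and one empty entry.
values = ["II", "Not Reported", "VI", "not applicable", None, "II", "Not Assigned"]
groups = [iucn_group(v) for v in values]

print(Counter(groups))
```

With this coding, no PA is dropped from the sample, and effects for the "Not reported" group can be estimated and reported alongside the regular IUCN categories.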

Jo-Schie commented 2 years ago

fyi @melvinhlwong After talking to Yota on the phone we agreed on the following:

The latter two require new routines that match the data already at the sampling frame creation stage, because overlapping WDPA areas lead to problems when merging (no 1:1 solution).

melvinhlwong commented 2 years ago

thanks for the info