Add description about internal processing for area estimations to the package and possible overestimations

Jo-Schie commented 2 years ago

I just noticed that it would be very desirable to document somewhere how the "internals" of the area calculations of the package work maybe also adding a little graph.

Background: : @Ohm-Np explained me that the area calculation in the package is done with crop -> mask -> cellsize -> zonal . The exactness of this approach depends on the the Area size of the AOI and the resolution of the input raster. This is, because the raster that is used for zonal will intersect the AOI and also eventually cover areas that are outside of the boarders of the AOI. Those areas are included and therefore there will be always an overestimation if areas are being calculated (at least if you calculate the sum of areas).

An extreme edge case could be that you have a very small AOI (say 1 hectar) and a very low resolution input raster (say 500x500 meters). The input raster would be cropped, masked and cellsizes would be calculated. You might then eventually end up with e.g. 4 cells that intersect the AOI and have a total area of 1000 x 1000 meters whereas AOI is only 100 x 100 meters.

I don't know if this is relevant at the current stage because area sizes are AFAIK only calculated for forest area, magrove area and land-cover area and all of them have fairly high resolutions between 30 and 100 meters... and for our use-case of using protected areas that are fairly large, the estimations will not deviate a lot... Nevertheless, it could be good to show that to users in order to make them understand, why some of the calculations might give results that are larger then the original AOI (even if the differences are small). Maybe someone uses this package for small AOIs and might have trouble understanding the results.

Small illustration:

Not sure, where this would be a good fit for the documentation and what you think about that issue @goergen95 .

goergen95 commented 2 years ago

This is definitely something we need to document somewhere. I suggest somewhere where we talk about different engines (so this is slightly related to #69). Depending on the engine and the structure of the data (as indicated by your sketches) you can get different results. In my current understanding, terra::extract and exactextractr::exact_extract can be configured to only take into account the proportion a raster cell is covered by a polygon. terra::zonal thus will result in relatively crude estimates. I think we should address this properly when implementing the engines more thoroughly and include a sensitivity analysis of some kind showing users the differences in the estimates.

Jo-Schie commented 2 years ago

Agreed @goergen95 . I would, nevertheless suggest in the meantime that we just open a new chapter in the documentation and call it "Technical details" or something. Later we can rename it to "technical details and engine choice" or similar. Is that okay with you? I can opena branch and make a suggestion. I am not sure if the other engines really differ because they would need to make an intersection of some kind which probably does not occur but I'd be happily surprised if different.

I just noticed as well that inside the WDPA and our portfolio we encounter areas that are smaller 1 sqkm, so this issue matters and users should be aware .

Ohm-Np commented 2 years ago

An example of workflow diagram:

Workflow_diagrams

Jo-Schie commented 2 years ago

That's great. Can we use this figure @Ohm-Np ?

Ohm-Np commented 2 years ago

Yes, I have prepared these diagrams for few other variables too.

goergen95 commented 1 year ago

Another idea I have, slightly related to this, is to let users decide which projection the package should use for their analysis. Maybe I'll open a dedicated issue for this.

mapme-initiative / mapme.biodiversity

Add description about internal processing for area estimations to the package and possible overestimations #72