prioritizr / wdpar

Interface to the World Database on Protected Areas
https://prioritizr.github.io/wdpar
GNU General Public License v3.0
37 stars 5 forks source link

JOSS Review: Improve documentation on geo-processing steps and its effects on the original geometries #53

Closed Jo-Schie closed 1 year ago

Jo-Schie commented 2 years ago

Hi @jeffreyhanson .

I think it could be helpful for users to extend a little the documentation on the internal geo-processing steps and what they actually do to the data (geometries). This should be part of the vignette, but also of the paper (which so far very much focuses very much on describing the need, but not the internals of the package).

From my first test, there are a few things that should be discussed:

If possible, I would recommend that you use interactive maps in your vignette instead of the map that you plotted (which does not allow to really see anything). Below you can find some sample code and some screenshots that might allow users to understand more intuitively what is done with the data. You could use and maybe extent this a little bit.

# ----- reproduce quickstart tutorial from https://github.com/prioritizr/wdpar -----
# load packages
library(wdpar)
library(dplyr)
library(ggmap)

# download protected area data for Malta
mlt_raw_pa_data <- wdpa_fetch("Malta",
                              wait = TRUE,
                              download_dir = rappdirs::user_data_dir("wdpar"))

# clean Malta data
mlt_pa_data <- wdpa_clean(mlt_raw_pa_data)

# reproject data to longitude/latitude for plotting
mlt_pa_data <- st_transform(mlt_pa_data, 4326)

# download basemap imagery
bg <- get_stamenmap(unname(st_bbox(mlt_pa_data)),
                    zoom = 8,
                    maptype = "watercolor",
                    force = TRUE)

# make map
ggmap(bg) +
  geom_sf(aes(fill = IUCN_CAT), data = mlt_pa_data, inherit.aes = FALSE) +
  theme(axis.title = element_blank(), legend.position = "bottom")

# ----- interactive map to compare raw to processed data -----
library(mapview)

mapView(mlt_pa_data) + mapView(mlt_raw_pa_data, col.regions = "red")

# ----- compare higher precision processing -----
# problem: Data with the default setting gets quite distorted on local level
# (see mapview before)

# clean Malta data with higher geomtry precision
mlt_pa_data_highprecision <- wdpa_clean(mlt_raw_pa_data,
                                        geometry_precision = 10000)

mapView(mlt_pa_data) +
  mapView(mlt_raw_pa_data, col.regions = "red") +
  mapView(mlt_pa_data_highprecision, col.regions = "green")

# clean Malta data with higher geomtry precision and keep overlaps
mlt_pa_data_highprecision_overlap <- wdpa_clean(mlt_raw_pa_data,
                                                geometry_precision = 10000,
                                                erase_overlaps = FALSE)

mapView(mlt_pa_data) +
  mapView(mlt_raw_pa_data, col.regions = "red") +
  mapView(mlt_pa_data_highprecision, col.regions = "green")+
  mapView(mlt_pa_data_highprecision_overlap, col.regions = "purple")

Original Data image

Default Settings

image

Higher Geometry precision

image

HIgher geometry precision and overlaps

image

Link to review

jeffreyhanson commented 2 years ago

Thank you very much for these suggestions @Jo-Schie! I'll have a think about this and begin incorperating your feedback later this week.

jeffreyhanson commented 2 years ago

Sorry, I've been swamped and am only getting to this today.

jeffreyhanson commented 2 years ago

Thanks again for providing all these suggestions @Jo-schie! I've created a PR with all the changes to incorporate your feedback (see #57). To help make it clear exactly how I've addressed each of your points, I've responded to each of them below (separately) and copied in the updated text from the PR. Since JOSS policies favor shorter publications [1], I've tried to address several of the comments in the same sentence or paragraph. I think I've responded to all your points, but please let me know if I have missed anything?

  1. Overview of the applied geo-processing steps in wdpa_clean. -> maybe this could be also a graphical representation.

    I have deliberately written the paper to avoid describing the internals of the package in a high level of detail. This is because the WDPA and WDOECM will likely be updated at some point in the future which, in turn, would require changes to the package. If I updated the manuscript to describe the data cleaning procedures -- as they are currently implemented -- in a high level of detail and the package had to be updated in the future, this would cause the published paper to be incorrect. Indeed, I fear that a discrepancy between the published paper and the package documentation would be very confusing for users. Although I appreciate your feedback, I don't think this suggestion would improve the manuscript.

  2. which geo-processing steps from wdpa_clean affect the original geometries?

    I have updated the manuscript to make it clear which steps involve geoprocessing. It now includes the following sentences.

    "These procedures include excluding areas that have yet to be fully implemented, areas that are no longer designated, and UNESCO Biosphere Reserves [@r1]. They also include geoprocessing procedures, such as repairing invalid geometries in spatial boundaries, buffering areas represented by point localities [@r3], and removing spatial overlaps [@r2]."

  3. How do they affect the geometries and what are consequences for the use (e.g. question of scale)?

    I have updated the manuscript to describe the consequences of using different levels of precision at different scales. Specifically, it now contains the following sentences.

    "For example, the precision of spatial data processing procedures can be increased so that they are suitable for local scale analyses. This is especially important because the default precision may remove smooth edges at fine scales."

  4. How do default settings affect geometries (discuss that default is country scale analysis)?

    I have updated the manuscript to describe how the default settings affect the geometries, and note how the default settings are for national-scale analyses. It now includes the following sentences.

    "Although the default settings for the data cleaning procedures are well-suited for national scale reporting of protected area coverage, they can be customized for other applications. For example, the precision of spatial data processing procedures can be increased so that they are suitable for local scale analyses. This is especially important because the default precision may remove smooth edges at fine scales."

  5. How can we fine-tune settings e.g. to work on different scales and what are the effects on the geometries?

    I have updated the manuscript to mention that settings can be fine tuned to work on different scales (please see text mentioned in previous comment). Because JOSS policies state that the manuscript should not describe API functionality [2], I have not updated the manuscript to mention the specific function or parameter that should be used. Instead, I have provided this information in the vignette, with the following sentences.

    "Please note that, by default, spatial data processing is performed at a scale suitable for national scale analyses. If you require data suitable for fine scale analysis, please a higher level of precision when cleaning the data (e.g., using the augment: geometry_precision = 10000)."

  6. How are overlapping polygon boundaries resolved and what are the consequences for analysis -> e.g. if there are two overlapping areas with different IUCN categories, how is the boundary resolved and what are the consequences for calculating area statistics.

    I have updated the manuscript to provide this information. It now contains the following sentences.

    "Specifically, overlapping geometries are erased such that areas associated with more effective management categories or have historical precedence are retained."

  7. I would recommend that you use interactive maps in your vignette instead of the map that you plotted

    Thank you for this suggestion! Although I appreciate that this would help with data visualization, I do not believe this would improve the overall user experience. This is because I would need to add the mapview (or leaflet) R package as a dependency which, in turn, would -- because the mapview R package has many, many dependencies (see https://cran.r-project.org/package=mapview) -- result in excessive installation times and a needlessly large installation footprint for users that wish to try out the package. I strongly believe that it is my responsibility to ensure that the software I write does not needlessly pollute my users' computers.

[1] JOSS policies state that the manuscript should be a "short paper" and less than 1000 words (https://joss.readthedocs.io/en/latest/submitting.html)

[2] JOSS policies state the "the paper should not include software documentation such as API (Application Programming Interface) functionality, as this should be outlined in the software documentation" (https://joss.readthedocs.io/en/latest/review_criteria.html).

Jo-Schie commented 2 years ago

Hi @jeffreyhanson : Thanks for the edits.

Some quick remarks: as to6) you sate that "Specifically, overlapping geometries are erased such that areas associated with more effective management categories or have historical precedence are retained."

-> What if there is an overlap of "old area/soft protection" with "new area/strong protection"? Is there some decision tree? If so it would be useful to illustrate it or link to the relevant literature that suggests the decision hierarchy at this specific point in the manuscript.

As to 1). I think a short description of the processing steps should be somewhere. I can follow your argument that you do not want to put it into the paper because of eventual changes to the data and the associated processing steps... but then I would put it in the package vignette and reference from the paper to it. By this you can keep it updated. Otherwise it is not clear to users what really happens to the data in the background, unless they are willing and able to understand your R-code inside the package. The transparency though is a necessary precondition to use the package for scientific analysis.

As to 7) From my understanding you do not need mapview as a dependency in your package if you create the vignette with it. the Vignette in the end is the rendered html version of your rmarkdown document... are you sure you would need mapview as a package dependency if you only use it in the vignette?

I think an alternative could be that you include some graphics in your documentation (as in my comment above) that show how geometries are affected by different settings in the package. Nevertheless I think an interactive map is much better for the user to explore what happens then some graphics.

I am thinking of this as a user who is interested in precise area information and I myself used your package with the default settings for quite some time without really checking what happened to the areas (because R is also not very convenient for that). The package vignette might be a great place to make users aware of that since it is the most relevant document that users of your package reference to.

jeffreyhanson commented 2 years ago

@Jo-Schie - thanks for getting back to me so quickly, I really appreciate it!

I'll respond to each of your follow up comments below (seperately).

References

Coetzer KL, Witkowski ET, & Erasmus BF (2014) Reviewing Biosphere Reserves globally: Effective conservation action or bureaucratic label? Biological Reviews, 89: 82--104.

Runge CA, Watson JEM, Butchart HM, Hanson JO, Possingham HP & Fuller RA (2015) Protected areas and global conservation of migratory birds. Science, 350: 1255--1258.

Updated vigntte html file is available here: wdpar.zip

jeffreyhanson commented 2 years ago

@Jo-Schie, I just wanted to follow up and ask if the PR address your concerns on this issue?

Jo-Schie commented 1 year ago

Hi @jeffreyhanson . I did not really realize that you'd already worked on this. I put it on my todo list and will come back asap.

jeffreyhanson commented 1 year ago

No worries - thanks! Please let me know if anything's not clear and I'll get back to you as soon as I can.

jeffreyhanson commented 1 year ago

@Jo-Schie, I just wanted to follow up and ask if you'd a chance to go through the PR?

jeffreyhanson commented 1 year ago

@Jo-Schie, it's been over a month since I originally created the PR to address your feedback, is there anything I can do to help make it easier for you to review it?

Jo-Schie commented 1 year ago

hi @jeffreyhanson. Documentation looks very good to me and I fully understand that you do not want to add the mapview package if it creates a dependency for the package. Also it makes sense to have the processing steps in the function documentation and I think you do explain it in a great level of detail. Sorry for the delays. Happy to close the issue now.

jeffreyhanson commented 1 year ago

Awesome - no worries - thank you very much!