openjournals / joss-reviews

Reviews for the Journal of Open Source Software

Creative Commons Zero v1.0 Universal

707 stars 37 forks source link

[REVIEW]: rmap: An R package to plot and compare tabular data on customizable maps across scenarios and time #4015

Closed whedon closed 2 years ago

whedon commented 2 years ago

Submitting author: !--author-handle-->@zarrarkhan@hugoledoux<!--end-editor-- Reviewers: @CamilleMorlighem, @maczokni Archive: 10.5281/zenodo.7085969

:warning: JOSS reduced service mode :warning:

Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.

Status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/4cdf462f70681bc335ddebf5868b249c"><img src="https://joss.theoj.org/papers/4cdf462f70681bc335ddebf5868b249c/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/4cdf462f70681bc335ddebf5868b249c/status.svg)](https://joss.theoj.org/papers/4cdf462f70681bc335ddebf5868b249c)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@CamilleMorlighem & @maczokni, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:

Make sure you're logged in to your GitHub account
Be sure to accept the invite at this URL: https://github.com/openjournals/joss-reviews/invitations

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @hugoledoux know.

✨ Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest ✨

Review checklist for @CamilleMorlighem

✨ Important: Please do not use the Convert to issue functionality when working through this checklist, instead, please open any new issues associated with your review in the software repository associated with the submission. ✨

Conflict of interest

[x] I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

[x] I confirm that I read and will adhere to the JOSS code of conduct.

General checks

[x] Repository: Is the source code for this software available at the repository url?
[x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
[x] Contribution and authorship: Has the submitting author (@zarrarkhan) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
[x] Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines

Functionality

[x] Installation: Does installation proceed as outlined in the documentation?
[x] Functionality: Have the functional claims of the software been confirmed?
[x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
[x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
[x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
[x] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

[x] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
[x] A statement of need: Does the paper have a section titled 'Statement of Need' that clearly states what problems the software is designed to solve and who the target audience is?
[x] State of the field: Do the authors describe how this software compares to other commonly-used packages?
[x] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
[x] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

Review checklist for @maczokni

Conflict of interest

[x] #4016

Code of Conduct

[x] I confirm that I read and will adhere to the JOSS code of conduct.

General checks

[x] Repository: Is the source code for this software available at the repository url?
[x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
[x] Contribution and authorship: Has the submitting author (@zarrarkhan) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
[x] Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines

Functionality

[x] Installation: Does installation proceed as outlined in the documentation?
[x] Functionality: Have the functional claims of the software been confirmed?
[x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
[x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
[x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
[x] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

[x] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
[x] A statement of need: Does the paper have a section titled 'Statement of Need' that clearly states what problems the software is designed to solve and who the target audience is?
[x] State of the field: Do the authors describe how this software compares to other commonly-used packages?
[x] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
[x] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

whedon commented 2 years ago

Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @CamilleMorlighem, @maczokni it looks like you're currently assigned to review this paper :tada:.

:warning: JOSS reduced service mode :warning:

Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.

:star: Important :star:

If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository which means for GitHub's default behaviour you will receive notifications (emails) for all reviews 😿

To fix this do the following two things:

Set yourself as 'Not watching' https://github.com/openjournals/joss-reviews:

watching

You may also like to change your default settings for this watching repositories in your GitHub profile here: https://github.com/settings/notifications

notifications

For a list of things I can do to help you, just type:

@whedon commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@whedon generate pdf

whedon commented 2 years ago

Wordcount for paper.md is 1074

whedon commented 2 years ago

Software report (experimental):

github.com/AlDanial/cloc v 1.88  T=0.13 s (246.4 files/s, 106346.4 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
R                               19           1404           2413           7675
Rmd                              1            367            512            781
Markdown                         6             62              0            200
TeX                              1             18              0            189
YAML                             5             33              0            159
-------------------------------------------------------------------------------
SUM:                            32           1884           2925           9004
-------------------------------------------------------------------------------

Statistical information for the repository '8ee264c2b9dd4723350c4830' was
gathered on 2021/12/20.
The following historical commit information, by author, was found:

Author                     Commits    Insertions      Deletions    % of changes
Khan                             2           442            442          100.00

Below are the number of rows from each author that have survived and are still
intact in the current revision:

Author                     Rows      Stability          Age       % in comments

whedon commented 2 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

whedon commented 2 years ago

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1007/s10113-021-01775-1 is OK
- 10.1088/1748-9326/ac046c is OK
- 10.21105/joss.00054 is OK
- 10.32614/RJ-2011-006 is OK
- 10.1007/b106573 is OK
- 10.5194/gmd-12-677-2019 is OK
- 10.5334/jors.292 is OK

MISSING DOIs

- 10.18637/jss.v084.i06 may be a valid DOI for title: tmap: Thematic Maps in R
- 10.32614/rj-2018-009 may be a valid DOI for title: Simple features for R: standardized support for spatial vector data.

INVALID DOIs

- 0.1029/2020EF001970 is INVALID

CamilleMorlighem commented 2 years ago

Hello ! I started the review. I have a few comments already:

About the software paper, I think everything is there. I just noticed a spelling mistake line 17 in the summary (in the article proof) "rmap is desgined" and there are also the few missing/invalid DOIs already noticed by whedon in a previous comment.
I can see in the user guide that the type of license is BSD_2_clause but it does not appear in the license file on the repository, maybe it should be specified there as well ?
About the community guidelines, it is very clear in the user guide how third parties can contribute to the software and report issues or problems but I could not find information about seeking support.

zarrarkhan commented 2 years ago

Hi @CamilleMorlighem,

Thanks for the very useful feedback and comments. Here are the responses to your points:

We fixed the typo on line 17 and also fixed the Missing and incorrect DOIs. The updated document can be found here: http://res.cloudinary.com/hkvhhndao/image/upload/v1640793026/zeldl1tq79mt9xnk82us.pdf or in the repo: https://github.com/JGCRI/rmap/tree/main/paper
We updated the BSD_2_clause license file. Now when you click on the repo LICENSE file (https://github.com/JGCRI/rmap/blob/main/LICENSE) it should show the proper BSD_2_clause license.
We added a support tab in the user guide and setup a discussions page for rmap in the main repo. If users do not want to submit an issue then they can start a discussion instead on the discussions page. We the developers can respond there as well as have community members respond there as well.

CamilleMorlighem commented 2 years ago

Hi @zarrarkhan,

Thank you for your responses !

I have tested the documentation and functionality of the package. The documentation is very clear and structured and very user friendly! I just have the following comments for improvement:

The information link (https://confluence.pnnl.gov/confluence/display/JGCRI/GCAM+Shape+Files) in the help section of the maps mapGCAMBasins and mapGCAMReg32 is not working for me (I reach an error page).
I could not find a statement of need in the user guide or in the documentation on the repository.
I couldn't find either a list of dependencies, but when installing it proposed me to install/update required packages so maybe that's enough?
I found some typos in the user guide:
- In the section "Subset existing shape", I think you meant unique(shapeSubset@data$subRegion) and not unique(shapeSubset@data$states)
- In section "Read data from a shapefile", I think you meant rgdal::readOGR(dsn=exampleFolder,layer=exampleFile,use_iconv=T,encoding='UTF-8') instead of rgdal::readOGR(dsn=examplePolyFolder,layer=examplePolyFile,use_iconv=T,encoding='UTF-8').
- In subsection "Gridded Data Multi-Year Diff", both code snippets return NULL: mapx$map_param_xDiffAbs_2015_1990_KMEANS mapx$map_param_xDiffPrcnt_2015_1990_KMEANS
- Same with the following code snippets in section "Gridded Data Multi-Scenario Diff":
  mapx$map_param_DiffAbs_scenario3_scenario1_KMEANS mapx$map_param_DiffPrcnt_scenario3_scenario1_KMEANS

About the functionality, everything worked fine and I don't have any performance claims. I only have the following comments:

I did not get the effect of the argument scaleRangeDiffAbs (or scaleRangeDiffPrcnt) in the map function . I could not see what changed in the results when changing the values for these arguments or when simply removing them.
I thought that one very useful function, that is not described in the documentation, is the map_find_df function to find a suitable map for your dataframe. I think it might be very useful when you don't know the package and want to use it for a specific purpose, so maybe it is worth highlighting this function in the user-guide?

Feel free to ignore any comment if you think that's not relevant!

whedon commented 2 years ago

:wave: @maczokni, please update us on how your review is going (this is an automated reminder).

whedon commented 2 years ago

:wave: @CamilleMorlighem, please update us on how your review is going (this is an automated reminder).

zarrarkhan commented 2 years ago

@CamilleMorlighem Thanks for the detailed review and very helpful comments. Addressed as follows:

Improvements

All data sources referring to the restricted (https://confluence.pnnl.gov/confluence/display/JGCRI/GCAM+Shape+Files) have been updated to point to the publicly available zenodo link: https://zenodo.org/record/4688451#.YdMNTmjMJPY
We added statement of need to the repo and user guide landing pages based on the text in the paper.
- https://github.com/JGCRI/rmap/#statement-of-need
- https://jgcri.github.io/rmap/#statement-of-need
Dependencies can be found in the https://github.com/JGCRI/rmap/blob/main/DESCRIPTION file and those mentioned in the imports list are required and will be automatically installed when users install the package so it shouldn't be something a user needs to worry about outside the package.
Typos in user guide fixed (Thanks for catching these!):
- In the section "Subset existing shape", unique(shapeSubset@data$states) siwtched to unique(shapeSubset@data$subRegion). (https://jgcri.github.io/rmap/articles/vignette_map.html#subset-existing-shape)
- In section "Read data from a shapefile", rgdal::readOGR(dsn=examplePolyFolder,layer=examplePolyFile,use_iconv=T,encoding='UTF-8') changed to rgdal::readOGR(dsn=exampleFolder,layer=exampleFile,use_iconv=T,encoding='UTF-8') (https://jgcri.github.io/rmap/articles/vignette_map.html#read-data-from-a-shapefile)
- In subsection "Gridded Data Multi-Year Diff", code snippets adjusted to return appropriate maps. The code snippet was out of date and has been updated. (https://jgcri.github.io/rmap/articles/vignette_map.html#gridded-data-multi-year-diff)
- Same with the following code snippets in section "Gridded Data Multi-Scenario Diff" code snippets adjusted to return appropriate maps. The code snippet was out of date and has been updated. (https://jgcri.github.io/rmap/articles/vignette_map.html#gridded-data-multi-year-diff)

Functionality

Thanks for noting this. The code needed to be updated to implement the changes. I think with the updated changes (now that it is actually working) you can see the usefulness of this feature (i.e. highlighting different ranges for each type of map as needed). See here: https://jgcri.github.io/rmap/articles/vignette_map.html#scale-range
Added section on map_find_df to user_guide here: https://jgcri.github.io/rmap/articles/vignette_map.html#map-find

maczokni commented 2 years ago

Hello! I've hit a bit of a stumbling block when trying to follow along with some data that isn't the one in the examples (I understand the idea to be that users can upload their own .csv files and this package will allow them to easily map this?)

First note: if the column is not called "subRegion" then the error Error in rmap::map_find_df(mydata) : data must have subRegion columns is immediately thrown. Instead of requiring users to rename their column, could this not be a parameter to specify in the function itself? E.g.: map_find_df(data = mydata, regions = subRegion). I don't think it's in line with the contribution to make things easy to ask people to rename their columns!

But where I'm stuck: once I try to map my data, it will not do this. I have renamed to values, and have checked and it is numeric. Here's my code

library(dplyr)
library(geodaData)
library(rmap)
# get own data to try
ncovr <- geodaData::ncovr

mydata <- ncovr %>% select(NAME, HR60)

map_chosen <- rmap::map_find_df(mydata)
# Error in rmap::map_find_df(mydata) : data must have subRegion columns.

mydata <- ncovr %>% select(NAME, HR60) %>% rename(subRegion = NAME)
map_chosen <- rmap::map_find_df(mydata)
rmap::map(map_chosen)
# Here the value column is not used, and no error is thrown, I just get a meaningless map

mydata <- ncovr %>% select(NAME, HR60) %>% rename(subRegion = NAME, value = HR60)
map_chosen <- rmap::map_find_df(mydata)
rmap::map(map_chosen)
#  OK it might not be an issue with naming, it just doesn't work...

rmap::map(map_chosen, fillColumn = value)
# Error in rmap::map_plot(color = color, lwd = lwd, legendType = legendType,  : 
#  object 'value' not found

# the example from the vignette works
data = data.frame(subRegion = c("CA","FL","ID","MO","TX","WY"),
                  value = c(5,10,15,34,2,7))
rmap::map(data)

# both the example and my own data's value columns contain numeric variables
class(mydata$value)
class(data$value)

# From the lack of error messages and the lack of further detail in the vignette, I cannot get it to work.

Any thoughts?

zarrarkhan commented 2 years ago

Hi @maczokni.

Thanks for your review and comments. Please see our responses below:

# Update rmap with latest changes
devtools::install_github("JGCRI/rmap")
library(rmap)

Required Column names (subRegion and Value): We have highlighted how users can identify their own column names in rmap using the subRegCol and valueCol arguments. (see end of this section: https://jgcri.github.io/rmap/articles/vignette_map.html#inputs)
Example data: Thanks for trying rmap with your example data. So we updated a few things and have some comments on this:
- ncovr <- geodaData::ncov gives an sf object with geometries. rmaps was intially unable to process this because it was not a pure R dataframe. Code has been updated to convert an sf object to a pure dataframe.
- mydata <- ncovr %>% select(NAME, HR60) results in a dataframe with a list of counties and corresponding data. If you sort by NAME you will see multiple entries for the same subRegion. rmap (or any other mapping package for that matter) would not know how to deal with this repeated data.
```
dplyr::select(NAME,HR60) %>%
as.data.frame() %>%
dplyr::arrange(NAME) %>%
head(10)
```
With the updates you can run your example as follows: Note that in general, county data is problematic because there are multiple counties with the same name in different states. In order to be more specific you can append your county names with an underscore and relevant State initials.
```
library(dplyr)
library(geodaData)
library(devtools)
library(rmap)
```

ncovr <- geodaData::ncovr

Subset first 10 rows to avoid repeated subRegions

mydata <- ncovr %>% dplyr::select(NAME,HR60) %>% head(10); mydata

Will give you the relevant plot but multiple counties

rmap::map(mydata, subRegCol = "NAME", # Set your subRegion column valueCol = "HR60") # Set your value column

In the following extended example you can see all the multiple counties labelled.

rmap appends the State Initials for you as it recognizes the counties. Although all the counties are given the same value (which is probably not the intention here).

You can also see some of the other features of rmap at play here

such as underLayer, zoom, labels etc.

rmap::map(mydata, subRegCol="NAME", valueCol="HR60", labels = T, labelSize = 3, labelRepel = T, underLayer = rmap::mapUS49, zoom=-2)



Please let us know if this helps clarify some of the issues you were having and if you have additional concerns/comments.

maczokni commented 2 years ago

Hi @zarrarkhan thanks for the update - sorry just getting back to this now but I have a major concern - you are right there are duplicates, as there are USA counties which share the same name. This means that the county names is not actually an appropriate column on which to join. This means that if I do want to join the full dataset (rather than the 10 subset you used) then I cannot do this with the package - is that right? And this is not just a quirk of the dataset I'm using to test, your own data (the inbuilt map: mapUS49County has duplicate values of the subRegion column):

# devtools::install_github("JGCRI/rmap")
library(rmap)
library(geodaData)
library(dplyr)
library(sf)

# get data for testing
data("ncovr")
# check any duplicate names in the county inbuilt map
thing <- rmap::mapUS49County
length(thing@data$subRegion)
length(unique(thing@data$subRegion))
as.data.frame(table(thing@data$subRegion)) %>% filter(Freq > 1)

Considering this is a primary function of the package, do you think you could take some more time to think about this problem? As I'm sure you are familiar, usually for each geography you will look for unique codes in the dataframes on which you can confidently join. Here in the UK we might use for example local authority codes or output area codes. In the US it seems to be state and county codes for example in this case, which is in the ncovr dataset as 'FIPS'.

I had a play with the data of the mapUS49County inbuilt map as an example, and by pasting the state and county code you can create a FIPS column to join.

# combine state code and county code for FIPS
thing <- rmap::mapUS49County
thing@data$FIPS <- paste0(thing@data$STATEFP, thing@data$COUNTYCODE)
joined <- left_join(ncovr, thing@data, by = c("FIPS" = "FIPS"))

# check how many would not be joined
nrow(joined %>% filter(is.na(source)))

# check these ones
not_joined <- joined %>% filter(is.na(source))

When we do that only 3 counties are not joined - i.e. only 3 would not be linked to the inbuilt map. You might want to consider printing a warning to users when this is the case, maybe with the name of the ones which did not join.

However this is only one case for one data set - I think it might be useful to think about some use cases of external data, how do these codes commonly look like, and have a few join columns on which users' data can be joined to the inbuilt geometries.

Hope this makes sense, let me know if not I can try to elaborate more!

zarrarkhan commented 2 years ago

@maczokni Thanks for your comments and notes. We have addressed your comments as follows:

US County duplicates: rmap::mapUS49County@data (which is from the official census.gov webpage: https://www.census.gov/geographies/mapping-files/time-series/geo/carto-boundary-file.html) had all unique rows, even though as you found, six subRegions had duplicate names in spite of being unique entries (which we had missed). Initially, since there were so many counties with the same name in different states we had attempted to address this by appending State and County names together in the subRegion column as shown below. As you can see each subRegion has a unique STATEFP & COUNTYCODE combination even though the subRegion names may have been duplicated in these 6 particular cases (Additionally COUNTYCODEs are also repeated in different states):
```
    subRegion       subRegionAlt STATEFP COUNTYCODE
1       Baltimore_MD    Baltimore    24        005
2       Baltimore_MD    Baltimore    24        510
3       Fairfax_VA      Fairfax      51        059
4       Fairfax_VA      Fairfax      51        600
5       Franklin_VA     Franklin     51        067
6       Franklin_VA     Franklin     51        620
7       Richmond_VA     Richmond     51        159
8       Richmond_VA     Richmond     51        760
9       Roanoke_VA      Roanoke      51        161
10      Roanoke_VA      Roanoke      51        770
11      St. Louis_MO    St. Louis    29        189
12      St. Louis_MO    St. Louis    29        510
```

Duplicate subRegions in rmap::mapUS49County There were six subRegions in rmap::mapUS49County which had duplicated subRegion names as shown above. These occurred because each of these subRegions are classified as both cities as well as counties and are allocated their own COUNTYCODES in spite of being cities. For each of these cities we also have a county with the same name in the same state. For each city we have appended their names with _city so that we have completely unique subRegions as well as added FIPS as per your suggestion. Updated table outputs (selected columns) now show:

       subRegion subRegionAlt STATEFP COUNTYCODE  FIPS
1       Baltimore_MD    Baltimore      24        005 24005
2  Baltimore_MD_city    Baltimore      24        510 24510
3         Fairfax_VA      Fairfax      51        059 51059
4    Fairfax_VA_city      Fairfax      51        600 51600
5        Franklin_VA     Franklin      51        067 51067
6   Franklin_VA_city     Franklin      51        620 51620
7        Richmond_VA     Richmond      51        159 51159
8   Richmond_VA_city     Richmond      51        760 51760
9         Roanoke_VA      Roanoke      51        161 51161
10   Roanoke_VA_city      Roanoke      51        770 51770
11      St. Louis_MO    St. Louis      29        189 29189
12 St. Louis_MO_city    St. Louis      29        510 29510

3 missing FIPS when joining with ncovr dataset The not_joined dataframe in your example above refers to 3 counties which are out of date in the ncovr dataset. These three subRegions from the not_joined datatable are shown below. References to when their names were changed are listed after the code chunk:
```
not_joined %>% dplyr::select(NAME,STATE_NAME,STATE_FIPS,CNTY_FIPS,FIPS)
       NAME   STATE_NAME STATE_FIPS CNTY_FIPS  FIPS
1       Shannon South Dakota         46       113 46113
2 Clifton Forge     Virginia         51       560 51560
3          Dade      Florida         12       025 12025
```
(Source: https://www.ddorn.net/data/FIPS_County_Code_Changes.pdf)
1. Shannon: South Dakota, 1983: Washabaugh county (FIPS 46131) merges into Jackson county (FIPS 46071).
2. Clifton Forge: Virginia, 2001: The independent city of Clifton Forge (FIPS 51560) merges into Alleghany county (FIPS 51005).
3. Dade: Florida, 1997: Dade county (FIPS 12025) is renamed as Miami-Dade county (FIPS 12086).
Broader concern: Each of our datasets provides the source for the underlying data which can be compared to user data if needed. For example the county data is from "https://www.census.gov/geographies/mapping-files/time-series/geo/carto-boundary-file.html". The datasets also contain additional columns such as FIPS and other details that were part of the original datasets to ensure unique entries. Some degree of data cleaning and checks is expected with user data.
```
# Checking the source of rmap maps
rmap::mapUS49County@data$source%>%unique()
# "https://www.census.gov/geographies/mapping-files/time-series/geo/carto-boundary-file.html"
rmap::mapCountries@data$source%>%unique()
# "https://www.naturalearthdata.com/downloads/"
```

Please let us know if this addresses some of your concerns or if you have additional comments.

hugoledoux commented 2 years ago

@CamilleMorlighem I see that the authors fixed your issues (statement of needs among others, see there: https://github.com/openjournals/joss-reviews/issues/4015#issuecomment-1004272488) but you didn't tick the boxes above. Can you let us know if you are now satisfied please?

CamilleMorlighem commented 2 years ago

@hugoledoux thank you for noticing it !

@zarrarkhan thank you for these improvements and answers to my comments. I think I'm done with the review. Rmap is a very nice package! I'm probably gonna use it in the future.

zarrarkhan commented 2 years ago

@hugoledoux Thanks for following up. Wondering if there are any additional comments we can address or get an update from the reviewers on the comments we addressed in https://github.com/openjournals/joss-reviews/issues/4015#issuecomment-1019419000.

hugoledoux commented 2 years ago

@maczokni could you check the updates from the authors (https://github.com/openjournals/joss-reviews/issues/4015#issuecomment-1019419000)? And tell us if you're satisfied?

maczokni commented 2 years ago

My apologies for the delay. At this moment, I'm afraid that I'm still not clear what the instructions then are for joining data. My understanding is that the USP of this package is that it provides an easy way to map things for someone who is not super familiar with GIS. This would require some work on the part of the package to have a set of values on which it could join geometries and do some searching in place of the sure. One commonly used for the USA for example is FIPS. The authors suggest "The datasets also contain additional columns such as FIPS and other details that were part of the original datasets to ensure unique entries." - however you cannot join using FIPS, or if you can I don't see this detailed on the documentation. If I try to specify the FIPS column as the "subRegCol", it does not work:

# devtools::install_github("JGCRI/rmap")
library(rmap)
library(geodaData)
library(dplyr)
library(sf)

# get data for testing
data("ncovr")

mydata <- ncovr %>% select(NAME, STATE_NAME, FIPS, HR60) %>% st_drop_geometry()

map_chosen <- rmap::map(mydata, subRegCol = 'FIPS', valueCol = 'HR60')
[1] "Starting map..."
[1] "Map data table written to /cloud/project/outputs//map_param.csv"
Error in get(paste(gsub("subReg", "map", gsub("Alt", "", subRegChosen)),  : 
  object 'mapGCAMLanddf' not found

The authors suggest: " Each of our datasets provides the source for the underlying data which can be compared to user data if needed. For example the county data is from "https://www.census.gov/geographies/mapping-files/time-series/geo/carto-boundary-file.html"." and that "Some degree of data cleaning and checks is expected with user data."

For me to join the ncovr data (just one example) I have to get a lookup table from state name to state abbreviation, join it to my data set, then create a new column which has the same strings as the subRegCol expects, and then I can request a map, however I still run into an error:

state_lookup <- data.frame(state_name = c("Alabama","Alaska","Arizona","Arkansas","California","Colorado","Connecticut","Delaware",
                                          "Florida","Georgia","Hawaii","Idaho","Illinois","Indiana","Iowa","Kansas","Kentucky","Louisiana",
                                          "Maine","Maryland","Massachusetts", "Michigan","Minnesota","Mississippi","Missouri","Montana",
                                          "Nebraska","Nevada","New Hampshire", "New Jersey","New Mexico","New York","North Carolina",
                                          "North Dakota","Ohio","Oklahoma","Oregon","Pennsylvania","Rhode Island","South Carolina","
                                          South Dakota","Tennessee","Texas","Utah","Vermont","Virginia","Washington","West Virginia",
                                          "Wisconsin","Wyoming"), 
                           state_code = c("AL","AK","AZ","AR","CA","CO","CT","DE","FL","GA","HI","ID","IL","IN","IA","KS","KY",
                                     "LA","ME","MD","MA","MI","MN","MS","MO","MT","NE","NV","NH","NJ","NM","NY","NC","ND","OH","OK",
                                          "OR","PA","RI","SC","SD","TN","TX","UT","VT","VA","WA","WV","WI","WY"))

mapthis <- dplyr::left_join(mydata, state_lookup, by = c("STATE_NAME" = "state_name"))

mapthis$subRegCol <- paste0(mapthis$NAME, "_",mapthis$state_code)

map_chosen <- rmap::map(mapthis, subRegCol = 'subRegCol')

# [1] "Starting map..."
# Coordinate system already present. Adding new coordinate system, which will replace the existing one.
# Error in `check_required_aesthetics()`:
#   ! geom_polygon requires the following missing aesthetics: x and y
# Run `rlang::last_error()` to see where the error occurred.
# There were 32 warnings (use warnings() to see them)
# Warning messages:
#   1: Unknown or uninitialised column: `lon`.
# 2: In min(datax1$lon) : no non-missing arguments to min; returning Inf
# 3: Unknown or uninitialised column: `lon`.
# 4: In min(x, na.rm = na.rm) : no non-missing arguments to min; returning Inf
# 5: In max(x, na.rm = na.rm) : no non-missing arguments to max; returning -Inf
# ...etc...

So this is where my sticking point is, that if the USP is that this is so much easier than getting your own shapefile, this process should be more smooth. Now maybe I'm missunderstanding, and in that case I'm happy to hear different, but at the moment, but if the advice is that the user has to go check out the source data anyway, I don't see why not to simply get the shapefile from the US Census bureau and use tmap to make a quick map.

EDIT

Actually the tigris package makes this even easier:

tg_counties <- tigris::counties()
mapthis <- dplyr::left_join(tg_counties, mydata, by = c("GEOID" = "FIPS"))
tmap::qtm(mapthis, fill = "HR60")

I understand there is further functionality, and I am sure there is more added value there, but at the moment just struggling to get to that point...! The solutions like selecting only 10 of the 3000+ counties required aren't really useful as they aren't realistic use cases.

Happy to be pointed out that I'm missing something super obvious, but just really would like to get a working example with a real world data set before I can sign off on this.

zarrarkhan commented 2 years ago

@maczokni

Thank you for your response. Here is a working example with your dataset after removing out-of-date FIPS as we discussed above. I tried your counter-examples and ran into some issues for which I have shared more details below.

Working Example with your data

data("ncovr")
mydata <- ncovr %>% select(NAME, STATE_NAME, FIPS, HR60) %>% st_drop_geometry() %>%
  dplyr::left_join(rmap::mapUS52County@data, by="FIPS") %>%
  dplyr::mutate(value=HR60) %>%
  dplyr::filter(!is.na(subRegion)) # Remove non-existent FIPS 51560 (Clifton Forge), FIPS 12025 (Dade)
rmap::map(data=mydata)

Missing FIPS in ncovr data You are using a specific dataset "ncovr" which has FIPS for counties that do not exist any more as we described in #4015: (Source: https://www.ddorn.net/data/FIPS_County_Code_Changes.pdf)

Shannon: South Dakota, 1983: Washabaugh county (FIPS 46131) merges into
Jackson county (FIPS 46071).
Clifton Forge: Virginia, 2001: The independent city of Clifton Forge (FIPS 51560) merges
into Alleghany county (FIPS 51005).
Dade: Florida, 1997: Dade county (FIPS 12025) is renamed as Miami-Dade
county (FIPS 12086).

Your Counter Example 1

download.file("https://www2.census.gov/geo/tiger/GENZ2018/kml/cb_2018_us_county_20m.zip","counties.zip")
unzip("counties.zip")
counties_geojson <- sf::st_read("cb_2018_us_county_20m.kml")
mapthis <- dplyr::left_join(counties_geojson, mydata, by = c("GEOID" = "FIPS"))
tmap::qtm(mapthis, fill = "HR60")

Error: Join columns must be present in data.
x Problem with `GEOID`.

Your Counter Example 2

tg_counties <- tigris::counties()
mapthis <- dplyr::left_join(tg_counties, mydata, by = c("GEOID" = "FIPS"))
tmap::qtm(mapthis, fill = "HR60")

My R does not give errors but it stalls and is unable to display the final map.
rmap produces the map for this data almost instantly and does not require any downloads

Updates made and additional comments

We have modified the code to check for NA subRegions and so you do not need to filter your data anymore.

Working Example with your data after updates

# devtools::install_github("JGCRI/rmap") # To update to latest version
library(rmap)
library(geodaData)
library(dplyr)
library(sf)

data("ncovr")
mydata <- ncovr %>% select(NAME, STATE_NAME, FIPS, HR60) %>% st_drop_geometry() %>%
  dplyr::left_join(rmap::mapUS52County@data, by="FIPS") %>%
  dplyr::mutate(value=HR60)
rmap::map(data=mydata)

Please let us know if the example above is working for you and if you have any additional comments or issues.

maczokni commented 2 years ago

Hiya I'll give this a go later thanks but it's still just one example of the counties and it still requires the user to do data cleaning, so still not sure about the target audience being non GIS users.

However in the meantime I wanted to share two concerns with the code:

1 - the package functions producing character vectors are created using print() rather than messages using message()? It's going to be an issue for users because it defeats things like suppressMessages() and knitr chunk options when the package is used in a Rmarkdown document. It would be even better if you could use rlang::inform() because it has nice formatting for their long messages while still playing nicely with functions that expect base::message().

2 - you should consider adopting sf for the internal workings rather than sp. Roger Bivand (who wrote the sp package in the first place) recommends at https://doi.org/ghnwg3 that new packages use sf rather than sp and says that packages that continue to depend on SP will have to "hope" it continues to be maintained. I also note that rmap depends on rgdal and rgeos, both of which will be retired by the end of 2023 (as noted in bold type at the top of the NEWS.md pages of both packages) and neither of which are required if you use sf instead.

zarrarkhan commented 2 years ago

@maczokni

Thank you for the helpful comments and suggestions. We have addressed these in commit da9c41574820679fae9e0cb4871cf014edf2213c with responses below:

Comment on Target Audience: Sorry if this wasn't clear but by targeting non-GIS users we did not mean that this would not require data cleaning or processing. rmap provides extensive examples for users to build maps of common geographical boundaries (countries, states, basins, counties etc.) with the user requiring a basic table with a minimum of two columns ("subRegion" and "value") as you have seen. This still means the user has to prepare their datasets into those basic tables and may run into incompatibility issues if using region names which don't exist in rmap's databases.
Print: We have replaced all print statements with rlang::inform
sp, rgdal, rgeos dependencies: We have removed the dependency on rgdal and rgeos. rmap still depends on sp as it builds all of its maps from the SpatialPolygonDataFrame class. We agree that this should be updated to sf and will do so in v2.0 of the model as the package will need extensive testing for these replacements.

maczokni commented 2 years ago

I'm really not trying to be difficult, but I believe the package requires extensive testing anyway. I just tried again with a different data set to map Local Authority areas in London and this is the result:

map_param_KMEANS

Here's the code:

library(rmap)
library(tidyverse)
library(readxl)
library(janitor)

# read in new test data
download.file("https://data.london.gov.uk/download/gcse-results-by-borough/a8a71d73-cc48-4b30-9eb5-c5f605bc845c/gcse-results.xlsx", 
              destfile = "gcse-results.xlsx")
gcse_results <- read_xlsx("gcse-results.xlsx", sheet = "2020-21") 

# clean up test data
colnames <- paste0(gcse_results[1,], gcse_results[2,])
colnames <- gsub("NA", "", colnames)
names(gcse_results) <- colnames
gcse_results <- gcse_results %>%
  clean_names() %>% 
  slice(4:36) %>% 
  mutate(number_of_pupils_at_the_end_of_key_stage_4 = as.numeric(number_of_pupils_at_the_end_of_key_stage_4))

# try to map using Local Authority name
my_map <- rmap::map(gcse_results,
          subRegCol = "area",
          valueCol = "number_of_pupils_at_the_end_of_key_stage_4")

Again I know you can manually find a better fitting map, but then once again I'm not certain the contribution over finding a map manually in other ropensci packages like rnaturalearth and using tmap:

library(rnaturalearth)
library(tmap)

london_boroughs <- ne_states(country = "United Kingdom", returnclass = "sf") %>% 
  filter(region == "Greater London")

gcse_results_sf <- left_join(london_boroughs, gcse_results, 
                             by = c("name" = "area")) 

qtm(gcse_results_sf, fill = "number_of_pupils_at_the_end_of_key_stage_4")

which returns:

tmap_london

Or if I want the legend fixed:

qtm(gcse_results_sf, fill = "number_of_pupils_at_the_end_of_key_stage_4",  fill.title="Pupils at KS4")

tmap_w_title

In sum: I think that there is some more extensitve and thorough testing required than something that should be done by reviewers. I'm just spot checking here and keep coming across issues - so at this point I would advise some more thorough testing of the package overall by the authors, before I would be happy to sign off on this.

zarrarkhan commented 2 years ago

@maczokni of course you are not being difficult and on the contrary we appreciate you taking the time and effort to give your detailed feedback. We will look at the issues you have pointed out and a more through test of the existing maps and get back to you.

zarrarkhan commented 2 years ago

@maczokni @hugoledoux

Thank you for your earlier comments. We have tried to address all of your comments and made the following changes. Please, pull in the latest changes from main to test these and let us know if you have any additional comments.

We have removed the dependency on the sp package and switched to sf. All rmap internal data maps are now sf objects
We added the ability to change map projections: https://jgcri.github.io/rmap/articles/vignette_map.html#projections
We improved test coverage of all underlying functions. Three of the total four rmap functions are now at >85% CI test coverage with over all package test coverage improved from 55% to 65%. (map.R 60.84%, map_plot.R 86%, map_find.R 92%, printPdfPng.R 91%)
We tested the package for the example cases provided by @maczokni, our original example cases in our documentation, as well as additional test cases not in the documentation. Some of these are provided below:

Load Libraries

library(rmap); library(geodaData); library(dplyr); library(sf); library(tidyverse)
library(readxl); library(janitor)

Example 1

data("ncovr")
mydata <- ncovr %>%
  dplyr::select(NAME, STATE_NAME, FIPS, HR60) %>%
  sf::st_drop_geometry() %>%
  dplyr::left_join(rmap::mapUS52County, by="FIPS")
rmap::map(data=mydata, valueCol = "HR60", legendTitle = "HR60")

map_param_KMEANS

Example 2

download.file("https://data.london.gov.uk/download/gcse-results-by-borough/a8a71d73-cc48-4b30-9eb5-c5f605bc845c/gcse-results.xlsx", destfile = "gcse-results.xlsx")
gcse_results <- readxl::read_xlsx("gcse-results.xlsx", sheet = "2020-21")

# clean up test data
colnames <- paste0(gcse_results[1,], gcse_results[2,])
colnames <- gsub("NA", "", colnames)
names(gcse_results) <- colnames
gcse_results <- gcse_results %>%
  clean_names() %>%
  slice(4:36) %>%
  mutate(number_of_pupils_at_the_end_of_key_stage_4 = as.numeric(number_of_pupils_at_the_end_of_key_stage_4))

# Plot
my_map <- rmap::map(gcse_results,
                    subRegCol = "area",
                    valueCol = "number_of_pupils_at_the_end_of_key_stage_4",
                    legendTitle = "Pupils at KS4")

map_param_KMEANS

Example 3

# Test Covid data
# Our World in Data JHU https://github.com/owid/covid-19-data/tree/master/public/data
# State vaccination data: https://github.com/owid/covid-19-data/raw/master/public/data/vaccinations/us_state_vaccinations.csv
covid_data <- read.csv(url("https://github.com/owid/covid-19-data/raw/master/public/data/vaccinations/us_state_vaccinations.csv"))%>%
  tibble::as_tibble() %>%
  dplyr::select(subRegion=location,date,value=people_vaccinated_per_hundred) %>%
  dplyr::mutate(subRegion = dplyr::if_else(subRegion=="New York State","New York",subRegion)) %>%
  dplyr::filter(date == max(date),
                subRegion %in% rmap::mapUS49$subRegionAlt); covid_data

rmap::map(covid_data,
          title=paste0("People Vaccinated per hundered ",max(covid_data$date)),
          legendTitle = "People per 100")

map_param_KMEANS

Examples from Documentation: https://jgcri.github.io/rmap/articles/vignette_map.html

zarrarkhan commented 2 years ago

Hi @hugoledoux,

Just wanted to check in to see if there were any updates since we submitted our modifications in https://github.com/openjournals/joss-reviews/issues/4015#issuecomment-1074078464 two weeks ago.

Thanks!

hugoledoux commented 2 years ago

@maczokni could you please have one look at the updated material here?

hugoledoux commented 2 years ago

@editorialbot generate pdf

editorialbot commented 2 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

zarrarkhan commented 2 years ago

@hugoledoux Thank you for generating the proof. We have read the proof and corrected a few typos we found.

In addition we see that the referencing style seems inconsistent but it is not clear how to adjust this. For example: In the article proof linked above on line 22 we have the following:

publications (Khan et al., 2021; Wild, Khan, Clarke, et al., 2021; Wild, Khan, Zhao, et al., 2021)

Which refer to the following references:

Line 118: Khan, Z., Wild, T. B., Iyer, G., Hejazi, M., & Vernon, C. R. (2021). The future evolution of energy-water-agriculture interconnectivity across the US. Environmental Research Letters,16(6), 065010. https://iopscience.iop.org/article/10.1088/1748-9326/ac046c
Line 140: Wild, T. B., Khan, Z., Clarke, L., Hejazi, M., Bereslawski, J. L., Suriano, M., Roberts, P., Casado, J., Miralles-Wilhelm, F., Gavino-Novillo, M., Muñoz-Castillo, R., Moreda, F., Zhao, M., Yarlagadda, B., Lamontagne, J., & Birnbaum, A. (2021). Integrated energy-water-land nexus planning in the Colorado River Basin (Argentina). Regional Environmental Change, 21(3), 62. https://doi.org/10.1007/s10113-021-01775-1
Line 145: Wild, T. B., Khan, Z., Zhao, M., Suriano, M., Bereslawski, J. L., Roberts, P., Casado, J., Gaviño-Novillo, M., Clarke, L., & Hejazi, M. (2021). The Implications of Global Change for the Co-Evolution of Argentina’s Integrated Energy-Water-Land Systems. Earth’s Future, 9(8), e2020EF001970. https://doi.org/10.1029/2020EF001970

The first citation only references a single author while the rest list three authors.

Apart from this we have checked the proof and pushed all changes in commit: https://github.com/JGCRI/rmap/commit/6730a68880e6673584d67b45e66820ee1fb86d9a

Please let us know what we can do next.

Thanks!

zarrarkhan commented 2 years ago

Hi @hugoledoux, Just checking in to see if we can provide any more information or help move things ahead with the review. Thanks for your management of the process! Best Zarrar

hugoledoux commented 2 years ago

Sorry @zarrarkhan I am on holidays. I will be back in 2w.

I was hoping that @maczokni would finalise her review in the meantime, if we haven't heard from her then I'll see how to proceed.

Hope you understand.

zarrarkhan commented 2 years ago

@hugoledoux No worries. That sounds good. Enjoy the vacations. Best Zarrar

maczokni commented 2 years ago

Hi apologies for delay, I will try to get to this this week.

maczokni commented 2 years ago

Hi, I'm having trouble downloading the new version:

> devtools::install_github("JGCRI/rmap")

Downloading GitHub repo JGCRI/rmap@HEAD
Error in utils::download.file(url, path, method = method, quiet = quiet,  : 
  download from 'https://api.github.com/repos/JGCRI/rmap/tarball/HEAD' failed

I also tried with remotes instead of devtools but no luck. I can download other repos no problem with the same method (e.g. devtools::install_github("mpjashby/sfhotspot") downloads the package no problem).

Here's my session info:

R version 4.2.0 (2022-04-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.3.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] rstudioapi_0.13   magrittr_2.0.3    usethis_2.1.6     devtools_2.4.3    pkgload_1.2.4     R6_2.5.1          rlang_1.0.2      
 [8] fastmap_1.1.0     tools_4.2.0       pkgbuild_1.3.1    sessioninfo_1.2.2 cli_3.3.0         withr_2.5.0       ellipsis_0.3.2   
[15] remotes_2.4.2     rprojroot_2.0.3   lifecycle_1.0.1   crayon_1.5.1      brio_1.1.3        processx_3.5.3    purrr_0.3.4      
[22] callr_3.7.0       fs_1.5.2          ps_1.7.0          curl_4.3.2        testthat_3.1.4    memoise_2.0.1     glue_1.6.2       
[29] cachem_1.0.6      compiler_4.2.0    desc_1.4.1        prettyunits_1.1.1

I suspect it will be to do with the package size and therefore time it takes to download. If I download via browser the tarball from the URL, the file is 120.9 (!!!) MB. Please see here about size and limitations for packages on cran: https://thecoatlessprofessor.com/programming/r/size-and-limitations-of-packages-on-cran/

I am happy to continue the review having installed from the package archive file I've downloaded, but it might be something to consider to reduce the size (again referring to the cran guidance).

zarrarkhan commented 2 years ago

@maczokni Thanks for this comment. We are aware of the size issue and mention it in our paper. We also had a previous version with the data separated from the package to meet CRAN guidelines. However, since all the functions in rmap depend on having access to the built-in maps it would require the map sets to be downloaded each time. Having all download during the intial install made this a one-time effort. We realize that making this choice means we won't be publishing rmap to CRAN. From the paper:

"While built-in maps increase the size of the package, having direct access to these allows for automated searching and quick deployment of relevant shapefiles without the need to download any data."

The total package size has not changed size since you first reviewed the package so it is strange that you are getting this download error now. We will check further to investigate these restrictions and can re-consider separating the data from the main package.

Best

Zarrar

maczokni commented 2 years ago

Thanks for this! It might be that I initially downloaded at work with better internet access. I am not an expert in packaging data into packages, but is there not a solution to appeal to a database? The package sizes for rnaturalearth and rnaturalearthdata are both much smaller (<1mb and 3mb) and from what I understand perform roughly the same functionality of finding a relevant shapefile from a query? It might introduce a separate step I guess (e.g. find all which meet some higher-level category e.g. ask user to specify country first, get those relevant geometries, and then match on the relevant "subRegion" column?). Anyway like I said this is not my area of expertise, so I'm not sure what solution to suggest.

Another thing: when you summon the help file for the package it just links to the github repo and github pages. The link for the cheatsheet is broken (it says https://github.com/JGCRI/rmap/blob/main/rmapCheatsheet.pdf instead of https://jgcri.github.io/rmap/cheatsheet.pdf)

Overall, I remain unconvinced that this is easy for those who are new to GIS, or easier than existing solutions. I don't think (at least to me) it is clear enough from the documentation how to go about applying this to my own data, which I believe is meant to be the strength of this package. I say this after running into more issues, this time trying to evoke the automatic "Multi-Year-Class-Scenario-Facet". The example in the documentation produces a dataframe with repeat values in the subRegion column, but when I try with the gcse data we used above I get an error about this (presumably when the map is being summoned):

library(rmap)
library(tidyverse)
library(readxl)
library(janitor)
library(sf)

# read in new test data
download.file("https://data.london.gov.uk/download/gcse-results-by-borough/a8a71d73-cc48-4b30-9eb5-c5f605bc845c/gcse-results.xlsx", 
              destfile = "gcse-results.xlsx")
gcse_results <- read_xlsx("gcse-results.xlsx", sheet = "2020-21") 
gcse_results_2 <- read_xlsx("gcse-results.xlsx", sheet = "2019-20") 

# clean up test data
colnames <- paste0(gcse_results[1,], gcse_results[2,])
colnames <- gsub("NA", "", colnames)
names(gcse_results) <- colnames
gcse_results <- gcse_results %>%
  clean_names() %>% 
  slice(4:36) %>% 
  mutate(number_of_pupils_at_the_end_of_key_stage_4 = as.numeric(number_of_pupils_at_the_end_of_key_stage_4))

colnames <- paste0(gcse_results_2[1,], gcse_results_2[2,])
colnames <- gsub("NA", "", colnames)
names(gcse_results_2) <- colnames
gcse_results_2 <- gcse_results_2 %>%
  clean_names() %>% 
  slice(4:36) %>% 
  mutate(number_of_pupils_at_the_end_of_key_stage_4 = as.numeric(number_of_pupils_at_the_end_of_key_stage_4))

gcse_results_joined <- rbind(gcse_results %>% mutate(year = 2020), 
                          gcse_results_2 %>% mutate(year = 2019))

rmap::map(data = gcse_results_joined,
           subRegCol = "area",
           valueCol = "number_of_pupils_at_the_end_of_key_stage_4",
           ncol=4,
           background = T)

Starting map...
Map data table written to /Users/user/Desktop/teaching/crime_mapping_discussions/outputs//map_param.csv
Error in rmap::map(data = gcse_results_joined, subRegCol = "area", valueCol = "number_of_pupils_at_the_end_of_key_stage_4",  : 
  Input data data has multiple values. Please check your data.

I assumed we could get around this by extracting the map first, and using this. To do this try map_find() function. This function has not been changed like the rmap() function, and so still requires a renaming of the column to subRegion:

> map_find(gcse_results_joined, subRegCol = "area")
Error in map_find(gcse_results_joined, subRegCol = "area") : 
  unused argument (subRegCol = "area")
> map_find(gcse_results_joined)
Error in map_find(gcse_results_joined) : data must have subRegion columns.

But even after creating a "subRegion" column in my data, this map doesn't help me:

gcse_results_joined$subRegion <- gcse_results_joined$area

thing <- map_find(gcse_results_joined)

rmap::map(data = gcse_results_joined,
          underLayer = thing
          background = T)

Specifying the value column summons the same error:

> rmap::map(data = gcse_results_joined,
+           underLayer = thing,
+           background = T,
+           valueCol = "number_of_pupils_at_the_end_of_key_stage_4")
Starting map...
Map data table written to /Users/user/Desktop/teaching/crime_mapping_discussions/outputs//map_param.csv
Error in rmap::map(data = gcse_results_joined, underLayer = thing, background = T,  : 
  Input data data has multiple values. Please check your data.

Various specifications didn't help. In the end the only solution was to rename my data in the image of the data specified in the example:

data = data.frame(subRegion = gcse_results_joined$area,
                  year = gcse_results_joined$year,
                  value = gcse_results_joined$number_of_pupils_at_the_end_of_key_stage_4)

rmap::map(data = data,
          ncol=4,
          background = T)

So I think I'm misunderstanding the process here, and I would say I'm pretty comfortable with GIS and R, having published, and actively teaching in this space, but feel quite lost following the documentation with my own data. It is possible that to someone from a different background these processes are more intuitive, perhaps there is a need for this in the domain of the authors, where people have a certain approach/ understanding which is required to use this package. In that case I think this should be specified in the documentation.

Basically the behaviour I would expect would be something like:


rmap::map(data = gcse_results_joined,
          subRegion = area,
          year = year,
          value = number_of_pupils_at_the_end_of_key_stage_4,
          ncol=4,
          background = T)

My comment once again is that if the audience is people who have no GIS knowledge, and for some reason don't want to use rnaturalearth+tmap solutions to map their data, then:

1 - all functions should allow to specify where the relevant columns are in the own data (e.g. subRegion = "col", etc) 2 - the documentation should be improved with some clear instruction of how to apply the functions to an external data sets.

zarrarkhan commented 2 years ago

Thanks for the detailed comments. Please see responses below:

-Package Size: We did consider (and tested) hosting all the maps in a separate data repository. However this required the maps to be downloaded each time a function was called and for now we decided to provide all maps with the initial installation. rnaturalearth comes with some pre-loaded maps (ne_countries, ne_states) but does also use ne_download for other maps and resolutions (https://www.rdocumentation.org/packages/rnaturalearth/versions/0.1.0/topics/ne_download)

-Help File and Links: We updated the help file with a short package description and also checked that all the links work.

Github: https://github.com/JGCRI/rmap
Webpage: https://jgcri.github.io/rmap/
Cheatsheet: https://jgcri.github.io/rmap/cheatsheet.pdf

-Your Example: Your example ran into a case which we had not come across in our tests yet i.e. both an x column and a year column present in the data. rmap plots the time dimension using an x column and if the data comes with a year column it assumes this as the x column. In your case, you had both an x column and a year column and so the x column was used. Since the x column was all NAs it lead to the duplicated rows message. We have updated the package so this situation is handled with an appropriate message to inform the user. We also updated the script so the user can specify the key arguments directly and renamed them to be more intuitive as you suggested (directly using subRegion, value, x, instead of subRegCol, valueCol). We use x in case the time dimension is something other than year, such as week or month etc. This should work now:

library(rmap)
library(tidyverse)
library(readxl)
library(janitor)
library(sf)

# read in new test data
download.file("https://data.london.gov.uk/download/gcse-results-by-borough/a8a71d73-cc48-4b30-9eb5-c5f605bc845c/gcse-results.xlsx", 
              destfile = "gcse-results.xlsx")
gcse_results <- read_xlsx("gcse-results.xlsx", sheet = "2020-21") 
gcse_results_2 <- read_xlsx("gcse-results.xlsx", sheet = "2019-20") 

# clean up test data
colnames <- paste0(gcse_results[1,], gcse_results[2,])
colnames <- gsub("NA", "", colnames)
names(gcse_results) <- colnames
gcse_results <- gcse_results %>%
  clean_names() %>% 
  slice(4:36) %>% 
  mutate(number_of_pupils_at_the_end_of_key_stage_4 = as.numeric(number_of_pupils_at_the_end_of_key_stage_4))

colnames <- paste0(gcse_results_2[1,], gcse_results_2[2,])
colnames <- gsub("NA", "", colnames)
names(gcse_results_2) <- colnames
gcse_results_2 <- gcse_results_2 %>%
  clean_names() %>% 
  slice(4:36) %>% 
  mutate(number_of_pupils_at_the_end_of_key_stage_4 = as.numeric(number_of_pupils_at_the_end_of_key_stage_4))

gcse_results_joined <- rbind(gcse_results %>% mutate(year = 2020), 
                          gcse_results_2 %>% mutate(year = 2019))

rmap::map(data = gcse_results_joined,
           subRegion = "area",
           value = "number_of_pupils_at_the_end_of_key_stage_4")

-All functions should allow custom columns: We have adjusted the script so that users can specify their own columns for all the key arguments (value, subRegion, scenario, class, x) for all functions. We add a section in the documentation on this: https://jgcri.github.io/rmap/articles/vignette_map.html#customcolumns

-Advantages of using rmap: So the true utility of rmap comes in comparing data across scenarios, classes and time. The examples below using your data demonstrate some of these capabilities which would not be so simple using existing software:

Multiple ggplots based on data provided saved as a list which can be edited later

rmap::map(data = gcse_results_joined,
           subRegion = "area",
           value = "number_of_pupils_at_the_end_of_key_stage_4") -> mapx

mapx$map_param_KMEANS
mapx$map_param_MEAN_KMEANS
mapx$map_param_MEAN_KMEANS + ggplot2::theme_dark()

map_param_KMEANS map_param_MEAN_KMEANS map_param_MEAN_KMEANS_dark

Compare across time

rmap::map(data = gcse_results_joined,
           subRegion = "area",
           value = "number_of_pupils_at_the_end_of_key_stage_4",
           xRef = 2019,
           save =F, show =F) -> mapx

mapx$map_param_KMEANS_xDiffPrcnt

See the percentage change from 2019 to 2020

map_param_KMEANS_xDiffPrcnt

Compare Different Classes (In this case hypothetical genders)

gcse_results_joined_classes <- gcse_results_joined %>%
  dplyr::mutate(gender = "girls") %>%
  dplyr::bind_rows(gcse_results_joined %>%
                     dplyr::mutate(gender="boys",
                                   number_of_pupils_at_the_end_of_key_stage_4 =
                                     number_of_pupils_at_the_end_of_key_stage_4*runif(nrow(gcse_results_joined))))

rmap::map(data = gcse_results_joined_classes,
               class = "gender",
               subRegion = "area",
               xRef = 2019,
               value = "number_of_pupils_at_the_end_of_key_stage_4", save=F, show=F) -> mapx

mapx$map_param_KMEANS
mapx$map_param_KMEANS_xDiffPrcnt

map_param_KMEANS

See how the value changed for each gender between 2019 to 2020

map_param_KMEANS_xDiffPrcnt

Compare Scenarios and classes (In this case hypothetical income_classes)

gcse_results_joined_scenario <- gcse_results_joined %>%
  dplyr::mutate(income_level = "high_income") %>%
  dplyr::bind_rows(gcse_results_joined %>%
                     dplyr::mutate(income_level="low_income",
                                   number_of_pupils_at_the_end_of_key_stage_4 =
                                     number_of_pupils_at_the_end_of_key_stage_4*runif(nrow(gcse_results_joined))))

rmap::map(data = gcse_results_joined_scenario,
          scenario = "income_level",
          subRegion = "area",
          xRef = 2019,
          scenRef = "low_income",
          value = "number_of_pupils_at_the_end_of_key_stage_4",
          save=F,show=F) -> mapx

mapx$map_param_KMEANS
mapx$map_param_KMEANS_DiffPrcnt

map_param_KMEANS

See how the value changed for each gender and income class between 2019 and 2020

map_param_KMEANS_xDiffAbs

zarrarkhan commented 2 years ago

@hugoledoux @maczokni Hi, Just wanted to check in on any further feedback related to our response (https://github.com/openjournals/joss-reviews/issues/4015#issuecomment-1163520937) submitted 22 June. Thanks!

maczokni commented 2 years ago

Hi All,

I read your response about the main contribution, but now I wonder if the paper needs updating then? You state in the comment above "So the true utility of rmap comes in comparing data across scenarios, classes and time" That is only 1 out of 4 points raised in the original paper. Would it be worth to focus the package and paper to this contribution?

If you must keep the table to map part, for the point "State of the field: Do the authors describe how this software compares to other commonly-used packages?" I would like to see discussion v rnaturalearth and other approaches to detail in what way this compares. I would like to be convinced in the paper that there is an added value here.

My other sticking point was "Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines" specifically the point: "enables some new research challenges to be addressed or makes addressing research challenges significantly better (e.g., faster, easier, simpler)." The data table to map being faster/easier/simpler was something I was not convinced about, but with the long back and forth we are at a point where I no longer want to be carrying out extensive testing on behalf of the authors. On the other hand if the key contribution is now this ease for the comparison around the different scenarios, this should be emphasised more.

At time of submission this package was not robustly tested enough. I've given a hefty amount of my time to testing it and making recommendations which I think were quite substantial. At this point I'm not willing to give more time to keep finding new test cases, but would like to be convinced that the authors themselves had gone through the trouble of doing this. Every time it fell over for me was a case where this could/should have been flagged by extensive internal testing. I would really like the authors to show they have done this themselves, instead of relying on the reviewers (actually I'm not sure the other reviewer did test this? so reviewer) to do so.

So to sum up: on the two outstanding points above in the paper (1 - State of the field, and 2 - Substantial scholarly effort) - if the authors make a case in updates to the paper that argue for these points, and demonstrate some significant testing, I will then tick these boxes and pass back to the editor @hugoledoux whether he is happy for it to go ahead.

hugoledoux commented 2 years ago

thanks @maczokni for the update, this is clear and I agree with what you wrote.

Reviewers are expected indeed to test the package, but reviewers are indeed not debuggers. That being said, there are often bugs in software and one can't expect all software to be perfect.

@CamilleMorlighem a point was raised about testing, and since you did the review a while ago (before changes to the package were made) could you do them again perhaps?

zarrarkhan commented 2 years ago

@hugoledoux @maczokni

Thank you for your latest comments and the time you put into your thorough review. Your comments are addressed below and in the latest version of the paper draft: https://res.cloudinary.com/hkvhhndao/image/upload/v1657942383/ew2vgphg2dfqi6qy5n0t.pdf which can also be seen at: https://github.com/JGCRI/rmap/blob/main/paper/paper.md

Comment 1: If you must keep the table to map part, for the point "State of the field: Do the authors describe how this software compares to other commonly-used packages?" I would like to see discussion v rnaturalearth and other approaches to detail in what way this compares. I would like to be convinced in the paper that there is an added value here.

Response: The table to map is one of several functionalities of rmap that makes it easy to use. We agree that it is a minor improvement and so remove it from the statement of need section as its own paragraph. We leave discussion of this functionality in the paper in other sections.

Comment 2: My other sticking point was "Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines" specifically the point: "enables some new research challenges to be addressed or makes addressing research challenges significantly better (e.g., faster, easier, simpler)." The data table to map being faster/easier/simpler was something I was not convinced about, but with the long back and forth we are at a point where I no longer want to be carrying out extensive testing on behalf of the authors. On the other hand if the key contribution is now this ease for the comparison around the different scenarios, this should be emphasised more.

Response: Yes, one of the key contributions is the comparison around different scenarios and time periods. This is already emphasized in the paper as follows:

- In the title of the paper: "rmap: An R package to plot and compare tabular data on customizable maps across scenarios and time". - Abstract (lines 13 to 14): Additionally rmap automatically detects and produces comparison maps if the data has multiple scenarios or time periods as well as animations for time series data - Statement of Need (lines 54 to 59): Difference maps: Existing packages do not produce difference maps to compare across scenarios or time periods. rmap provides this functionality by automatically recognizing multiple scenarios and time periods to produce difference maps across these dimensions. An important aspect of spatial data is exploring the difference between two scenarios or time periods and rmap makes this a seamless process. - And as a Separate Section called Compare Scenarios (lines 84 -100): With example code and figures.

Spatial comparisons across scenarios and time are very valuable in multiple fields and this is one of the features of rmap that has already been significantly used in the following published papers (and also in several additional papers in-progress and submission):

Wild, T.B., Khan, Z., Clarke, L., Hejazi, M., Bereslawski, J.L., Suriano, M., Roberts, P., Casado, J., Miralles-Wilhelm, F., Gavino-Novillo, M. and Muñoz-Castillo, R., 2021. Integrated energy-water-land nexus planning in the Colorado River Basin (Argentina). Regional Environmental Change, 21(3), pp.1-16.
Wild, T.B., Khan, Z., Zhao, M., Suriano, M., Bereslawski, J.L., Roberts, P., Casado, J., Gaviño‐Novillo, M., Clarke, L., Hejazi, M. and Miralles‐Wilhelm, F., 2021. The Implications of Global Change for the Co‐Evolution of Argentina's Integrated Energy‐Water‐Land Systems. Earth's Future, 9(8), p.e2020EF001970.
Sampedro, J., Khan, Z., Vernon, C.R., Smith, S.J., Waldhoff, S. and Van Dingenen, R., 2022. rfasst: An R tool to estimate air pollution impacts on health and agriculture. Journal of Open Source Software, 7(69), p.3820.
Khan, Z., Wild, T.B., Iyer, G., Hejazi, M. and Vernon, C.R., 2021. The future evolution of energy-water-agriculture interconnectivity across the US. Environmental Research Letters, 16(6), p.065010.
Khan, Z., Iyer, G., Patel, P., Kim, S., Hejazi, M., Burleyson, C. and Wise, M., 2021. Impacts of long-term temperature change and variability on electricity investments. Nature communications, 12(1), pp.1-12.
Khan, Z., Wild, T.B., Carrazzone, M.E.S., Gaudioso, R., Mascari, M.P., Bianchi, F., Weinstein, F., Pérez, F., Pérez, W., Miralles-Wilhelm, F. and Clarke, L., 2020. Integrated energy-water-land nexus planning to guide national policy: an example from Uruguay. Environmental Research Letters, 15(9), p.094014.

zarrarkhan commented 2 years ago

@hugoledoux can you please review our response and let us know if anything else is needed? Thanks Best Zarrar

CamilleMorlighem commented 2 years ago

Hi all,

Following comments from @hugoledoux and @maczokni, I have redone old tests of the package I did a while ago and new tests considering the updates of the package. See the following examples:

Load libraries

library(stringr)
library(dplyr)
library(rmap)

Example 1 : Senegalese population per region

# Get population data 
download.file("https://data.humdata.org/dataset/c686be28-1858-44e9-a44b-598ce32ca20b/resource/85ba6660-fad2-4e49-b9b3-9ecb754d5700/download/sen_admpop_adm1_2020.csv", 
              destfile = "sen_admpop_adm1_2020.csv", mode ="wb")
reg_pop = read.csv("sen_admpop_adm1_2020.csv")

# Clean data : missing accent on the "e"
reg_pop = reg_pop %>%
    mutate(ADM1_FR = str_replace_all(ADM1_FR, c("Thies" = "Thiès", "Kedougou" = "Kédougou", "Sedhiou" = "Sédhiou")))

map(reg_pop, 
    subRegion = "ADM1_FR", 
    value = "Total", 
    legendType = "pretty", 
    title = "Population per region in Senegal")

Note that the region name of three regions is misspelt (Thiès, Kédougou, Sédhiou) in the downloaded data with missing acute and grave accents, although the ADM1_FR column should contain the French names of the regions (and hence with accents).

map_param_PRETTY

Example 2 : Covid-19 cases per African country

# Get covid-19 data
download.file("https://data.humdata.org/dataset/9151988e-ae09-416c-9408-6df93bebf3a2/resource/ab898c39-2322-4bf0-a9a7-5897f5605bc1/download/africa_covid19_daily_deaths_national.csv",
              destfile = "africa_cov_cases.csv", mode ="wb")
cases_africa = read.csv("africa_cov_cases.csv", sep = ";")  

# Clean data : aggregate cases, remove empty rows and correct region names
cases_per_country = cases_africa %>% 
    mutate(sum = rowSums(cases_africa[,4:length(cases_africa)], na.rm = T)) %>%
    subset(ISO != "") %>%
    mutate(COUNTRY_NAME = str_replace_all(COUNTRY_NAME, c("Sao Tome and Principe" = "São Tomé and Principe", "Eswatini" = "eSwatini")))

map(cases_per_country, 
    subRegion = "COUNTRY_NAME", 
    value = "sum", 
    title = "Covid-19 cases per African country")

Note here that the names of eSwatini and São Tomé and Principe were misspelt in the downloaded data (respectively Eswatini and Sao Tome and Principe).

map_param_KMEANS

Example 3 : Infectious diseases in California

Plot different years

# Get infectious disease data 
download.file("https://data.chhs.ca.gov/dataset/03e61434-7db8-4a53-a3e2-1d4d36d6848d/resource/75019f89-b349-4d5e-825d-8b5960fc028c/download/odp_idb_2020_ddg_compliant.csv", 
              destfile = "infect_disease_us_county.csv", mode ="wb")

i_disease = read.csv("infect_disease_us_county.csv") ; colnames(i_disease)

# Clean data : keep years 2018-2020 and cases of Malaria, Typhus, Dengue and Lyme diseases
i_4_disease <- i_disease %>% 
    subset(grepl("Malaria|Typhus|Dengue|Lyme", Disease) & grepl("Total", Sex) & grepl("2018|2019|2020", Year)) %>% 
    filter(County != "California")  %>% # csv contains a row with sum of cases for all counties in California 
    mutate(County = paste0(County, "_CA")) # add proper state abbreviation to match built-in map counties 

malaria_map = map(i_4_disease[i_4_disease == "Malaria", ], 
                  subRegion = "County", 
                  value = "Cases", 
                  save = F, 
                  x = "Year", 
                  title = "Malaria cases")

malaria_map$`map_param_KMEANS`

map_param_KMEANS

Plot different diseases

disease_map = map(i_4_disease[i_4_disease$Year == "2020", ], 
                  subRegion = "County", 
                  value = "Cases", 
                  save = F, 
                  class = "Disease", 
                  ncol = 4, 
                  title = "Cases of several infectious diseases")

disease_map$`map_param_KMEANS`

map_param_KMEANS

Compare across years

malaria_map_diff_2018 = map(i_4_disease[i_4_disease == "Malaria", ], 
                            subRegion = "County", 
                            value = "Cases", 
                            save = F, 
                            xRef = "2018", 
                            underLayer = mapUS49County, 
                            title = "Difference with 2018 malaria cases")

malaria_map_diff_2018$map_param_KMEANS_xDiffAbs
malaria_map_diff_2018$map_param_KMEANS_xDiffPrcnt

map_param_KMEANS_xDiffAbs map_param_KMEANS_xDiffPrcnt

For some reason, malaria_map_diff_2018$map_param_KMEANS_xDiffPrcnt does not display values for some counties. I think this might be because extreme values of the legend classes are not comprised in any classes ? (e.g. Counties where exactly 0% change occurs are not represented in any class). I think this issue needs to be resolved. Adding square brackets around the extreme values of the classes could help specify in which class the extreme values are included.

After retesting the package, from my point of view, I think it is the responsibility of the user to make sure its data are complete, up-to-date and correct (e.g. make sure all Region names are properly written) and match the “subregions” of the rmap built-in maps. In any case, if the user had to use another solution than rmap, and join its data for example with the attribute table of an existing shapefile, he/she would still need to clean its data beforehand. Following @maczokni comments, maybe a section of the user guide could highlight the need for cleaning data and matching them with the built-in maps subRegions?

As for the novelty of the package, as raised by @maczokni, I think its true novelty lies indeed in the ability to easily compare across year/scenarios/categories. On another note, compared to existing packages like rnaturalearth and tigris, everything can be done here using only the package rmap, while the two former solutions require combining it with other packages such as tmap. Another advantage of rmap (that is maybe not highlighted enough in the paper) is that it creates ggplot objects which can easily be combined with others to change the map layout, for example, and I think ggplot might be more known among non-GIS users than tmap.

Except for the issue with the legend classes, I think rmap worked as expected in the examples above. Let me know what you think @hugoledoux and @maczokni.

hugoledoux commented 2 years ago

okay, I re-read the (very detailed and good) comments from the reviewers (thanks @CamilleMorlighem and @maczokni) and I think we can go ahead and try to finalise the submission.

I would like the 3 points raised above to be fixed:

add a section about how to clean the data and match them with the maps subregions; this doesn't need to be in the paper but in the docs it would be necessary I believe, both reviewers struggle with this
extreme values not being displayed for some datasets: can you look into that?
in the "Statement of Need", perhaps add what @CamilleMorlighem mentions about ggplot objects?

Once this is done, I am happy to process further with acceptance of the submission

zarrarkhan commented 2 years ago

@hugoledoux @CamilleMorlighem @maczokni Thank you all for the great feedback and comments. Yes, we will address all these points and update the submission. Best Zarrar

zarrarkhan commented 2 years ago

@hugoledoux

Please see each of the points mentioned above addressed in the latest version of the paper.

add a section about how to clean the data and match them with the maps subregions; this doesn't need to be in the paper but in the docs it would be necessary I believe, both reviewers struggle with this

Section on Cleaning Data added to user guide explaining how subRegions will need to be matched to built-in map names along with an example of how rmap automatically identifies the missing regions for you and how you can then go in and find the correct spelling.

extreme values not being displayed for some datasets: can you look into that?

@CamilleMorlighem Thanks for catching this. The issue here actually was that the % difference calculation uses the base year value to calculate the % difference i.e. Percent Diff = (Value in Year_i - Value in Year_Ref)*100/(Value in Year_Ref). In the examples you provided or in general when there was a 0 value in the Ref year the result was coming out to be Infinite ('Inf'). This was then resulting in an NaN value being filtered out and thus those counties being removed. We have fixed this to behave better as follows:

If there is a '0' value in the Ref year rmap gives a warning saying that there is a '0' value and the difference calculation will give an Inf value which is converted to an NA.
rmap comes with the option of highlight single values such as 0s as well as showing NA values (listed in separate sections of the user guide: Legend Single Value and Show NA
We switched the default showNA value to T so it shows these.

If we redo your example with legendSingleValue = 0 and showNA=0 you get the following. This time rmap gives the warning about the Inf values and we can see "NAs" vs. '0' values in the maps.

library(stringr)
library(dplyr)
library(rmap)

# Get infectious disease data
download.file("https://data.chhs.ca.gov/dataset/03e61434-7db8-4a53-a3e2-1d4d36d6848d/resource/75019f89-b349-4d5e-825d-8b5960fc028c/download/odp_idb_2020_ddg_compliant.csv",
              destfile = "infect_disease_us_county.csv", mode ="wb")

i_disease = read.csv("infect_disease_us_county.csv") ; colnames(i_disease)

# Clean data : keep years 2018-2020 and cases of Malaria, Typhus, Dengue and Lyme diseases
i_4_disease <- i_disease %>%
  subset(grepl("Malaria|Typhus|Dengue|Lyme", Disease) & grepl("Total", Sex) & grepl("2018|2019|2020", Year)) %>%
  filter(County != "California")  %>% # csv contains a row with sum of cases for all counties in California
  mutate(County = paste0(County, "_CA")) # add proper state abbreviation to match built-in map counties

malaria_map_diff_2018 = map(i_4_disease[i_4_disease == "Malaria", ],
                            subRegion = "County",
                            value = "Cases",
                            save = F, show=F,
                            xRef = "2018",
                            underLayer = mapUS49County,
                            labels=T, legendSingleValue = 0)

malaria_map_diff_2018$map_param_KMEANS
malaria_map_diff_2018$map_param_KMEANS_xDiffAbs
malaria_map_diff_2018$map_param_KMEANS_xDiffPrcnt

map_param_KMEANS map_param_KMEANS_xDiffAbs map_param_KMEANS_xDiffPrcnt

in the "Statement of Need", perhaps add what @CamilleMorlighem mentions about ggplot objects?

We mention the point about ggplot objects in the Statement of Needs as follows:

"Post-process customization: Existing packages do not produce output objects that can be saved and then customized. Customization of the maps in existing packages is limited to package specific functionality and arguments. rmap produces ggplot objects in which every element (axis, grids, titles, colors, line widths, facets) can all be customized after the map has been produced. This allows users to capitalize on existing knowledge of the widely used ggplot2 package and its arguments."

hugoledoux commented 2 years ago

@editorialbot generate pdf

editorialbot commented 2 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left: