Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @CamilleMorlighem, @maczokni it looks like you're currently assigned to review this paper :tada:.
:warning: JOSS reduced service mode :warning:
Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.
:star: Important :star:
If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository, which means that due to GitHub's default behaviour you will receive notifications (emails) for all reviews 😿
To fix this do the following two things:
For a list of things I can do to help you, just type:
@whedon commands
For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:
@whedon generate pdf
Wordcount for paper.md is 1074
Software report (experimental):
github.com/AlDanial/cloc v 1.88  T=0.13 s (246.4 files/s, 106346.4 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
R                               19           1404           2413           7675
Rmd                              1            367            512            781
Markdown                         6             62              0            200
TeX                              1             18              0            189
YAML                             5             33              0            159
-------------------------------------------------------------------------------
SUM:                            32           1884           2925           9004
-------------------------------------------------------------------------------
Statistical information for the repository '8ee264c2b9dd4723350c4830' was gathered on 2021/12/20.
The following historical commit information, by author, was found:
Author     Commits   Insertions   Deletions   % of changes
Khan             2          442         442         100.00

Below are the number of rows from each author that have survived and are still intact in the current revision:

Author     Rows   Stability   Age   % in comments
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):
OK DOIs
- 10.1007/s10113-021-01775-1 is OK
- 10.1088/1748-9326/ac046c is OK
- 10.21105/joss.00054 is OK
- 10.32614/RJ-2011-006 is OK
- 10.1007/b106573 is OK
- 10.5194/gmd-12-677-2019 is OK
- 10.5334/jors.292 is OK
MISSING DOIs
- 10.18637/jss.v084.i06 may be a valid DOI for title: tmap: Thematic Maps in R
- 10.32614/rj-2018-009 may be a valid DOI for title: Simple features for R: standardized support for spatial vector data.
INVALID DOIs
- 0.1029/2020EF001970 is INVALID
Hello! I started the review. I have a few comments already:
Hi @CamilleMorlighem,
Thanks for the very useful feedback and comments. Here are the responses to your points:
Hi @zarrarkhan,
Thank you for your responses!
I have tested the documentation and functionality of the package. The documentation is very clear, well structured, and very user friendly! I just have the following comments for improvement:

- The links to mapGCAMBasins and mapGCAMReg32 are not working for me (I reach an error page).
- The subsetting example should read unique(shapeSubset@data$subRegion) and not unique(shapeSubset@data$states).
- The example should use rgdal::readOGR(dsn=exampleFolder, layer=exampleFile, use_iconv=T, encoding='UTF-8') instead of rgdal::readOGR(dsn=examplePolyFolder, layer=examplePolyFile, use_iconv=T, encoding='UTF-8').
- Regarding the difference-map outputs: mapx$map_param_xDiffAbs_2015_1990_KMEANS, mapx$map_param_xDiffPrcnt_2015_1990_KMEANS, mapx$map_param_DiffAbs_scenario3_scenario1_KMEANS, mapx$map_param_DiffPrcnt_scenario3_scenario1_KMEANS.

About the functionality, everything worked fine and I have no concerns about performance. I only have the following comments:

- I could not see what changed in the results when changing the values for the scaleRangeDiffAbs (or scaleRangeDiffPrcnt) arguments in the map function, or when simply removing them (see the sketch below).
- The map_find_df function finds a suitable map for your dataframe. I think it might be very useful when you don't know the package and want to use it for a specific purpose, so maybe it is worth highlighting this function in the user guide? Feel free to ignore any comment if you think that's not relevant!
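To make the scaleRange point above concrete, this is the kind of call I experimented with (the c(min, max) value format for these arguments is my assumption, as the docs don't spell it out):

data <- data.frame(subRegion = rep(c("CA","FL","ID"), 2),
                   x = rep(c(1990, 2015), each = 3),
                   value = c(5, 10, 15, 8, 12, 9))
m1 <- rmap::map(data, xRef = 1990, scaleRangeDiffAbs = c(-10, 10))
m2 <- rmap::map(data, xRef = 1990)  # same call with the argument removed
# The resulting difference maps looked identical to me in both cases.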
:wave: @maczokni, please update us on how your review is going (this is an automated reminder).
:wave: @CamilleMorlighem, please update us on how your review is going (this is an automated reminder).
@CamilleMorlighem Thanks for the detailed review and very helpful comments. Addressed as follows:
Improvements
Functionality
Hello! I've hit a bit of a stumbling block when trying to follow along with data other than that in the examples (I understand the idea to be that users can upload their own .csv files and this package will allow them to easily map them?)
First note: if the column is not called "subRegion" then the error Error in rmap::map_find_df(mydata) : data must have subRegion columns is immediately thrown. Instead of requiring users to rename their column, could this not be a parameter to specify in the function itself? E.g. map_find_df(data = mydata, regions = subRegion). Asking people to rename their columns doesn't seem in line with the contribution of making things easy!
But here is where I'm stuck: once I try to map my data, it will not do so. I have renamed the column to value, and have checked that it is numeric. Here's my code:
library(dplyr)
library(geodaData)
library(rmap)
# get own data to try
ncovr <- geodaData::ncovr
mydata <- ncovr %>% select(NAME, HR60)
map_chosen <- rmap::map_find_df(mydata)
# Error in rmap::map_find_df(mydata) : data must have subRegion columns.
mydata <- ncovr %>% select(NAME, HR60) %>% rename(subRegion = NAME)
map_chosen <- rmap::map_find_df(mydata)
rmap::map(map_chosen)
# Here the value column is not used, and no error is thrown, I just get a meaningless map
mydata <- ncovr %>% select(NAME, HR60) %>% rename(subRegion = NAME, value = HR60)
map_chosen <- rmap::map_find_df(mydata)
rmap::map(map_chosen)
# OK it might not be an issue with naming, it just doesn't work...
rmap::map(map_chosen, fillColumn = value)
# Error in rmap::map_plot(color = color, lwd = lwd, legendType = legendType, :
# object 'value' not found
# the example from the vignette works
data = data.frame(subRegion = c("CA","FL","ID","MO","TX","WY"),
value = c(5,10,15,34,2,7))
rmap::map(data)
# both the example and my own data's value columns contain numeric variables
class(mydata$value)
class(data$value)
# From the lack of error messages and the lack of further detail in the vignette, I cannot get it to work.
Any thoughts?
Hi @maczokni.
Thanks for your review and comments. Please see our responses below:
# Update rmap with latest changes
devtools::install_github("JGCRI/rmap")
library(rmap)
Required column names (subRegion and value): We have highlighted how users can specify their own column names in rmap using the subRegCol and valueCol arguments (see the end of this section: https://jgcri.github.io/rmap/articles/vignette_map.html#inputs).
Example data: Thanks for trying rmap with your example data. We updated a few things and have some comments:
- ncovr <- geodaData::ncovr gives an sf object with geometries. rmap was initially unable to process this because it was not a pure R dataframe. The code has been updated to convert an sf object to a pure dataframe.
- mydata <- ncovr %>% select(NAME, HR60) results in a dataframe with a list of counties and corresponding data. If you sort by NAME you will see multiple entries for the same subRegion. rmap (or any other mapping package, for that matter) would not know how to deal with this repeated data:
ncovr %>%
  dplyr::select(NAME, HR60) %>%
  as.data.frame() %>%
  dplyr::arrange(NAME) %>%
  head(10)
With the updates you can run your example as follows. Note that, in general, county data is problematic because there are multiple counties with the same name in different states. To be more specific you can append your county names with an underscore and the relevant state initials (see the sketch after this code block).
library(dplyr)
library(geodaData)
library(devtools)
library(rmap)
ncovr <- geodaData::ncovr
mydata <- ncovr %>% dplyr::select(NAME,HR60) %>% head(10); mydata
rmap::map(mydata,
          subRegCol = "NAME", # Set your subRegion column
          valueCol = "HR60")  # Set your value column
rmap::map(mydata, subRegCol = "NAME", valueCol = "HR60", labels = T, labelSize = 3,
          labelRepel = T, underLayer = rmap::mapUS49, zoom = -2)
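As a sketch of the state-initial appending mentioned above (illustrative only, using base R's state.name/state.abb lookup; note that District of Columbia is not in state.name and would need special handling):

mydata <- ncovr %>%
  sf::st_drop_geometry() %>%
  dplyr::mutate(subRegion = paste0(NAME, "_", state.abb[match(STATE_NAME, state.name)])) %>%
  dplyr::select(subRegion, value = HR60)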
Please let us know if this helps clarify some of the issues you were having and if you have additional concerns/comments.
Hi @zarrarkhan thanks for the update - sorry, just getting back to this now, but I have a major concern. You are right that there are duplicates, as there are USA counties which share the same name. This means that the county name is not actually an appropriate column on which to join, and that if I do want to join the full dataset (rather than the 10-row subset you used) then I cannot do this with the package - is that right? And this is not just a quirk of the dataset I'm using to test; your own data has the same issue (the inbuilt map mapUS49County has duplicate values in the subRegion column):
# devtools::install_github("JGCRI/rmap")
library(rmap)
library(geodaData)
library(dplyr)
library(sf)
# get data for testing
data("ncovr")
# check any duplicate names in the county inbuilt map
thing <- rmap::mapUS49County
length(thing@data$subRegion)
length(unique(thing@data$subRegion))
as.data.frame(table(thing@data$subRegion)) %>% filter(Freq > 1)
Considering this is a primary function of the package, do you think you could take some more time to think about this problem? As I'm sure you are familiar, usually for each geography you will look for unique codes in the dataframes on which you can confidently join. Here in the UK we might use, for example, local authority codes or output area codes. In the US it seems to be state and county codes, which in this case are in the ncovr dataset as 'FIPS'.
I had a play with the data of the mapUS49County inbuilt map as an example, and by pasting together the state and county codes you can create a FIPS column on which to join.
# combine state code and county code for FIPS
thing <- rmap::mapUS49County
thing@data$FIPS <- paste0(thing@data$STATEFP, thing@data$COUNTYCODE)
joined <- left_join(ncovr, thing@data, by = c("FIPS" = "FIPS"))
# check how many would not be joined
nrow(joined %>% filter(is.na(source)))
# check these ones
not_joined <- joined %>% filter(is.na(source))
When we do that only 3 counties are not joined - i.e. only 3 would not be linked to the inbuilt map. You might want to consider printing a warning to users when this is the case, maybe with the name of the ones which did not join.
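A minimal sketch of such a warning (a hypothetical check, not an existing rmap feature):

unmatched <- setdiff(ncovr$FIPS, thing@data$FIPS)
if (length(unmatched) > 0) {
  warning("These FIPS codes did not match the built-in map: ",
          paste(unmatched, collapse = ", "))
}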
However, this is only one case for one data set - I think it might be useful to think about some use cases of external data, what these codes commonly look like, and to have a few join columns on which users' data can be joined to the inbuilt geometries.
Hope this makes sense, let me know if not I can try to elaborate more!
@maczokni Thanks for your comments and notes. We have addressed your comments as follows. The inbuilt map mapUS49County contained the following duplicated subRegion entries:
subRegion subRegionAlt STATEFP COUNTYCODE
1 Baltimore_MD Baltimore 24 005
2 Baltimore_MD Baltimore 24 510
3 Fairfax_VA Fairfax 51 059
4 Fairfax_VA Fairfax 51 600
5 Franklin_VA Franklin 51 067
6 Franklin_VA Franklin 51 620
7 Richmond_VA Richmond 51 159
8 Richmond_VA Richmond 51 760
9 Roanoke_VA Roanoke 51 161
10 Roanoke_VA Roanoke 51 770
11 St. Louis_MO St. Louis 29 189
12 St. Louis_MO St. Louis 29 510
We appended _city to the duplicated subRegions so that we have completely unique subRegions, and added FIPS as per your suggestion. Updated table outputs (selected columns) now show:
subRegion subRegionAlt STATEFP COUNTYCODE FIPS
1 Baltimore_MD Baltimore 24 005 24005
2 Baltimore_MD_city Baltimore 24 510 24510
3 Fairfax_VA Fairfax 51 059 51059
4 Fairfax_VA_city Fairfax 51 600 51600
5 Franklin_VA Franklin 51 067 51067
6 Franklin_VA_city Franklin 51 620 51620
7 Richmond_VA Richmond 51 159 51159
8 Richmond_VA_city Richmond 51 760 51760
9 Roanoke_VA Roanoke 51 161 51161
10 Roanoke_VA_city Roanoke 51 770 51770
11 St. Louis_MO St. Louis 29 189 29189
12 St. Louis_MO_city St. Louis 29 510 29510
Out-of-date counties in the ncovr dataset: The not_joined dataframe in your example above refers to 3 counties which are out of date in the ncovr dataset. These three subRegions from the not_joined datatable are shown below; references to when their names were changed are listed after the code chunk:
not_joined %>% dplyr::select(NAME,STATE_NAME,STATE_FIPS,CNTY_FIPS,FIPS)
NAME STATE_NAME STATE_FIPS CNTY_FIPS FIPS
1 Shannon South Dakota 46 113 46113
2 Clifton Forge Virginia 51 560 51560
3 Dade Florida 12 025 12025
(Source: https://www.ddorn.net/data/FIPS_County_Code_Changes.pdf)
# Checking the source of rmap maps
rmap::mapUS49County@data$source %>% unique()
# "https://www.census.gov/geographies/mapping-files/time-series/geo/carto-boundary-file.html"
rmap::mapCountries@data$source %>% unique()
# "https://www.naturalearthdata.com/downloads/"
Please let us know if this addresses some of your concerns or if you have additional comments.
@CamilleMorlighem I see that the authors fixed your issues (statement of need among others; see https://github.com/openjournals/joss-reviews/issues/4015#issuecomment-1004272488) but you didn't tick the boxes above. Can you let us know if you are now satisfied, please?
@hugoledoux thank you for noticing it!
@zarrarkhan thank you for these improvements and answers to my comments. I think I'm done with the review. rmap is a very nice package! I'm probably gonna use it in the future.
@hugoledoux Thanks for following up. Wondering if there are any additional comments we can address or get an update from the reviewers on the comments we addressed in https://github.com/openjournals/joss-reviews/issues/4015#issuecomment-1019419000.
@maczokni could you check the updates from the authors (https://github.com/openjournals/joss-reviews/issues/4015#issuecomment-1019419000)? And tell us if you're satisfied?
My apologies for the delay. At this moment, I'm afraid that I'm still not clear what the instructions are for joining data. My understanding is that the USP of this package is that it provides an easy way to map things for someone who is not super familiar with GIS. This would require some work on the part of the package to have a set of values on which it could join geometries, and to do some searching in place of the user. One commonly used for the USA, for example, is FIPS. The authors suggest "The datasets also contain additional columns such as FIPS and other details that were part of the original datasets to ensure unique entries." - however you cannot join using FIPS, or if you can, I don't see this detailed in the documentation. If I try to specify the FIPS column as the "subRegCol", it does not work:
# devtools::install_github("JGCRI/rmap")
library(rmap)
library(geodaData)
library(dplyr)
library(sf)
# get data for testing
data("ncovr")
mydata <- ncovr %>% select(NAME, STATE_NAME, FIPS, HR60) %>% st_drop_geometry()
map_chosen <- rmap::map(mydata, subRegCol = 'FIPS', valueCol = 'HR60')
[1] "Starting map..."
[1] "Map data table written to /cloud/project/outputs//map_param.csv"
Error in get(paste(gsub("subReg", "map", gsub("Alt", "", subRegChosen)), :
object 'mapGCAMLanddf' not found
The authors suggest: " Each of our datasets provides the source for the underlying data which can be compared to user data if needed. For example the county data is from "https://www.census.gov/geographies/mapping-files/time-series/geo/carto-boundary-file.html"." and that "Some degree of data cleaning and checks is expected with user data."
For me to join the ncovr data (just one example), I have to get a lookup table from state name to state abbreviation, join it to my data set, then create a new column which has the same strings as the subRegCol expects, and then I can request a map; however, I still run into an error:
state_lookup <- data.frame(
  state_name = c("Alabama","Alaska","Arizona","Arkansas","California","Colorado","Connecticut",
                 "Delaware","Florida","Georgia","Hawaii","Idaho","Illinois","Indiana","Iowa",
                 "Kansas","Kentucky","Louisiana","Maine","Maryland","Massachusetts","Michigan",
                 "Minnesota","Mississippi","Missouri","Montana","Nebraska","Nevada","New Hampshire",
                 "New Jersey","New Mexico","New York","North Carolina","North Dakota","Ohio",
                 "Oklahoma","Oregon","Pennsylvania","Rhode Island","South Carolina","South Dakota",
                 "Tennessee","Texas","Utah","Vermont","Virginia","Washington","West Virginia",
                 "Wisconsin","Wyoming"),
  state_code = c("AL","AK","AZ","AR","CA","CO","CT","DE","FL","GA","HI","ID","IL","IN","IA","KS",
                 "KY","LA","ME","MD","MA","MI","MN","MS","MO","MT","NE","NV","NH","NJ","NM","NY",
                 "NC","ND","OH","OK","OR","PA","RI","SC","SD","TN","TX","UT","VT","VA","WA","WV",
                 "WI","WY"))
mapthis <- dplyr::left_join(mydata, state_lookup, by = c("STATE_NAME" = "state_name"))
mapthis$subRegCol <- paste0(mapthis$NAME, "_",mapthis$state_code)
map_chosen <- rmap::map(mapthis, subRegCol = 'subRegCol')
# [1] "Starting map..."
# Coordinate system already present. Adding new coordinate system, which will replace the existing one.
# Error in `check_required_aesthetics()`:
# ! geom_polygon requires the following missing aesthetics: x and y
# Run `rlang::last_error()` to see where the error occurred.
# There were 32 warnings (use warnings() to see them)
# Warning messages:
# 1: Unknown or uninitialised column: `lon`.
# 2: In min(datax1$lon) : no non-missing arguments to min; returning Inf
# 3: Unknown or uninitialised column: `lon`.
# 4: In min(x, na.rm = na.rm) : no non-missing arguments to min; returning Inf
# 5: In max(x, na.rm = na.rm) : no non-missing arguments to max; returning -Inf
# ...etc...
So this is where my sticking point is: if the USP is that this is so much easier than getting your own shapefile, this process should be smoother. Now maybe I'm misunderstanding, and in that case I'm happy to hear differently, but at the moment, if the advice is that the user has to go check out the source data anyway, I don't see why not to simply get the shapefile from the US Census Bureau and use tmap to make a quick map.
EDIT
Actually the tigris package makes this even easier:
tg_counties <- tigris::counties()
mapthis <- dplyr::left_join(tg_counties, mydata, by = c("GEOID" = "FIPS"))
tmap::qtm(mapthis, fill = "HR60")
I understand there is further functionality, and I am sure there is more added value there, but at the moment I'm just struggling to get to that point...! Solutions like selecting only 10 of the 3,000+ counties aren't really useful as they aren't realistic use cases.
Happy to be pointed out that I'm missing something super obvious, but just really would like to get a working example with a real world data set before I can sign off on this.
@maczokni
Thank you for your response. Here is a working example with your dataset after removing out-of-date FIPS as we discussed above. I tried your counter-examples and ran into some issues for which I have shared more details below.
Working Example with your data
data("ncovr")
mydata <- ncovr %>% select(NAME, STATE_NAME, FIPS, HR60) %>% st_drop_geometry() %>%
dplyr::left_join(rmap::mapUS52County@data, by="FIPS") %>%
dplyr::mutate(value=HR60) %>%
dplyr::filter(!is.na(subRegion)) # Remove non-existent FIPS 51560 (Clifton Forge), FIPS 12025 (Dade)
rmap::map(data=mydata)
Missing FIPS in ncovr data: You are using a specific dataset, "ncovr", which has FIPS for counties that no longer exist, as we described in #4015 (source: https://www.ddorn.net/data/FIPS_County_Code_Changes.pdf)
Your Counter Example 1
download.file("https://www2.census.gov/geo/tiger/GENZ2018/kml/cb_2018_us_county_20m.zip","counties.zip")
unzip("counties.zip")
counties_geojson <- sf::st_read("cb_2018_us_county_20m.kml")
mapthis <- dplyr::left_join(counties_geojson, mydata, by = c("GEOID" = "FIPS"))
tmap::qtm(mapthis, fill = "HR60")
Error: Join columns must be present in data.
x Problem with `GEOID`.
Your Counter Example 2
tg_counties <- tigris::counties()
mapthis <- dplyr::left_join(tg_counties, mydata, by = c("GEOID" = "FIPS"))
tmap::qtm(mapthis, fill = "HR60")
rmap produces the map for this data almost instantly and does not require any downloads.
Updates made and additional comments
Working Example with your data after updates
# devtools::install_github("JGCRI/rmap") # To update to latest version
library(rmap)
library(geodaData)
library(dplyr)
library(sf)
data("ncovr")
mydata <- ncovr %>% select(NAME, STATE_NAME, FIPS, HR60) %>% st_drop_geometry() %>%
dplyr::left_join(rmap::mapUS52County@data, by="FIPS") %>%
dplyr::mutate(value=HR60)
rmap::map(data=mydata)
Please let us know if the example above is working for you and if you have any additional comments or issues.
Hiya, I'll give this a go later, thanks - but it's still just one example of the counties, and it still requires the user to do data cleaning, so I'm still not sure about the target audience being non-GIS users.
However in the meantime I wanted to share two concerns with the code:
1 - the package functions produce character vector output using print() rather than messages using message(). This is going to be an issue for users because it defeats things like suppressMessages() and knitr chunk options when the package is used in an R Markdown document. It would be even better if you could use rlang::inform(), because it has nice formatting for long messages while still playing nicely with functions that expect base::message().
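A minimal illustration of the difference (plain base R, not rmap's code):

say_print   <- function() print("Starting map...")    # print() writes to stdout
say_message <- function() message("Starting map...")  # message() signals a condition
suppressMessages(say_print())    # still prints: print() output cannot be suppressed this way
suppressMessages(say_message())  # silent; knitr's message=FALSE chunk option also catches it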
2 - you should consider adopting sf for the internal workings rather than sp. Roger Bivand (who wrote the sp package in the first place) recommends at https://doi.org/ghnwg3 that new packages use sf rather than sp, and says that packages which continue to depend on sp will have to "hope" it continues to be maintained. I also note that rmap depends on rgdal and rgeos, both of which will be retired by the end of 2023 (as noted in bold type at the top of the NEWS.md pages of both packages) and neither of which is required if you use sf instead.
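For what it's worth, the conversion itself is usually a one-liner with sf (a sketch, assuming the internal maps are sp SpatialPolygonsDataFrame objects as described):

library(sf)
map_sp <- rmap::mapUS49County   # sp object in the reviewed version
map_sf <- sf::st_as_sf(map_sp)  # sf equivalent; the @data slot becomes ordinary columns
class(map_sf)                   # "sf" "data.frame"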
@maczokni Thank you for the helpful comments and suggestions. We have addressed these in commit da9c41574820679fae9e0cb4871cf014edf2213c with responses below:

- Target audience: rmap provides extensive examples for users to build maps of common geographical boundaries (countries, states, basins, counties etc.), with the user requiring a basic table with a minimum of two columns ("subRegion" and "value"), as you have seen. This still means the user has to prepare their datasets into those basic tables and may run into incompatibility issues if using region names which don't exist in rmap's databases.
- Package messages: updated per your rlang::inform suggestion.
- sp vs. sf: rmap still depends on sp, as it builds all of its maps from the SpatialPolygonsDataFrame class. We agree that this should be updated to sf and will do so in v2.0 of the model, as the package will need extensive testing for these replacements.

I'm really not trying to be difficult, but I believe the package requires extensive testing anyway. I just tried again with a different data set to map Local Authority areas in London and this is the result:
Here's the code:
library(rmap)
library(tidyverse)
library(readxl)
library(janitor)
# read in new test data
download.file("https://data.london.gov.uk/download/gcse-results-by-borough/a8a71d73-cc48-4b30-9eb5-c5f605bc845c/gcse-results.xlsx",
destfile = "gcse-results.xlsx")
gcse_results <- read_xlsx("gcse-results.xlsx", sheet = "2020-21")
# clean up test data
colnames <- paste0(gcse_results[1,], gcse_results[2,])
colnames <- gsub("NA", "", colnames)
names(gcse_results) <- colnames
gcse_results <- gcse_results %>%
clean_names() %>%
slice(4:36) %>%
mutate(number_of_pupils_at_the_end_of_key_stage_4 = as.numeric(number_of_pupils_at_the_end_of_key_stage_4))
# try to map using Local Authority name
my_map <- rmap::map(gcse_results,
subRegCol = "area",
valueCol = "number_of_pupils_at_the_end_of_key_stage_4")
Again, I know you can manually find a better-fitting map, but then once again I'm not certain of the contribution over finding a map manually in other rOpenSci packages like rnaturalearth and using tmap:
library(rnaturalearth)
library(tmap)
london_boroughs <- ne_states(country = "United Kingdom", returnclass = "sf") %>%
filter(region == "Greater London")
gcse_results_sf <- left_join(london_boroughs, gcse_results,
by = c("name" = "area"))
qtm(gcse_results_sf, fill = "number_of_pupils_at_the_end_of_key_stage_4")
which returns:
Or if I want the legend fixed:
qtm(gcse_results_sf, fill = "number_of_pupils_at_the_end_of_key_stage_4", fill.title="Pupils at KS4")
In sum: I think that more extensive and thorough testing is required than what should be done by reviewers. I'm just spot-checking here and keep coming across issues - so at this point I would advise more thorough testing of the package overall by the authors before I would be happy to sign off on this.
@maczokni of course you are not being difficult; on the contrary, we appreciate you taking the time and effort to give your detailed feedback. We will look at the issues you have pointed out, do a more thorough test of the existing maps, and get back to you.
@maczokni @hugoledoux
Thank you for your earlier comments. We have tried to address all of your comments and made the following changes. Please pull in the latest changes from main to test these and let us know if you have any additional comments.
- We have removed the dependence on the sp package and switched to sf. All rmap internal data maps are now sf objects.
- Key rmap functions are now at >85% test coverage, with overall package test coverage improved from 55% to 65% (map.R 60.84%, map_plot.R 86%, map_find.R 92%, printPdfPng.R 91%).

Load Libraries
library(rmap); library(geodaData); library(dplyr); library(sf); library(tidyverse)
library(readxl); library(janitor)
Example 1
data("ncovr")
mydata <- ncovr %>%
dplyr::select(NAME, STATE_NAME, FIPS, HR60) %>%
sf::st_drop_geometry() %>%
dplyr::left_join(rmap::mapUS52County, by="FIPS")
rmap::map(data=mydata, valueCol = "HR60", legendTitle = "HR60")
Example 2
download.file("https://data.london.gov.uk/download/gcse-results-by-borough/a8a71d73-cc48-4b30-9eb5-c5f605bc845c/gcse-results.xlsx", destfile = "gcse-results.xlsx")
gcse_results <- readxl::read_xlsx("gcse-results.xlsx", sheet = "2020-21")
# clean up test data
colnames <- paste0(gcse_results[1,], gcse_results[2,])
colnames <- gsub("NA", "", colnames)
names(gcse_results) <- colnames
gcse_results <- gcse_results %>%
clean_names() %>%
slice(4:36) %>%
mutate(number_of_pupils_at_the_end_of_key_stage_4 = as.numeric(number_of_pupils_at_the_end_of_key_stage_4))
# Plot
my_map <- rmap::map(gcse_results,
subRegCol = "area",
valueCol = "number_of_pupils_at_the_end_of_key_stage_4",
legendTitle = "Pupils at KS4")
Example 3
# Test Covid data
# Our World in Data JHU https://github.com/owid/covid-19-data/tree/master/public/data
# State vaccination data: https://github.com/owid/covid-19-data/raw/master/public/data/vaccinations/us_state_vaccinations.csv
covid_data <- read.csv(url("https://github.com/owid/covid-19-data/raw/master/public/data/vaccinations/us_state_vaccinations.csv"))%>%
tibble::as_tibble() %>%
dplyr::select(subRegion=location,date,value=people_vaccinated_per_hundred) %>%
dplyr::mutate(subRegion = dplyr::if_else(subRegion=="New York State","New York",subRegion)) %>%
dplyr::filter(date == max(date),
subRegion %in% rmap::mapUS49$subRegionAlt); covid_data
rmap::map(covid_data,
          title = paste0("People vaccinated per hundred ", max(covid_data$date)),
          legendTitle = "People per 100")
Examples from Documentation: https://jgcri.github.io/rmap/articles/vignette_map.html
Hi @hugoledoux,
Just wanted to check in to see if there were any updates since we submitted our modifications in https://github.com/openjournals/joss-reviews/issues/4015#issuecomment-1074078464 two weeks ago.
Thanks!
@maczokni could you please have one look at the updated material here?
@editorialbot generate pdf
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
@hugoledoux Thank you for generating the proof. We have read the proof and corrected a few typos we found.
In addition, we see that the referencing style seems inconsistent, but it is not clear how to adjust this. For example, in the article proof linked above, line 22 has the following:
publications (Khan et al., 2021; Wild, Khan, Clarke, et al., 2021; Wild, Khan, Zhao, et al., 2021)
Which refer to the following references:
The first citation only references a single author while the rest list three authors.
Apart from this we have checked the proof and pushed all changes in commit: https://github.com/JGCRI/rmap/commit/6730a68880e6673584d67b45e66820ee1fb86d9a
Please let us know what we can do next.
Thanks!
Hi @hugoledoux, Just checking in to see if we can provide any more information or help move things ahead with the review. Thanks for your management of the process! Best Zarrar
Sorry @zarrarkhan I am on holidays. I will be back in 2w.
I was hoping that @maczokni would finalise her review in the meantime, if we haven't heard from her then I'll see how to proceed.
Hope you understand.
@hugoledoux No worries. That sounds good. Enjoy the vacations. Best Zarrar
Hi apologies for delay, I will try to get to this this week.
Hi, I'm having trouble downloading the new version:
> devtools::install_github("JGCRI/rmap")
Downloading GitHub repo JGCRI/rmap@HEAD
Error in utils::download.file(url, path, method = method, quiet = quiet, :
download from 'https://api.github.com/repos/JGCRI/rmap/tarball/HEAD' failed
I also tried with remotes instead of devtools but no luck. I can download other repos no problem with the same method (e.g. devtools::install_github("mpjashby/sfhotspot") downloads the package no problem).
Here's my session info:
R version 4.2.0 (2022-04-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.3.1
Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] rstudioapi_0.13 magrittr_2.0.3 usethis_2.1.6 devtools_2.4.3 pkgload_1.2.4 R6_2.5.1 rlang_1.0.2
[8] fastmap_1.1.0 tools_4.2.0 pkgbuild_1.3.1 sessioninfo_1.2.2 cli_3.3.0 withr_2.5.0 ellipsis_0.3.2
[15] remotes_2.4.2 rprojroot_2.0.3 lifecycle_1.0.1 crayon_1.5.1 brio_1.1.3 processx_3.5.3 purrr_0.3.4
[22] callr_3.7.0 fs_1.5.2 ps_1.7.0 curl_4.3.2 testthat_3.1.4 memoise_2.0.1 glue_1.6.2
[29] cachem_1.0.6 compiler_4.2.0 desc_1.4.1 prettyunits_1.1.1
I suspect it has to do with the package size and therefore the time it takes to download. If I download the tarball from the URL via browser, the file is 120.9 (!!!) MB. Please see here about the size and limitations of packages on CRAN: https://thecoatlessprofessor.com/programming/r/size-and-limitations-of-packages-on-cran/
I am happy to continue the review having installed from the package archive file I've downloaded, but it might be something to consider to reduce the size (again referring to the cran guidance).
@maczokni Thanks for this comment. We are aware of the size issue and mention it in our paper. We also had a previous version with the data separated from the package to meet CRAN guidelines. However, since all the functions in rmap depend on having access to the built-in maps, that would require the map sets to be downloaded each time. Having everything download during the initial install made this a one-time effort. We realize that making this choice means we won't be publishing rmap to CRAN. From the paper:
"While built-in maps increase the size of the package, having direct access to these allows for automated searching and quick deployment of relevant shapefiles without the need to download any data."
The total package size has not changed since you first reviewed the package, so it is strange that you are getting this download error now. We will check further to investigate these restrictions and can reconsider separating the data from the main package.
Best
Zarrar
Thanks for this! It might be that I initially downloaded at work with better internet access. I am not an expert in packaging data into packages, but is there not a solution that appeals to a remote database? The package sizes for rnaturalearth and rnaturalearthdata are both much smaller (<1 MB and 3 MB), and from what I understand they perform roughly the same functionality of finding a relevant shapefile from a query? It might introduce a separate step, I guess (e.g. find all maps which meet some higher-level category, e.g. ask the user to specify a country first, get those relevant geometries, and then match on the relevant "subRegion" column?). Anyway, like I said, this is not my area of expertise, so I'm not sure what solution to suggest.
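Something like this download-on-first-use pattern is what I have in mind (a hypothetical helper, not a concrete proposal for rmap's API):

get_map_cached <- function(name, url,
                           cache_dir = tools::R_user_dir("rmap", which = "cache")) {
  dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)
  path <- file.path(cache_dir, paste0(name, ".rds"))
  if (!file.exists(path)) download.file(url, path, mode = "wb")  # fetch once
  readRDS(path)                                                  # serve from cache after that
}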
Another thing: when you summon the help file for the package it just links to the github repo and github pages. The link for the cheatsheet is broken (it says https://github.com/JGCRI/rmap/blob/main/rmapCheatsheet.pdf instead of https://jgcri.github.io/rmap/cheatsheet.pdf)
Overall, I remain unconvinced that this is easy for those who are new to GIS, or easier than existing solutions. I don't think (at least to me) it is clear enough from the documentation how to go about applying this to my own data, which I believe is meant to be the strength of this package. I say this after running into more issues, this time while trying to invoke the automatic "Multi-Year-Class-Scenario-Facet". The example in the documentation produces a dataframe with repeated values in the subRegion column, but when I try with the GCSE data we used above I get an error about this (presumably when the map is being summoned):
library(rmap)
library(tidyverse)
library(readxl)
library(janitor)
library(sf)
# read in new test data
download.file("https://data.london.gov.uk/download/gcse-results-by-borough/a8a71d73-cc48-4b30-9eb5-c5f605bc845c/gcse-results.xlsx",
destfile = "gcse-results.xlsx")
gcse_results <- read_xlsx("gcse-results.xlsx", sheet = "2020-21")
gcse_results_2 <- read_xlsx("gcse-results.xlsx", sheet = "2019-20")
# clean up test data
colnames <- paste0(gcse_results[1,], gcse_results[2,])
colnames <- gsub("NA", "", colnames)
names(gcse_results) <- colnames
gcse_results <- gcse_results %>%
clean_names() %>%
slice(4:36) %>%
mutate(number_of_pupils_at_the_end_of_key_stage_4 = as.numeric(number_of_pupils_at_the_end_of_key_stage_4))
colnames <- paste0(gcse_results_2[1,], gcse_results_2[2,])
colnames <- gsub("NA", "", colnames)
names(gcse_results_2) <- colnames
gcse_results_2 <- gcse_results_2 %>%
clean_names() %>%
slice(4:36) %>%
mutate(number_of_pupils_at_the_end_of_key_stage_4 = as.numeric(number_of_pupils_at_the_end_of_key_stage_4))
gcse_results_joined <- rbind(gcse_results %>% mutate(year = 2020),
gcse_results_2 %>% mutate(year = 2019))
rmap::map(data = gcse_results_joined,
subRegCol = "area",
valueCol = "number_of_pupils_at_the_end_of_key_stage_4",
ncol=4,
background = T)
Starting map...
Map data table written to /Users/user/Desktop/teaching/crime_mapping_discussions/outputs//map_param.csv
Error in rmap::map(data = gcse_results_joined, subRegCol = "area", valueCol = "number_of_pupils_at_the_end_of_key_stage_4", :
Input data data has multiple values. Please check your data.
I assumed we could get around this by extracting the map first and using that. To do this I tried the map_find() function. This function has not been updated like the map() function, and so still requires renaming the column to subRegion:
> map_find(gcse_results_joined, subRegCol = "area")
Error in map_find(gcse_results_joined, subRegCol = "area") :
unused argument (subRegCol = "area")
> map_find(gcse_results_joined)
Error in map_find(gcse_results_joined) : data must have subRegion columns.
But even after creating a "subRegion" column in my data, this map doesn't help me:
gcse_results_joined$subRegion <- gcse_results_joined$area
thing <- map_find(gcse_results_joined)
rmap::map(data = gcse_results_joined,
          underLayer = thing,
          background = T)
Specifying the value column summons the same error:
> rmap::map(data = gcse_results_joined,
+ underLayer = thing,
+ background = T,
+ valueCol = "number_of_pupils_at_the_end_of_key_stage_4")
Starting map...
Map data table written to /Users/user/Desktop/teaching/crime_mapping_discussions/outputs//map_param.csv
Error in rmap::map(data = gcse_results_joined, underLayer = thing, background = T, :
Input data data has multiple values. Please check your data.
Various specifications didn't help. In the end the only solution was to rename my data in the image of the data specified in the example:
data = data.frame(subRegion = gcse_results_joined$area,
year = gcse_results_joined$year,
value = gcse_results_joined$number_of_pupils_at_the_end_of_key_stage_4)
rmap::map(data = data,
ncol=4,
background = T)
So I think I'm misunderstanding the process here - and I would say I'm pretty comfortable with GIS and R, having published and actively teaching in this space - but I feel quite lost following the documentation with my own data. It is possible that to someone from a different background these processes are more intuitive; perhaps there is a certain approach/understanding in the domain of the authors which is required to use this package. In that case, I think this should be specified in the documentation.
Basically the behaviour I would expect would be something like:
rmap::map(data = gcse_results_joined,
subRegion = area,
year = year,
value = number_of_pupils_at_the_end_of_key_stage_4,
ncol=4,
background = T)
My comment once again is that if the audience is people who have no GIS knowledge, and who for some reason don't want to use rnaturalearth+tmap solutions to map their data, then:
1 - all functions should allow the user to specify which columns of their own data hold the relevant variables (e.g. subRegion = "col", etc.)
2 - the documentation should be improved with some clear instruction on how to apply the functions to external data sets.
Thanks for the detailed comments. Please see responses below:
-Package Size: We did consider (and tested) hosting all the maps in a separate data repository. However, this required the maps to be downloaded each time a function was called, and for now we decided to provide all maps with the initial installation. rnaturalearth comes with some pre-loaded maps (ne_countries, ne_states) but also uses ne_download for other maps and resolutions (https://www.rdocumentation.org/packages/rnaturalearth/versions/0.1.0/topics/ne_download).
-Help File and Links: We updated the help file with a short package description and also checked that all the links work.
-Your Example: Your example ran into a case which we had not come across in our tests yet, i.e. both an x column and a year column present in the data. rmap plots the time dimension using an x column, and if the data comes with a year column it assumes this is the x column. In your case, you had both an x column and a year column, and so the x column was used. Since the x column was all NAs, it led to the duplicated-rows message. We have updated the package so this situation is handled with an appropriate message to inform the user. We also updated the script so the user can specify the key arguments directly, and renamed them to be more intuitive as you suggested (directly using subRegion, value, x instead of subRegCol, valueCol). We use x in case the time dimension is something other than year, such as week or month. This should work now:
library(rmap)
library(tidyverse)
library(readxl)
library(janitor)
library(sf)
# read in new test data
download.file("https://data.london.gov.uk/download/gcse-results-by-borough/a8a71d73-cc48-4b30-9eb5-c5f605bc845c/gcse-results.xlsx",
destfile = "gcse-results.xlsx")
gcse_results <- read_xlsx("gcse-results.xlsx", sheet = "2020-21")
gcse_results_2 <- read_xlsx("gcse-results.xlsx", sheet = "2019-20")
# clean up test data
colnames <- paste0(gcse_results[1,], gcse_results[2,])
colnames <- gsub("NA", "", colnames)
names(gcse_results) <- colnames
gcse_results <- gcse_results %>%
clean_names() %>%
slice(4:36) %>%
mutate(number_of_pupils_at_the_end_of_key_stage_4 = as.numeric(number_of_pupils_at_the_end_of_key_stage_4))
colnames <- paste0(gcse_results_2[1,], gcse_results_2[2,])
colnames <- gsub("NA", "", colnames)
names(gcse_results_2) <- colnames
gcse_results_2 <- gcse_results_2 %>%
clean_names() %>%
slice(4:36) %>%
mutate(number_of_pupils_at_the_end_of_key_stage_4 = as.numeric(number_of_pupils_at_the_end_of_key_stage_4))
gcse_results_joined <- rbind(gcse_results %>% mutate(year = 2020),
gcse_results_2 %>% mutate(year = 2019))
rmap::map(data = gcse_results_joined,
subRegion = "area",
value = "number_of_pupils_at_the_end_of_key_stage_4")
-All functions should allow custom columns: We have adjusted the script so that users can specify their own columns for all the key arguments (value, subRegion, scenario, class, x) for all functions. We added a section in the documentation on this: https://jgcri.github.io/rmap/articles/vignette_map.html#customcolumns
-Advantages of using rmap: The true utility of rmap comes in comparing data across scenarios, classes and time. The examples below using your data demonstrate some of these capabilities, which would not be so simple using existing software:
Multiple ggplots based on the data provided, saved as a list which can be edited later:
rmap::map(data = gcse_results_joined,
subRegion = "area",
value = "number_of_pupils_at_the_end_of_key_stage_4") -> mapx
mapx$map_param_KMEANS
mapx$map_param_MEAN_KMEANS
mapx$map_param_MEAN_KMEANS + ggplot2::theme_dark()
Compare across time
rmap::map(data = gcse_results_joined,
subRegion = "area",
value = "number_of_pupils_at_the_end_of_key_stage_4",
xRef = 2019,
save =F, show =F) -> mapx
mapx$map_param_KMEANS_xDiffPrcnt
See the percentage change from 2019 to 2020
Compare Different Classes (In this case hypothetical genders)
gcse_results_joined_classes <- gcse_results_joined %>%
dplyr::mutate(gender = "girls") %>%
dplyr::bind_rows(gcse_results_joined %>%
dplyr::mutate(gender="boys",
number_of_pupils_at_the_end_of_key_stage_4 =
number_of_pupils_at_the_end_of_key_stage_4*runif(nrow(gcse_results_joined))))
rmap::map(data = gcse_results_joined_classes,
class = "gender",
subRegion = "area",
xRef = 2019,
value = "number_of_pupils_at_the_end_of_key_stage_4", save=F, show=F) -> mapx
mapx$map_param_KMEANS
mapx$map_param_KMEANS_xDiffPrcnt
See how the value changed for each gender between 2019 and 2020
Compare Scenarios and classes (In this case hypothetical income_classes)
gcse_results_joined_scenario <- gcse_results_joined %>%
dplyr::mutate(income_level = "high_income") %>%
dplyr::bind_rows(gcse_results_joined %>%
dplyr::mutate(income_level="low_income",
number_of_pupils_at_the_end_of_key_stage_4 =
number_of_pupils_at_the_end_of_key_stage_4*runif(nrow(gcse_results_joined))))
rmap::map(data = gcse_results_joined_scenario,
scenario = "income_level",
subRegion = "area",
xRef = 2019,
scenRef = "low_income",
value = "number_of_pupils_at_the_end_of_key_stage_4",
save=F,show=F) -> mapx
mapx$map_param_KMEANS
mapx$map_param_KMEANS_DiffPrcnt
See how the value changed for each income level between 2019 and 2020
@hugoledoux @maczokni Hi, Just wanted to check in on any further feedback related to our response (https://github.com/openjournals/joss-reviews/issues/4015#issuecomment-1163520937) submitted 22 June. Thanks!
Hi All,
I read your response about the main contribution, but now I wonder if the paper needs updating? You state in the comment above, "So the true utility of rmap comes in comparing data across scenarios, classes and time". That is only 1 out of 4 points raised in the original paper. Would it be worth focusing the package and paper on this contribution?
If you must keep the table-to-map part, then for the point "State of the field: Do the authors describe how this software compares to other commonly-used packages?" I would like to see discussion vs. rnaturalearth and other approaches, detailing how this compares. I would like to be convinced in the paper that there is added value here.
My other sticking point was "Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines" specifically the point: "enables some new research challenges to be addressed or makes addressing research challenges significantly better (e.g., faster, easier, simpler)." The data table to map being faster/easier/simpler was something I was not convinced about, but with the long back and forth we are at a point where I no longer want to be carrying out extensive testing on behalf of the authors. On the other hand if the key contribution is now this ease for the comparison around the different scenarios, this should be emphasised more.
At time of submission this package was not robustly tested enough. I've given a hefty amount of my time to testing it and making recommendations which I think were quite substantial. At this point I'm not willing to give more time to keep finding new test cases, but I would like to be convinced that the authors themselves have gone through the trouble of doing this. Every time it fell over for me was a case which could/should have been flagged by extensive internal testing. I would really like the authors to show they have done this themselves, instead of relying on the reviewers to do so (actually, I'm not sure the other reviewer tested this?).
So to sum up, on the two outstanding points above in the paper (1 - State of the field, and 2 - Substantial scholarly effort): if the authors make a case in updates to the paper that argues for these points, and demonstrate some significant testing, I will then tick these boxes and pass back to the editor @hugoledoux whether he is happy for it to go ahead.
thanks @maczokni for the update, this is clear and I agree with what you wrote.
Reviewers are indeed expected to test the package, but reviewers are not debuggers. That being said, there are often bugs in software and one can't expect all software to be perfect.
@CamilleMorlighem a point was raised about testing, and since you did the review a while ago (before changes to the package were made) could you do them again perhaps?
@hugoledoux @maczokni
Thank you for your latest comments and the time you put into your thorough review. Your comments are addressed below and in the latest version of the paper draft: https://res.cloudinary.com/hkvhhndao/image/upload/v1657942383/ew2vgphg2dfqi6qy5n0t.pdf which can also be seen at: https://github.com/JGCRI/rmap/blob/main/paper/paper.md
Comment 1: If you must keep the table to map part, for the point "State of the field: Do the authors describe how this software compares to other commonly-used packages?" I would like to see discussion v rnaturalearth and other approaches to detail in what way this compares. I would like to be convinced in the paper that there is an added value here.
Response: The table-to-map feature is one of several functionalities of rmap that make it easy to use. We agree that it is a minor improvement, and so we have removed it from the Statement of Need section as its own paragraph. We leave discussion of this functionality in other sections of the paper.
Comment 2: My other sticking point was "Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines" specifically the point: "enables some new research challenges to be addressed or makes addressing research challenges significantly better (e.g., faster, easier, simpler)." The data table to map being faster/easier/simpler was something I was not convinced about, but with the long back and forth we are at a point where I no longer want to be carrying out extensive testing on behalf of the authors. On the other hand if the key contribution is now this ease for the comparison around the different scenarios, this should be emphasised more.
Response: Yes, one of the key contributions is the comparison around different scenarios and time periods. This is already emphasized in the paper as follows:
- In the title of the paper: "rmap: An R package to plot and compare tabular data on customizable maps across scenarios and time".
- Abstract (lines 13 to 14): "Additionally rmap automatically detects and produces comparison maps if the data has multiple scenarios or time periods as well as animations for time series data".
- Statement of Need (lines 54 to 59): "Difference maps: Existing packages do not produce difference maps to compare across scenarios or time periods. rmap provides this functionality by automatically recognizing multiple scenarios and time periods to produce difference maps across these dimensions. An important aspect of spatial data is exploring the difference between two scenarios or time periods and rmap makes this a seamless process."
- And as a separate section called Compare Scenarios (lines 84-100), with example code and figures.
Spatial comparisons across scenarios and time are very valuable in multiple fields, and this is one of the features of rmap that has already been used significantly in the following published papers (and also in several additional papers in progress and under submission):
@hugoledoux can you please review our response and let us know if anything else is needed? Thanks Best Zarrar
Hi all,
Following comments from @hugoledoux and @maczokni, I have redone the old tests of the package that I did a while ago, plus new tests considering the updates to the package. See the following examples:
library(stringr)
library(dplyr)
library(rmap)
# Get population data
download.file("https://data.humdata.org/dataset/c686be28-1858-44e9-a44b-598ce32ca20b/resource/85ba6660-fad2-4e49-b9b3-9ecb754d5700/download/sen_admpop_adm1_2020.csv",
destfile = "sen_admpop_adm1_2020.csv", mode ="wb")
reg_pop = read.csv("sen_admpop_adm1_2020.csv")
# Clean data : missing accent on the "e"
reg_pop = reg_pop %>%
mutate(ADM1_FR = str_replace_all(ADM1_FR, c("Thies" = "Thiès", "Kedougou" = "Kédougou", "Sedhiou" = "Sédhiou")))
map(reg_pop,
subRegion = "ADM1_FR",
value = "Total",
legendType = "pretty",
title = "Population per region in Senegal")
Note that the names of three regions (Thiès, Kédougou, Sédhiou) are misspelt in the downloaded data, with the acute and grave accents missing, although the ADM1_FR column should contain the French names of the regions (and hence with accents).
# Get covid-19 data
download.file("https://data.humdata.org/dataset/9151988e-ae09-416c-9408-6df93bebf3a2/resource/ab898c39-2322-4bf0-a9a7-5897f5605bc1/download/africa_covid19_daily_deaths_national.csv",
destfile = "africa_cov_cases.csv", mode ="wb")
cases_africa = read.csv("africa_cov_cases.csv", sep = ";")
# Clean data : aggregate cases, remove empty rows and correct region names
cases_per_country = cases_africa %>%
mutate(sum = rowSums(cases_africa[,4:length(cases_africa)], na.rm = T)) %>%
subset(ISO != "") %>%
mutate(COUNTRY_NAME = str_replace_all(COUNTRY_NAME, c("Sao Tome and Principe" = "São Tomé and Principe", "Eswatini" = "eSwatini")))
map(cases_per_country,
subRegion = "COUNTRY_NAME",
value = "sum",
title = "Covid-19 cases per African country")
Note here that the names of eSwatini and São Tomé and Principe were misspelt in the downloaded data (respectively Eswatini and Sao Tome and Principe).
# Get infectious disease data
download.file("https://data.chhs.ca.gov/dataset/03e61434-7db8-4a53-a3e2-1d4d36d6848d/resource/75019f89-b349-4d5e-825d-8b5960fc028c/download/odp_idb_2020_ddg_compliant.csv",
destfile = "infect_disease_us_county.csv", mode ="wb")
i_disease = read.csv("infect_disease_us_county.csv") ; colnames(i_disease)
# Clean data : keep years 2018-2020 and cases of Malaria, Typhus, Dengue and Lyme diseases
i_4_disease <- i_disease %>%
subset(grepl("Malaria|Typhus|Dengue|Lyme", Disease) & grepl("Total", Sex) & grepl("2018|2019|2020", Year)) %>%
filter(County != "California") %>% # csv contains a row with sum of cases for all counties in California
mutate(County = paste0(County, "_CA")) # add proper state abbreviation to match built-in map counties
malaria_map = map(i_4_disease[i_4_disease$Disease == "Malaria", ],
subRegion = "County",
value = "Cases",
save = F,
x = "Year",
title = "Malaria cases")
malaria_map$`map_param_KMEANS`
disease_map = map(i_4_disease[i_4_disease$Year == "2020", ],
subRegion = "County",
value = "Cases",
save = F,
class = "Disease",
ncol = 4,
title = "Cases of several infectious diseases")
disease_map$`map_param_KMEANS`
malaria_map_diff_2018 = map(i_4_disease[i_4_disease$Disease == "Malaria", ],
subRegion = "County",
value = "Cases",
save = F,
xRef = "2018",
underLayer = mapUS49County,
title = "Difference with 2018 malaria cases")
malaria_map_diff_2018$map_param_KMEANS_xDiffAbs
malaria_map_diff_2018$map_param_KMEANS_xDiffPrcnt
For some reason, malaria_map_diff_2018$map_param_KMEANS_xDiffPrcnt does not display values for some counties. I think this might be because the extreme values are not included in any of the legend classes? (e.g. counties where exactly 0% change occurs are not represented in any class). I think this issue needs to be resolved. Adding square brackets around the extreme values of the classes could help specify in which class the extreme values are included.
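A quick base R illustration of the boundary behaviour I suspect is the cause (an assumption, as I haven't looked at the internals):

vals   <- c(-50, 0, 25, 100)
breaks <- c(-50, 0, 50, 100)
cut(vals, breaks)                         # -50 falls outside every class -> <NA>
cut(vals, breaks, include.lowest = TRUE)  # [-50,0] now closes the lowest interval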
After retesting the package, from my point of view, I think it is the responsibility of the user to make sure their data are complete, up to date and correct (e.g. make sure all region names are properly written) and match the "subRegions" of the rmap built-in maps. In any case, if the user had to use another solution than rmap, and join their data, for example, with the attribute table of an existing shapefile, they would still need to clean their data beforehand. Following @maczokni's comments, maybe a section of the user guide could highlight the need for cleaning data and matching them with the built-in map subRegions?
As for the novelty of the package, as raised by @maczokni, I think its true novelty lies indeed in the ability to easily compare across years/scenarios/categories. On another note, compared to existing packages like rnaturalearth and tigris, everything can be done here using only the rmap package, while the two former solutions require combining them with other packages such as tmap. Another advantage of rmap (that is maybe not highlighted enough in the paper) is that it creates ggplot objects which can easily be combined with others to change the map layout, for example, and I think ggplot might be better known among non-GIS users than tmap.
Except for the issue with the legend classes, I think rmap worked as expected in the examples above. Let me know what you think @hugoledoux and @maczokni.
Okay, I re-read the (very detailed and good) comments from the reviewers (thanks @CamilleMorlighem and @maczokni) and I think we can go ahead and try to finalise the submission.
I would like the 3 points raised above to be fixed:
1. a section in the user guide on cleaning data and matching it with the built-in map subRegions;
2. the legend-class issue where extreme values fall outside every class;
3. should it be highlighted in the paper that the outputs are ggplot objects?
Once this is done, I am happy to proceed further with acceptance of the submission.
@hugoledoux @CamilleMorlighem @maczokni Thank you all for the great feedback and comments. Yes, we will address all these points and update the submission. Best Zarrar
@hugoledoux
Please see each of the points mentioned above addressed in the latest version of the paper.
Section on Cleaning Data: added to the user guide, explaining how subRegions need to be matched to built-in map names, along with an example of how rmap automatically identifies the missing regions for you and how you can then go in and find the correct spelling.
@CamilleMorlighem Thanks for catching this. The issue here was actually that the % difference calculation uses the base year value to calculate the % difference, i.e. Percent Diff = (Value in Year_i - Value in Year_Ref) * 100 / (Value in Year_Ref). In the examples you provided, or in general when there was a 0 value in the reference year, the result was coming out as infinite ('Inf'). This then resulted in an NaN value being filtered out, and thus those counties being removed. We have fixed this to behave better, as follows:
- rmap now gives a warning saying that there is a '0' value and that the difference calculation will give an Inf value, which is converted to an NA.
- rmap comes with the option of highlighting single values such as 0s, as well as showing NA values (listed in separate sections of the user guide: Legend Single Value and Show NA). Set the showNA value to T so it shows these.

If we redo your example with legendSingleValue = 0 and showNA = T you get the following.
This time rmap gives the warning about the Inf values, and we can see 'NA' vs. '0' values in the maps.
library(stringr)
library(dplyr)
library(rmap)
# Get infectious disease data
download.file("https://data.chhs.ca.gov/dataset/03e61434-7db8-4a53-a3e2-1d4d36d6848d/resource/75019f89-b349-4d5e-825d-8b5960fc028c/download/odp_idb_2020_ddg_compliant.csv",
destfile = "infect_disease_us_county.csv", mode ="wb")
i_disease = read.csv("infect_disease_us_county.csv") ; colnames(i_disease)
# Clean data : keep years 2018-2020 and cases of Malaria, Typhus, Dengue and Lyme diseases
i_4_disease <- i_disease %>%
subset(grepl("Malaria|Typhus|Dengue|Lyme", Disease) & grepl("Total", Sex) & grepl("2018|2019|2020", Year)) %>%
filter(County != "California") %>% # csv contains a row with sum of cases for all counties in California
mutate(County = paste0(County, "_CA")) # add proper state abbreviation to match built-in map counties
malaria_map_diff_2018 = map(i_4_disease[i_4_disease$Disease == "Malaria", ],
subRegion = "County",
value = "Cases",
save = F, show=F,
xRef = "2018",
underLayer = mapUS49County,
labels=T, legendSingleValue = 0)
malaria_map_diff_2018$map_param_KMEANS
malaria_map_diff_2018$map_param_KMEANS_xDiffAbs
malaria_map_diff_2018$map_param_KMEANS_xDiffPrcnt
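For reference, the zero-reference behaviour described above amounts to something like this (an illustration, not rmap's internal code):

pct_diff <- function(value, ref) {
  out <- (value - ref) * 100 / ref
  out[is.infinite(out)] <- NA  # a 0 in the reference year now yields NA plus a warning
  out
}
pct_diff(c(5, 3, 2), c(0, 3, 1))  # NA 0 100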
We mention the point about ggplot objects in the Statement of Need as follows:
"Post-process customization: Existing packages do not produce output objects that can be saved and then customized. Customization of the maps in existing packages is limited to package-specific functionality and arguments. rmap produces ggplot objects in which every element (axes, grids, titles, colors, line widths, facets) can be customized after the map has been produced. This allows users to capitalize on existing knowledge of the widely used ggplot2 package and its arguments."
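As a usage note, this means any returned map can be restyled with ordinary ggplot2 syntax, for example (object names taken from the GCSE example above):

mapx <- rmap::map(data = gcse_results_joined,
                  subRegion = "area",
                  value = "number_of_pupils_at_the_end_of_key_stage_4",
                  save = F, show = F)
mapx$map_param_KMEANS +
  ggplot2::theme_minimal() +
  ggplot2::labs(title = "Pupils at KS4 by London borough")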
@editorialbot generate pdf
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
Submitting author: @zarrarkhan (Zarrar Khan)
Repository: https://github.com/JGCRI/rmap
Branch with paper.md (empty if default branch):
Version: v1.0.5
Editor: @hugoledoux
Reviewers: @CamilleMorlighem, @maczokni
Archive: 10.5281/zenodo.7085969
:warning: JOSS reduced service mode :warning:
Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.
Status
Status badge code:
Reviewers and authors:
Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)
Reviewer instructions & questions
@CamilleMorlighem & @maczokni, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:
The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @hugoledoux know.
✨ Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest ✨
Review checklist for @CamilleMorlighem
✨ Important: Please do not use the Convert to issue functionality when working through this checklist, instead, please open any new issues associated with your review in the software repository associated with the submission. ✨
Conflict of interest
Code of Conduct
General checks
Functionality
Documentation
Software paper
Review checklist for @maczokni
✨ Important: Please do not use the Convert to issue functionality when working through this checklist, instead, please open any new issues associated with your review in the software repository associated with the submission. ✨
Conflict of interest
Code of Conduct
General checks
Functionality
Documentation
Software paper