andysouth commented 8 years ago

1. What does this package do? (explain in 50 words or less)
  rnaturalearth provides :
a pre-downloaded subset of Natural Earth vectors commonly used in world mapping.
functions to download other Natural Earth vectors.
an open, reproducible workflow from www.naturalearthdata.com enabling updating as new versions become available.
flexibility to get maps classified by countries, sovereign states and map-units.
1. Paste the full DESCRIPTION file inside a code block (bounded by ``` on either end).

Package: rnaturalearth
Title: World Vector Map Data from Natural Earth
Version: 0.0.0.9000
Authors@R: person("Andy", "South", , "southandy@gmail.com", role = c("aut", "cre"))
Description: Facilitates mapping by making natural earth map data from http://
    www.naturalearthdata.com/ more easily available to R users. Focuses on vector
    data.
License: CC0
LazyData: true
LazyDataCompression: xz
URL: https://github.com/AndySouth/rnaturalearth
BugReports: https://github.com/AndySouth/rnaturalearth/issues
Depends:
    R (>= 3.1.1)
Imports:
    sp (>= 1.0.15),
    rgdal
Suggests:
    knitr,
    testthat (>= 0.9.1),
    httr
VignetteBuilder: knitr
RoxygenNote: 5.0.1

1. URL for the package (the development repository, not a stylized html page) https://github.com/AndySouth/rnaturalearth
1. What data source(s) does it work with (if applicable)? www.naturalearthdata.com
1. Who is the target audience? Anyone who wants to use Natural Earth vector data to make a map or perform other geographical analyses.
1. Are there other R packages that accomplish the same thing? If so, what is different about yours? More limited subsets of Natural Earth data are available in at least the following CRAN packages rworldmap, rworldxtra, choroplethr, maps, oce, tmap. rnaturalearth is different in that it provides a comprehensive reproducible solution for access to Natural Earth vector map data either pre-downloaded or by facilitating download. By separating data access from visualisation this package provides a resource that can be used by other visualisation packages.
1. Check the box next to each policy below, confirming that you agree. These are mandatory.
[x] This package does not violate the Terms of Service of any service it interacts with.
[x] The repository has continuous integration with Travis and/or another service
[x] The package contains a vignette
[x] The package contains a reasonably complete readme with devtools install instructions
[x] The package contains unit tests
[x] The package only exports functions to the NAMESPACE that are intended for end users
1. Do you agree to follow the rOpenSci packaging guidelines? These aren't mandatory, but we strongly suggest you follow them. If you disagree with anything, please explain.
[no] Are there any package dependencies not on CRAN?
[yes] Do you intend for this package to go on CRAN?
[yes] Does the package have a CRAN accepted license?
[no] Did devtools::check() produce any errors or warnings? If so paste them below. One NOTE
checking installed package size ... NOTE installed size is 26.0Mb sub-directories of 1Mb or more: data 25.8Mb
1. Please add explanations below for any exceptions to the above: The package is currently too big because of the fine scale (10m) data. Hadley suggested : "I think you need to keep the data under five meg. I'd suggest that you put the fine level data a separate package - you can then make that available via a drat repo (perhaps the ROpenSci one?), or just via github." I don't know anything about drat repos. I'm happy to take advice and modify the package(s) as recommended.

sckott commented 8 years ago

@AndySouth Thanks for this! Seeking reviewer(s)

Robinlovelace commented 8 years ago

I could take a look

andysouth commented 8 years ago

Thanks both!

On 26 November 2015 at 23:44, Robin notifications@github.com wrote:

I could take a look

— Reply to this email directly or view it on GitHub https://github.com/ropensci/onboarding/issues/22#issuecomment-160006450.

sckott commented 8 years ago

Reviewers: @Robinlovelace, @lmullen

andysouth commented 8 years ago

Thanks all. I appreciate your efforts. If you have questions feel free to ask.

On 30 November 2015 at 16:17, Scott Chamberlain notifications@github.com wrote:

@Robinlovelace https://github.com/Robinlovelace and @lmullen https://github.com/lmullen reviewing - thanks both!

— Reply to this email directly or view it on GitHub https://github.com/ropensci/onboarding/issues/22#issuecomment-160675931.

lmullen commented 8 years ago

First, let me say that this package is very useful. It will be a very helpful addition to the mapping packages on CRAN and in rOpenSci. I already have plans to use it for my own work and in assignments for students. Well done, @AndySouth.

It takes about 2 minutes to install the dev version from GitHub and the installed version is 26 MB. In addition to the fact that CRAN won't accept the package until the data size is smaller, the weight of this package also hinders you as the developer from doing updates as necessary, and it makes it difficult for users to install the package. To solve this it will be necessary to break this package into two or more other packages as detailed below. For the purposes of this review I've focused mainly on package-level issues. Once you've solved the size problem and the package design has stabilized, I'll review the code more granularly.

Structural changes

This package should be at least two packages, and probably three. Here is a suggestion for how you should break this up:

The main package would be rnaturalearth and it would be on CRAN. It would be the only package that a user would ever install or load directly. It should contain nothing but the code to download NaturalEarth files, functions to access the data from the other packages, and the vignettes and package level documentation. The absence of data from this package means that you can update it with impunity, since the size of the package will be very small. (I've found this also helps speed up development.)
The second package would be rnaturalearthData or the like, and it should also be on CRAN. It would contain nothing but the most essential data: as much as you can fit while keeping the package under the 5MB limit set by CRAN. It will also contain the documentation for the datasets. This package will be imported by rnaturalearth, so users will install it whenever they install rnaturalearth. But since this data will changed infrequently, users will seldom need to update this package.
The third package will be rnaturalearthHiRes or the like. This package would reside solely in the rOpenSci drat repository. It would be a suggested package of rnaturalearth, using the AdditionalRepos field in the DESCRIPTION. When users call a function that would use this pre-downloaded data, the rnaturalearth package would check if rnaturalearthHiRes is installed. If it is not installed, then it will ask to install the package from the rOpenSci drat repo. Since this would be a non-CRAN package, you can include as much data as you want. Whether it would ever make sense to break this high resolution package into several different packages would be up to you.

A compromise solution would be to put minimal data into rnaturalearth, and have all the other data in a single rOpenSci drat repository package. But if you're going to do this, I think you may as well get as well break it up into one further package as suggested above.

Now that is a lot of breaking up into multiple packages what will remain, as far as users are concerned, a single user-facing package. But this is the accepted practice for packages which depend on a lot of data. For instance, the openNLP package installs very large packages containing trained models for different langauges from Datacube. I've had to do this in gender and genderdata, even though the data there is much smaller than the geographic data you are working with. And I'm in the process of redoing USAboundaries into exactly the same package structure that I'm suggesting here. If it is a helpful model, you might take a look at gender and genderdata. Here are the relevant functions.

When you do this, you should feel free to give the data packages the CC0 license, but to give the rnaturalearth package which contains only your own work whatever open source license you prefer.

@sckott, @karthik, @cboettig: To be able to test these changes effectively, @AndySouth will need to use the rOpenSci drat repository. I'd suggest that the package be provisionally accepted and moved into rOpenSci GitHub and drat repositories.

Tooling

The README.md should be converted to README.Rmd via devtools::use_readme_rmd(). As a bonus, this will let the README include images easily.
The package should use Appveyor's CI for Windows via devtools::use_appveyor().

DESCRIPTION

Specify a version number for rgdal, httr, and knitr.

Tests

All the tests pass. I'll check coverage more carefully once the package structure has stabilized. In the meantime, all of the tests which download files should have skip_on_cran().

Vignette

The vignette should not need download anything from Natural Earth. The code block at the end with two downloads should be set to eval = FALSE.

Code style

There are places where the code style can be improved. See the suggestions in this chapter of Hadley Wickham's R Packages book or the style chapter in his Advanced R book.

In the README and vignettes, in place of require(rnaturalearth) use library(rnaturalearth).
Spacing is often inconsistent around = using in function definitions and calls. There should always be a single space on both sides of =. For example, in the README vignette("rnaturalearth", package="rnaturalearth") should be vignette("rnaturalearth", package = "rnaturalearth").
Comments at the start of the line should be followed by a space. (I prefer to user sentence capitalization; but that's just a preference.) E.g.,

# This is more legible

Most important, all code which has been commented out needs to be deleted.
The same goes for the data download scripts in data-raw. At the moment, they have a lot investigating the data. Now that you've settled on what to do, you can retain that information in comments but trim unnecessary code.
Functions which have limited categorical options should specify all the options in the function signature and then use match.args(). An example of this is get_data() and the type parameter. For example, a function that looks like this

myfunc <- function(type = "a") {
  if(type == "a") {
    # ...
  } else if (type =="b") {

  } else {
    stop("Error.")
  }
}

Should be changed to this:

myfunc <- function(type = c("a", "b")) {
  type <- match.arg(type)
  # Logic checking type here.
}

get_data() should be renamed to something less generic, like ne_get_data(). Or, can you avoid exporting it?

Human language style

In the headings of the vignettes, prefer sentence-style capitalization. E.g. 1. Maps in the package. (I'd suggest the same for comments as well, but that's entirely up to you.)
This line in the vignette should be revised, perhaps into a bulleted list, since a sentence can only contain one colon. contains pre-downloaded vector maps for countries: ne_countries(), states: ne_states() and coastline: ne_coastline()

andysouth commented 8 years ago

Thanks @lmullen for a very helpful review and for giving me clear guidelines to follow. I knew some of these needed doing, now I know how. I'll wait for feedback from others and then get stuck in.

On 14 December 2015 at 22:31, Lincoln Mullen notifications@github.com wrote:

First, let me say that this package is very useful. It will be a very helpful addition to the mapping packages on CRAN and in rOpenSci. I already have plans to use it for my own work and in assignments for students. Well done, @AndySouth https://github.com/AndySouth.

It takes about 2 minutes to install the dev version from GitHub and the installed version is 26 MB. In addition to the fact that CRAN won't accept the package until the data size is smaller, the weight of this package also hinders you as the developer from doing updates as necessary, and it makes it difficult for users to install the package. To solve this it will be necessary to break this package into two or more other packages as detailed below. For the purposes of this review I've focused mainly on package-level issues. Once you've solved the size problem and the package design has stabilized, I'll review the code more granularly. Structural changes

This package should be at least two packages, and probably three. Here is a suggestion for how you should break this up:

The main package would be rnaturalearth and it would be on CRAN. It would be the only package that a user would ever install or load directly. It should contain nothing but the code to download NaturalEarth files, functions to access the data from the other packages, and the vignettes and package level documentation. The absence of data from this package means that you can update it with impunity, since the size of the package will be very small. (I've found this also helps speed up development.)

The second package would be rnaturalearthData or the like, and it should also be on CRAN. It would contain nothing but the most essential data: as much as you can fit while keeping the package under the 5MB limit set by CRAN. It will also contain the documentation for the datasets. This package will be imported by rnaturalearth, so users will install it whenever they install rnaturalearth. But since this data will changed infrequently, users will seldom need to update this package.

The third package will be rnaturalearthHiRes or the like. This package would reside solely in the rOpenSci drat repository http://packages.ropensci.org/. It would be a suggested package of rnaturalearth, using the AdditionalRepos field in the DESCRIPTION. When users call a function that would use this pre-downloaded data, the rnaturalearth package would check if rnaturalearthHiRes is installed. If it is not installed, then it will ask to install the package from the rOpenSci drat repo. Since this would be a non-CRAN package, you can include as much data as you want. Whether it would ever make sense to break this high resolution package into several different packages would be up to you.

A compromise solution would be to put minimal data into rnaturalearth, and have all the other data in a single rOpenSci drat repository package. But if you're going to do this, I think you may as well get as well break it up into one further package as suggested above.

Now that is a lot of breaking up into multiple packages what will remain, as far as users are concerned, a single user-facing package. But this is the accepted practice for packages which depend on a lot of data. For instance, the openNLP package installs very large packages containing trained models for different langauges from Datacube. I've had to do this in gender and genderdata, even though the data there is much smaller than the geographic data you are working with. And I'm in the process of redoing USAboundaries into exactly the same package structure that I'm suggesting here. If it is a helpful model, you might take a look at gender https://github.com/ropensci/gender and genderdata https://github.com/ropensci/genderdata. Here are the relevant functions https://github.com/ropensci/gender/blob/master/R/install-genderdata-package.R .

When you do this, you should feel free to give the data packages the CC0 license, but to give the rnaturalearth package which contains only your own work whatever open source license you prefer.

@sckott https://github.com/sckott, @karthik https://github.com/karthik, @cboettig https://github.com/cboettig: To be able to test these changes effectively, @AndySouth https://github.com/AndySouth will need to use the rOpenSci drat repository. I'd suggest that the package be provisionally accepted and moved into rOpenSci GitHub and drat repositories. Tooling

The README.md should be converted to README.Rmd via devtools::use_readme_rmd(). As a bonus, this will let the README include images easily.

The package should use Appveyor's CI for Windows via devtools::use_appveyor().

DESCRIPTION

Specify a version number for rgdal, httr, and knitr.

Tests

All the tests pass. I'll check coverage more carefully once the package structure has stabilized. In the meantime, all of the tests which download files should have skip_on_cran(). Vignette

The vignette should not need download anything from Natural Earth. The code block at the end with two downloads should be set to eval = FALSE. Code style

There are places where the code style can be improved. See the suggestions in this chapter of Hadley Wickham's R Packages book http://r-pkgs.had.co.nz/r.html or the style chapter http://adv-r.had.co.nz/Style.html in his Advanced R book.

In the README and vignettes, in place of require(rnaturalearth) use library(rnaturalearth).

Spacing is often inconsistent around = using in function definitions and calls. There should always be a single space on both sides of =. For example, in the README vignette("rnaturalearth", package="rnaturalearth") should be vignette("rnaturalearth", package = "rnaturalearth").

Comments at the start of the line should be followed by a space. (I prefer to user sentence capitalization; but that's just a preference.) E.g.,

This is more legible

Most important, all code which has been commented out needs to be

deleted.

The same goes for the data download scripts in data-raw. At the moment, they have a lot investigating the data. Now that you've settled on what to do, you can retain that information in comments but trim

unnecessary code.

Functions which have limited categorical options should specify all the options in the function signature and then use match.args(). An example of this is get_data() and the type parameter. For example, a function that looks like this

myfunc <- function(type = "a") { if(type == "a") {

...

} else if (type =="b") {

} else { stop("Error.") } }

Should be changed to this:

myfunc <- function(type = c("a", "b")) { type <- match.arg(type)

Logic checking type here.

}

get_data() should be renamed to something less generic, like ne_get_data(). Or, can you avoid exporting it?

Human language style

In the headings of the vignettes, prefer sentence-style capitalization. E.g. 1. Maps in the package. (I'd suggest the same for comments as well, but that's entirely up to you.)

This line in the vignette should be revised, perhaps into a bulleted list, since a sentence can only contain one colon. contains pre-downloaded vector maps for countries: ne_countries(), states: ne_states() and coastline: ne_coastline()

— Reply to this email directly or view it on GitHub https://github.com/ropensci/onboarding/issues/22#issuecomment-164580884.

Robinlovelace commented 8 years ago

Review

A number of packages provide geographic data, but none has yet made the high quality datasets within the naturalearth project easily available to R users. rnaturalearth does this with a user-friendly command-line interface, which encourages exploration of what the collection has to offer, in addition to simply providing country and 'state' outlines at various level of geographical resolution. The package will very useful for anyone wanting to make simple yet accurate maps quickly.

ROpenSci, with its emphasis on data access, is an ideal community space to host this package.

Useability of the functions

Overall I found the package intuitive and easy to use. The preface of ne_ is useful for quickly accessing the package's functions (in a similar way that stringr functions begin with st_) and I think the order and description of the function arguments makes sense, given the structure of the data on the naturalearth data repository.

The main vignette succintly describes the core purpose and functions of the package, although I think more on what extra can be downloaded via ne_download() would be interesting here. The what-is-a-country vignette provides a useful discussion of the subtleties surrounding up how to divide up geographical space along national, state and regional borders.

Minor comments/suggestions:

I think the description of the @param type in ne_download could be better at conveying to the user what datasets are available. It is not clear from the function documentation, for example, that lakes can be downloaded with ne_download(type = 'lakes', category = 'physical'). To resolve this issue I suggest that a link to the dataset types is make more explicit (e.g. "see naturalearthdata.com/downloads/110m-cultural-vectors/ for the types of data that can be downloaded for cultural 110 m resolution data.")
To assist with the 'making it clear what data is available' issue, I think that ne_download could usefully be split into ne_download10, ne_download50 and ne_download110, while keeping the generic ne_download function intact. The data available varies depending on scale, so ne_download10, for example, could provide descriptions of the very useful roads and railways datasets. It was not immediately obvious that these very interesting datasets were made available with - this could usefully be better documented.

roads <- ne_download(scale = 10, type = "roads", category = "cultural")
railroads <- ne_download(scale = 10, type = "railroads", category = "cultural")

sp is an import but I think the package would benefit from making it a dependency (as it is with stplanr after some debate): all the data loaded by the package that I've seen is in an sp S4 object. Yet these will not work (e.g. plot) properly unless sp is loaded, as illustrated in the vignette. This would also make the plot commands in the vignette simpler.
scale = 10 isn't listed as an argument of ne_countries("scale of map to return, one of 110, 50, 'small', 'medium'") but it works so should be added.
It is not clear that category is a needed argument. airports <- ne_download(scale = 10, type = "airports"), for example, works the same as airports <- ne_download(scale = 10, type = "airports") - there are no types I can see that duplicate names between categories.
I couldn't see how to download ice shelf data succesfully. I was expecting ne_download(scale = 10, type = "antarctic_ice_shelves", category = "physical") or ne_download(scale = 10, type = "antarctic ice shelves", category = "physical") to work but neither seemed to...
Links to other data sources such as the SRTM (some of which can be accessed through the raster package) would be useful, perhaps in the vignette.

Thoughts on reducing the package size

In-line with the previous reviewer I think the size of the package could be problematic. I think the suggestion of splitting it into other packages is a good one. An alternative would be to provide a default destination within the package for data download, not host any data on the package itself directly, and simply use the package to access the data hosted on rnaturalearth. Other than package install requirements, the clear advantage of that is that the data won't have to be updated. The disadvantage is that the user would need internet access the first time they accessed many datasets and possible complexities surrounding fresh installs wiping previously downloaded data.

if(file.exists(...)) could be used to decide whether the data should be downloaded or not. Below is an illustrative example for what the call could look like something like this, for ne_countries():

ne_countries_10 <- function(){
  file_path <- file.path(system.file("data", package = "rnaturalearth"), "countries10.rda")
  if(file.exists(file_path)){
    x <- readRDS(file_path)
  }else{
    x <- ne_download()
    saveRDS(x, file_path)
  }
  x
}

Note that data may need to change to exdata in the above code and that this is just a preliminary idea - would be interested to hear thoughts from others. In any case I can see a disadvantage of hosting raw data on CRAN or ROpenSci in terms of the DRY principle.

I'm not sure about specifying providing the same data with data(countries110) is necessary given t countries10 <- ne_countries(110) does the same job.

I'd like to point out that I'm not an expert in this area, think the previous reviewers' suggestions in relation to package size is also another viable option and would be interested in hearing views from others, especially in relation to the DRY principle as it applies to data.

andysouth commented 8 years ago

Thanks @Robinlovelace https://github.com/Robinlovelace for a considered and useful review.

On 17 December 2015 at 00:15, Robin notifications@github.com wrote:

Review

A number of packages provide geographic data, but none has yet made the high quality datasets within the naturalearth project easily available to R users. rnaturalearth does this with a user-friendly command-line interface, which encourages exploration of what the collection has to offer, in addition to 'simply' providing country and 'state' outlines at various level of geographical resolution. The package will very useful for anyone wanting to make simple yet accurate maps quickly.

ROpenSci, with its emphasis on data access, is an ideal community space to host this package. Useability of the functions

Overall I found the package intuitive and easy to use. The preface of ne is useful for quickly accessing the package's functions (in a similar way that stringr functions begin with st) and I think the order and description of the function arguments makes sense, given the structure of the data on the naturalearth data repository.

The main vignette https://github.com/AndySouth/rnaturalearth/blob/master/vignettes/rnaturalearth.Rmd succintly describes the core purpose and functions of the package, although I think more on what extra can be downloaded via ne_download() would be interesting here. The what-is-a-country vignette provides a useful discussion of the subtleties surrounding up how to divide up geographical space along national, state and regional borders.

Minor comments/suggestions:

-

I think the description of the @param type in ne_download could be better at conveying to the user what datasets are available. It is not clear from the function documentation, for example, that lakes can be downloaded with ne_download(type = 'lakes', category = 'physical'). To resolve this issue I suggest that a link to the dataset types is make more explicit (e.g. "see naturalearthdata.com/downloads/110m-cultural-vectors/ http://www.naturalearthdata.com/downloads/110m-cultural-vectors/ for the types of data that can be downloaded for cultural 110 m resolution data.")

To assist with the 'making it clear what data is available' issue, I think that ne_download could usefully be split into ne_download10, ne_download50 and ne_download110, while keeping the generic ne_download function intact. The data available varies depending on scale, so ne_download10, for example, could provide descriptions of the very useful roads and railways datasets. It was not immediately obvious that these very interesting datasets were made available with - this could usefully be better documented.

roads <- ne_download(scale = 10, type = "railroads", category = "cultural")railroads <- ne_download(scale = 10, type = "railroads", category = "cultural")

-

sp is a dependency but I think the package would benefit from making it a dependency: all the data loaded by the package that I've seen is in an sp S4 object. Yet these will not work (e.g. plot) properly unless sp is loaded, as illustrated in the vignette. This would also make the plot commands in the vignette simpler.

scale = 10 isn't listed as an argument of ne_countries("scale of map to return, one of 110, 50, 'small', 'medium'") but it works so should be added.

It is not clear that category is a needed argument. airports <- ne_download(scale = 10, type = "airports"), for example, works the same as airports <- ne_download(scale = 10, type = "airports") - there are no types I can see that duplicate names between categories.

I couldn't see how to download ice shelf data succesfully. I was expecting ne_download(scale = 10, type = "antarctic_ice_shelves", category = "physical") or ne_download(scale = 10, type = "antarctic ice shelves", category = "physical") to work but neither seemed to...

Links to other data sources such as the SRTM (some of which can be accessed through the raster package) would be useful, perhaps in the vignette.

Thoughts on reducing the package size

In-line with the previous reviewer I think the size of the package could be problematic. I think the suggestion of splitting it into other packages is a good one. An alternative would be to provide a default destination within the package for data download, not host any data on the package itself directly, and simply use the package to access the data hosted on rnaturalearth. Other than package install requirements, the clear advantage of that is that the data won't have to be updated. The disadvantage is that the user would need internet access the first time they accessed many datasets and possible complexities surrounding fresh installs wiping previously downloaded data.

if(file.exists(...)) could be used to decide whether the data should be downloaded or not. Below is an illustrative example for what the call could look like something like this, for ne_countries():

ne_countries_10 <- function(){ file_path <- file.path(system.file("data", package = "rnaturalearth"), "countries10.rda") if(file.exists(file_path)){ x <- readRDS(file_path) }else{ x <- ne_download() saveRDS(x, file_path) } x }

Note that data may need to change to exdata in the above code and that this is just a preliminary idea - would be interested to hear thoughts from others. In any case I can see a disadvantage of hosting raw data on CRAN or ROpenSci in terms of the DRY principle.

I'm not sure about specifying a function to generate data ne_countries() while providing the same data with data(countries110), and the type argument is necessary, especially given the suggestion above of a way to reduce the package's size whilst maintaining the downloaded data in a default and accessible location.

I'd like to point out that I'm not an expert in this area, think the previous reviewers' suggestions in relation to package size is also another viable option and would be interested in hearing views from others, especially in relation to the DRY principle as it applies to data.

— Reply to this email directly or view it on GitHub https://github.com/ropensci/onboarding/issues/22#issuecomment-165296951.

richfitz commented 8 years ago

Another option for reducing package size (though experimental and could delay CRAN submission).

In another context, I'm developing a framework for using GitHub to distribute evolving data sources without putting them in a package; see early vignette, and beginnings of manuscript which include a little about where we're thinking of taking this.

In short, the data don't go in the package, or even in the repository but live in "github releases" and are fetched as they're used. This decouples the development of the code of the package from the data source which avoids the issues of duplicating the data over and over on CRAN and makes for fast package downloads.

lmullen commented 8 years ago

@richfitz: datastorr looks really great. I'm looking forward to the release.

sckott commented 8 years ago

hey @AndySouth let us know if there's anything we can do to help...

andysouth commented 8 years ago

Thanks @sckott https://github.com/sckott. I've been meaning to get back to this. I have a plan to start by following parts of the suggestions of both reviewers. I'll try to get something to you in the next few weeks.

On 11 February 2016 at 19:34, Scott Chamberlain notifications@github.com wrote:

hey @AndySouth https://github.com/AndySouth let us know if there's anything we can do to help...

— Reply to this email directly or view it on GitHub https://github.com/ropensci/onboarding/issues/22#issuecomment-183026131.

sckott commented 8 years ago

thanks! sounds good

andysouth commented 8 years ago

Thanks for your patience, finally found spare time to finish addressing these useful recomendations from the reviewers.

It fails travis checks currently due to the relationships between the 3 packages, but passes checks locally.

responses to @lmullen

This package should be at least two packages, and probably three.

done. converted to 3 packages rnaturalearth, rnaturalearthdata & rnaturalearthhires. The first 2 are aimed at CRAN and the 3rd the rOpenSci drat repo)

@sckott, @karthik, @cboettig: To be able to test these changes effectively, @AndySouth will need to use the rOpenSci drat repository. I'd suggest that the package be provisionally accepted and moved into rOpenSci GitHub and drat repositories.

I'm ready for that stage now if you are.

The README.md should be converted to README.Rmd via devtools::use_readme_rmd().

done

The package should use Appveyor's CI for Windows via devtools::use_appveyor().

done

Specify a version number for rgdal, httr, and knitr.

I don't know what version number is needed for these packages. Feel like I would just be making one up. Where might I look ?

All the tests pass. I'll check coverage more carefully once the package structure has stabilized. In the meantime, all of the tests which download files should have skip_on_cran().

done

The vignette should not need download anything from Natural Earth. The code block at the end with two downloads should be set to eval = FALSE.

done

Code style

In the README and vignettes, in place of require(rnaturalearth) use library(rnaturalearth).

done

Spacing is often inconsistent around = using in function definitions and calls. There should always be a single space on both sides of =. For example, in the README vignette("rnaturalearth", package="rnaturalearth") should be vignette("rnaturalearth", package = "rnaturalearth").

I recognise this, but unless it's essential would prefer to keep my sometimes idiosyncratic use of spaces which I use to enhance my readability (and out of habit too).

Comments at the start of the line should be followed by a space. (I prefer to user sentence capitalization; but that's just a preference.)

done

Most important, all code which has been commented out needs to be deleted.

done (mostly)

The same goes for the data download scripts in data-raw. At the moment, they have a lot investigating the data. Now that you've settled on what to do, you can retain that information in comments but trim unnecessary code.

I have kept some commented code as these are the areas where it is most likely I will need to adapt old code in repsonse to potential future data changes, and least likely that users need to access.

Functions which have limited categorical options should specify all the options in the function signature and then use match.args(). An example of this is get_data() and the type parameter. For example, a function that looks like this

myfunc <- function(type = "a") { if(type == "a") {

...

} else if (type =="b") {

} else { stop("Error.") } } Should be changed to this:

myfunc <- function(type = c("a", "b")) { type <- match.arg(type)

Logic checking type here.

}

Done, except where I wanted to give the user more guidance if they submitted an incorrect option.

get_data() should be renamed to something less generic, like ne_get_data(). Or, can you avoid exporting it?

no longer exported

In the headings of the vignettes, prefer sentence-style capitalization. E.g. 1. Maps in the package. (I'd suggest the same for comments as well, but that's entirely up to you.)

done in vignettes

This line in the vignette should be revised, perhaps into a bulleted list, since a sentence can only contain one colon. contains pre-downloaded vector maps for countries: ne_countries(), states: ne_states() and coastline: ne_coastline()

done

responses to @Robinlovelace

The main vignette succintly describes the core purpose and functions of the package, although I think more on what extra can be downloaded via ne_download() would be interesting here.

started. More could definitely be added, and I will but would be good to get out first.

I think the description of the @param type in ne_download could be better at conveying to the user what datasets are available. It is not clear from the function documentation, for example, that lakes can be downloaded with ne_download(type = 'lakes', category = 'physical'). To resolve this issue I suggest that a link to the dataset types is make more explicit (e.g. "see naturalearthdata.com/downloads/110m-cultural-vectors/ for the types of data that can be downloaded for cultural 110 m resolution data.")

Thanks for pointing this out. I've added a table to the start of the man page for ne_download() which outlines the main maps available at different scales

To assist with the 'making it clear what data is available' issue, I think that ne_download could usefully be split into ne_download10, ne_download50 and ne_download110, while keeping the generic ne_download function intact. The data available varies depending on scale, so ne_download10, for example, could provide descriptions of the very useful roads and railways datasets.

I think that the improved documentation in ne_download() makes splitting it into the 3 functions unnecessary.

sp is an import but I think the package would benefit from making it a dependency (as it is with stplanr after some debate): all the data loaded by the package that I've seen is in an sp S4 object. Yet these will not work (e.g. plot) properly unless sp is loaded, as illustrated in the vignette. This would also make the plot commands in the vignette simpler.

I'm open to considering making sp a dependency, how do others feel ? I wanted to keep this as light on dependencies as possible. If a user was using fortify with ggplot2 they wouldn't need sp, right ?

scale = 10 isn't listed as an argument of ne_countries("scale of map to return, one of 110, 50, 'small', 'medium'") but it works so should be added.

done

It is not clear that category is a needed argument. airports <- ne_download(scale = 10, type = "airports"), for example, works the same as airports <- ne_download(scale = 10, type = "airports") - there are no types I can see that duplicate names between categories.

Whilst you are correct, category is needed to construct the download url. If I were to dispense with it I would need to create a list of all the maps and which category they are in. I wanted to keep the link to the Natural Earth data as simple as possible so that if it changes I don't have too much work to do.

I couldn't see how to download ice shelf data succesfully. I was expecting ne_download(scale = 10, type = "antarctic_ice_shelves", category = "physical") or ne_download(scale = 10, type = "antarctic ice shelves", category = "physical") to work but neither seemed to...

ne_download(scale=50, type="antarctic_ice_shelves_polys", category="physical") I am following natural earth filenaming directly to try to keep this sustainable.

Links to other data sources such as the SRTM (some of which can be accessed through the raster package) would be useful, perhaps in the vignette.

I added a link to seealso for ne_download()

additions

I've added support for raster downloads and tiny_countries points.

Thanks @richfitz for the data storage suggestion. I chose the path more well travelled for now, but am interested in that approach for the future.

lmullen commented 8 years ago

@AndySouth Glad to see the next version of this package.

@karthik and @sckott: I can review this package in more detail now (not anticipating any problems) but perhaps it would be best to have the high resolution data in drat so I can check that too.

@AndySouth: The version numbers for rgdal etc should be whatever the current version is, since that is what you are developing against. You can get the current versions from packageVersion() or better yet from looking at CRAN.

sckott commented 8 years ago

hmm, I don't know the process for adding repos to our drat, asked over there https://github.com/ropensci/drat/issues/10

sckott commented 8 years ago

@lmullen so we just need rnaturalearthhires in drat for now?

lmullen commented 8 years ago

Yes, I think that should work, right @AndySouth? The other two will be on CRAN, so I can install them as if they were on CRAN. Then test with and without the high res package.

On Tue, Mar 22, 2016 at 2:38 PM, Scott Chamberlain <notifications@github.com

wrote:

@lmullen https://github.com/lmullen so we just need rnaturalearthhires in drat for now?

— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub https://github.com/ropensci/onboarding/issues/22#issuecomment-199955243

Lincoln Mullen, http://lincolnmullen.com Assistant Professor, Department of History & Art History George Mason University

andysouth commented 8 years ago

@lmullen @sckott Yes, thanks that sounds good.

sckott commented 8 years ago

Working on it now

sckott commented 8 years ago

hmm, i can't get drat to work for rnaturalearthhires - @cboettig not sure what's going on, would it help if we move any pkgs into ropensci org account?

cboettig commented 8 years ago

@sckott um, not your fault I think; it loks like CircleCI is failing to build drat, so the package repo on amazon isn't getting updated. Circle CI seems to be having some issues reading the environmental variables all of a sudden, they have a note about the issue up, hope it is resolved soon. May need to run drat locally once to avoid timeout on the S3 download logs too

sckott commented 8 years ago

Okay, hopefully it will resolve soon

cboettig commented 8 years ago

@sckott looks like rnaturalearthhires is up on drat now:

http://packages.ropensci.org/src/contrib/PACKAGES shows:

Package: rnaturalearthhires
Version: 0.0.0.9000
Depends: R (>= 3.1.1)
Imports: sp (>= 1.0.15)
Suggests: knitr, testthat (>= 0.9.1)
License: CC0
MD5sum: 4b7c1c00c6e4a8ce553f4d0867cf3e2a
NeedsCompilation: no

sckott commented 8 years ago

sweet!

sckott commented 8 years ago

@AndySouth @lmullen rnaturalearthhires is working on ropensci drat now. Do we need to add the other pkgs as well, or one of them?

andysouth commented 8 years ago

@sckott @lmullen Thanks Scott. Will you want the other packages there if they are accepted to rOpensci ? More broadly, my thought was that I would wait for further comments that Lincoln offered and, when I have time, submit the two smaller packages to Cran.

lmullen commented 8 years ago

@sckott Pretty sure I just need rnaturalearthhires on drat if the other two are intended for CRAN.

@AndySouth: I'll take another look at the packages as soon as I can. (I'm teaching mapping on Wednesday: maybe I'll even use them in class!)

On Tue, Mar 29, 2016 at 5:29 AM, Andy South notifications@github.com wrote:

@sckott https://github.com/sckott @lmullen https://github.com/lmullen Thanks Scott. Will you want the other packages there if they are accepted to rOpensci ? More broadly, my thought was that I would wait for further comments that Lincoln offered and, when I have time, submit the two smaller packages to Cran.

— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub https://github.com/ropensci/onboarding/issues/22#issuecomment-202801909

Lincoln Mullen, http://lincolnmullen.com Assistant Professor, Department of History & Art History George Mason University

sckott commented 8 years ago

Will you want the other packages there if they are accepted to rOpensci ?

by there @AndySouth do you mean drat? if they are accepted in our suite, they'll be added to our drat

andysouth commented 8 years ago

@sckott thanks, yes I did mean drat.

On 29 March 2016 at 16:55, Scott Chamberlain notifications@github.com wrote:

Will you want the other packages there if they are accepted to rOpensci ?

by there @AndySouth https://github.com/AndySouth do you mean drat? if they are accepted in our suite, they'll be added to our drat

— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub https://github.com/ropensci/onboarding/issues/22#issuecomment-202971988

andysouth commented 8 years ago

@lmullen @Robinlovelace would you like to be added to authors for the package ? (if that's allowed). I think your contributions warrant it and I would like to keep it collaborative. You'd be welcome to make any or no further contributions in future. Up to you either way.

lmullen commented 8 years ago

@AndySouth That's a generous offer. But I think that for the integrity of the peer review process, it would be better to maintain the distinction between author and reviewer, even though the strength of this form of review is that it is collaborative rather than adversarial. (I don't want to speak for @Robinlovelace, of course, who may disagree.) And in my own case, I haven't contributed any actual code as of yet.

andysouth commented 8 years ago

@lmullen thanks I can see your point, I guess I wasn't thinking of this as peer review in the usual sense. I'm happy to go with what you think best. I do think of you as having contributed code as I copy/pasted parts of your code that you pointed me towards and modified them only minorly.

Robinlovelace commented 8 years ago

Agree with @lmullen that I've not contributed any code so am not a package author. There's an ongoing debate about how to give credit for the important work of peer review, so perhaps saying somewhere "Reviewed by ..." would be good. Interestingly R provides many types of package contributors, including thesis advisor! (see ?person).

Delving further I tried the following:

> person("Lincoln", "Mullen", role = c("rev"))
[1] "Lincoln Mullen [rev]"
> person("Lincoln", "Mullen", role = c("reviewer"))
[1] "Lincoln Mullen"
Warning message:
In .canonicalize_person_role(role) :
  Invalid role specification: ‘reviewer’.

So rev works, guess it means 'package reviewer' (although does not say so in the help) and I'd be stoked to mentioned in that way. What are your thoughts on this way of doing in @lmullen and @sckott (as a potential template for the future). Either way it's been a great experience and illustrates how much peer review of code is beneficial.

andysouth commented 8 years ago

@Robinlovelace thanks, crediting reviewers like this sounds like a good idea.

andysouth commented 8 years ago

@lmullen @sckott any news ?

sckott commented 8 years ago

@AndySouth looking back through comments, was I supposed to do something? this does work now install.packages("rnaturalearthhires", repos = c("http://packages.ropensci.org", "http://cran.rstudio.com"))

lmullen commented 8 years ago

@sckott @AndySouth: Sorry I've been slow. Got hung up with the end of the semester. I'll review the revised package soon.

andysouth commented 8 years ago

@lmullen @sckott Thanks and no worries, I look forward to it.

lmullen commented 8 years ago

@AndySouth: Nice work on this package. I think it is very close to being done and accepted into rOpenSci, then on its way to CRAN. Here are a few minor fixes or questions I found.

Separation of packages

The separation of packages seems to work fine. I was a little surprised that the package installs the data packages using devtools instead of using the rOpenSci repository. What is the reason for that choice? I'd prefer that the rOpenSci repository was used if possible.

The DESCRIPTION will need this line added:

Additional_repositories: http://packages.ropensci.org

I'd still like to see package version numbers for the dependencies in the DESCRIPTION.

R CMD check

I get the following note, which should be easily corrected:

* checking R code for possible problems ... NOTE
ne_download: no visible global function definition for ‘download.file’
ne_download: no visible global function definition for ‘unzip’
Undefined global functions or variables:
  download.file unzip
Consider adding
  importFrom("utils", "download.file", "unzip")
to your NAMESPACE file.

In addition, R CMD check fails for me when the rnaturalearthhires package is not available. This is because it can't build the vignette.

The vignette fails to build if rnaturalearthhires is not available. Since that package won't be CRAN, I don't think this is going to pass CRAN checks. You should probably add eval = FALSE to the chunks in the vignette that use the high resolution data. It is sufficient to show what the code looks like without actually running it.

I get a warning message when running the code in the examples in ne_download(). When I run

spdf_world2 <-    ne_load( scale = 110, type = 'countries' )

I get this warning message which does not seem relevant to the command run.

Warning message:
In if (category == "raster") { :
  the condition has length > 1 and only the first element will be used

The reason is that the default argument for category is a vector of three choices. This function should probably have the line

category <- match.arg(category)

Testing / continuous integration

I noticed that the Travis and Appveyor tests are failing. At least the Travis tests should be passing. It looks like the problem is that the installation of rgdal fails because you don't have the rgdal dev package installed on Ubuntu. I'm not sure why the binary package as specified in the Travis file is not working. But you should be able to install rgdal from source if you install the apt package libgdal-dev. As for Appveyor, it looks like the problem has to do with package installation as well.

Not a requirement, but you might consider adding code coverage with covr.

Stylistic considerations

I'd suggest changing the title of the introductory vignette to "Introduction to rnaturalearth."

The package is inconsistent in how it handles spaces around = in function arguments. All user-facing documentation (i.e., examples and vignettes) should be made consistent in having a single space on both sides of the =. There are a few places with inconsistent spacing around parentheses as well.

That's about it. Nice work!

andysouth commented 8 years ago

Many thanks @lmullen for the thorough 2nd review. I'll sort these issues over the next couple of weeks.

andysouth commented 8 years ago

@sckott I've addressed all issues except one. There's a remaining problem with Travis rgdal install that I've fought with and failed, be good to get some input as it seems you may have had similar with rnoaa. Appveyor is now building successfully.

responses to 2nd rOpenSci review from @lmullen

Separation of packages

Using rOpenSci repository for loading data packages instead of devtools.

Done for rnaturalearthhires, but rnaturalearthdata is not yet on rOpenSCi or CRAN, I have put in the code ready to change once rnaturalearthdata is on rOpenSci.

               #utils::install.packages("rnaturalearthdata",
               #                        repos = c("http://packages.ropensci.org", "http://cran.rstudio.com"),
               #                        type = "source")

I'd still like to see package version numbers for the dependencies in the DESCRIPTION.

Done

R CMD check

* checking R code for possible problems ... NOTE

ne_download: no visible global function definition for ‘download.file’

ne_download: no visible global function definition for ‘unzip’

done

The vignette fails to build if rnaturalearthhires is not available. Since that package won't be CRAN, I don't think this is going to pass CRAN checks. You should probably add eval = FALSE to the chunks in the vignette that use the high resolution data.

done

I get a warning message when running the code in the examples in ne_download()

In if (category == "raster") { :

the condition has length > 1 and only the first element will be used

done

Testing / continuous integration

Travis and Appveyor tests are failing. At least the Travis tests should be passing. It looks like the problem is that the installation of rgdal fails because you don't have the rgdal dev package installed on Ubuntu. I'm not sure why the binary package as specified in the Travis file is not working. But you should be able to install rgdal from source if you install the apt package libgdal-dev. As for Appveyor, it looks like the problem has to do with package installation as well.

Now build success on appveyor

For Travis I've tried various fixes but still can't get rgdal to install despite the binary installation seemingly working for travis on similar packages e.g. https://github.com/Hackout2/repijson/blob/master/.travis.yml

I'd appreciate any input on this.

Not a requirement, but you might consider adding code coverage with covr.

added a github issue to look at covr for future release

Stylistic considerations

I'd suggest changing the title of the introductory vignette to "Introduction to rnaturalearth."

done

All user-facing documentation (i.e., examples and vignettes) should be made consistent in having a single space on both sides of the =. There are a few places with inconsistent spacing around parentheses as well.

done

sckott commented 8 years ago

asking about rgdal, i've been using old language: c setup, which works, but means you don't get the faster builds, caching, container based infrastucture, etc. - asking jim hester

sckott commented 8 years ago

jim pointed me to this issue on travis-ci repo https://github.com/travis-ci/travis-ci/issues/5852

So Edzer uses this setup https://github.com/edzer/gstat/blob/master/.travis.yml to install gdal/geos from source

worth trying dist: trusty in your travis file e.g. https://github.com/jhollist/quickmapr/blob/master/.travis.yml#L3

lmullen commented 8 years ago

@AndySouth I looked at the Travis logs and your comments in your Travis config file. Two ideas:

First, Travis does install the Ubuntu r-cran-rgdal package as it should, but then it also tries to install rgdal from source. That seems to me like a bug in Travis.

Second, I couldn't exactly tell which combination of options you used at any given time. I just tried it on my local version of Ubuntu (16.04) and all that I needed to install was libgdal-dev to compile rgdal from source. It looks like one of your option has conflicting versions of libgdal. Maybe try it with both the binary version r-cran-rgdal as you have now plus installing just libgdal-dev with before_install:?

lmullen commented 8 years ago

@AndySouth I think you might also want to use install_github: to install rnaturalearthdata.

andysouth commented 8 years ago

Thanks @lmullen @sckott I've tried both of your suggestions and various other permutations except for the c setup .

I don't fully understand how travis works. There are 2 setups (1-using binary and 2-installing from source ) that apparently should work and seem to work for other packages (1-https://github.com/Hackout2/repijson/blob/master/.travis.yml, 2-https://github.com/jhollist/quickmapr/blob/master/.travis.yml#L3) but don't work for mine. I'm reluctant to try the 3rd potential c solution.

Currently I've got it set seemingly the same as in quickmapr (2) but it is still failing with The following packages have unmet dependencies: libgdal-dev : Depends: libhdf5-serial-dev

The comment on the end of here https://github.com/travis-ci/travis-ci/issues/5215 prompted me to try removing the version dependency on rgdal in DESCRIPTION, however that seemed not to solve it either.

Am I missing anything obvious ?

sckott commented 8 years ago

I've forked to my account to see if I can sort it out, will report back if successful

andysouth commented 8 years ago

@sckott thankyou thankyou

ropensci / software-review

rnaturalearth #22

Structural changes

Tooling

DESCRIPTION

Tests

Vignette

Code style

Human language style

This is more legible

deleted.

unnecessary code.

...

Logic checking type here.

Review

Useability of the functions

Thoughts on reducing the package size

scale = 10 isn't listed as an argument of ne_countries("scale of map to return, one of 110, 50, 'small', 'medium'") but it works so should be added.

It is not clear that category is a needed argument. airports <- ne_download(scale = 10, type = "airports"), for example, works the same as airports <- ne_download(scale = 10, type = "airports") - there are no types I can see that duplicate names between categories.

I couldn't see how to download ice shelf data succesfully. I was expecting ne_download(scale = 10, type = "antarctic_ice_shelves", category = "physical") or ne_download(scale = 10, type = "antarctic ice shelves", category = "physical") to work but neither seemed to...

responses to @lmullen

This package should be at least two packages, and probably three.

@sckott, @karthik, @cboettig: To be able to test these changes effectively, @AndySouth will need to use the rOpenSci drat repository. I'd suggest that the package be provisionally accepted and moved into rOpenSci GitHub and drat repositories.

The README.md should be converted to README.Rmd via devtools::use_readme_rmd().

The package should use Appveyor's CI for Windows via devtools::use_appveyor().

Specify a version number for rgdal, httr, and knitr.

All the tests pass. I'll check coverage more carefully once the package structure has stabilized. In the meantime, all of the tests which download files should have skip_on_cran().

The vignette should not need download anything from Natural Earth. The code block at the end with two downloads should be set to eval = FALSE.

Code style

In the README and vignettes, in place of require(rnaturalearth) use library(rnaturalearth).

Spacing is often inconsistent around = using in function definitions and calls. There should always be a single space on both sides of =. For example, in the README vignette("rnaturalearth", package="rnaturalearth") should be vignette("rnaturalearth", package = "rnaturalearth").

Comments at the start of the line should be followed by a space. (I prefer to user sentence capitalization; but that's just a preference.)

Most important, all code which has been commented out needs to be deleted.

The same goes for the data download scripts in data-raw. At the moment, they have a lot investigating the data. Now that you've settled on what to do, you can retain that information in comments but trim unnecessary code.

Functions which have limited categorical options should specify all the options in the function signature and then use match.args(). An example of this is get_data() and the type parameter. For example, a function that looks like this

...

Logic checking type here.

get_data() should be renamed to something less generic, like ne_get_data(). Or, can you avoid exporting it?

In the headings of the vignettes, prefer sentence-style capitalization. E.g. 1. Maps in the package. (I'd suggest the same for comments as well, but that's entirely up to you.)

This line in the vignette should be revised, perhaps into a bulleted list, since a sentence can only contain one colon. contains pre-downloaded vector maps for countries: ne_countries(), states: ne_states() and coastline: ne_coastline()

responses to @Robinlovelace

The main vignette succintly describes the core purpose and functions of the package, although I think more on what extra can be downloaded via ne_download() would be interesting here.

scale = 10 isn't listed as an argument of ne_countries("scale of map to return, one of 110, 50, 'small', 'medium'") but it works so should be added.

It is not clear that category is a needed argument. airports <- ne_download(scale = 10, type = "airports"), for example, works the same as airports <- ne_download(scale = 10, type = "airports") - there are no types I can see that duplicate names between categories.

I couldn't see how to download ice shelf data succesfully. I was expecting ne_download(scale = 10, type = "antarctic_ice_shelves", category = "physical") or ne_download(scale = 10, type = "antarctic ice shelves", category = "physical") to work but neither seemed to...

Links to other data sources such as the SRTM (some of which can be accessed through the raster package) would be useful, perhaps in the vignette.

additions

Separation of packages

R CMD check

Testing / continuous integration

Stylistic considerations

responses to 2nd rOpenSci review from @lmullen

Separation of packages

Using rOpenSci repository for loading data packages instead of devtools.

I'd still like to see package version numbers for the dependencies in the DESCRIPTION.

R CMD check

* checking R code for possible problems ... NOTE

ne_download: no visible global function definition for ‘download.file’

ne_download: no visible global function definition for ‘unzip’

The vignette fails to build if rnaturalearthhires is not available. Since that package won't be CRAN, I don't think this is going to pass CRAN checks. You should probably add eval = FALSE to the chunks in the vignette that use the high resolution data.

I get a warning message when running the code in the examples in ne_download()

In if (category == "raster") { :

the condition has length > 1 and only the first element will be used

Testing / continuous integration

Not a requirement, but you might consider adding code coverage with covr.

Stylistic considerations

I'd suggest changing the title of the introductory vignette to "Introduction to rnaturalearth."

All user-facing documentation (i.e., examples and vignettes) should be made consistent in having a single space on both sides of the =. There are a few places with inconsistent spacing around parentheses as well.