Closed andysouth closed 8 years ago
@AndySouth Thanks for this! Seeking reviewer(s)
I could take a look
Thanks both!
On 26 November 2015 at 23:44, Robin notifications@github.com wrote:
I could take a look
— Reply to this email directly or view it on GitHub https://github.com/ropensci/onboarding/issues/22#issuecomment-160006450.
Reviewers: @Robinlovelace, @lmullen
Thanks all. I appreciate your efforts. If you have questions feel free to ask.
On 30 November 2015 at 16:17, Scott Chamberlain notifications@github.com wrote:
@Robinlovelace https://github.com/Robinlovelace and @lmullen https://github.com/lmullen reviewing - thanks both!
— Reply to this email directly or view it on GitHub https://github.com/ropensci/onboarding/issues/22#issuecomment-160675931.
First, let me say that this package is very useful. It will be a very helpful addition to the mapping packages on CRAN and in rOpenSci. I already have plans to use it for my own work and in assignments for students. Well done, @AndySouth.
It takes about 2 minutes to install the dev version from GitHub and the installed version is 26 MB. In addition to the fact that CRAN won't accept the package until the data size is smaller, the weight of this package also hinders you as the developer from doing updates as necessary, and it makes it difficult for users to install the package. To solve this it will be necessary to break this package into two or more other packages as detailed below. For the purposes of this review I've focused mainly on package-level issues. Once you've solved the size problem and the package design has stabilized, I'll review the code more granularly.
This package should be at least two packages, and probably three. Here is a suggestion for how you should break this up:
rnaturalearth
and it would be on CRAN. It would be the only package that a user would ever install or load directly. It should contain nothing but the code to download NaturalEarth files, functions to access the data from the other packages, and the vignettes and package level documentation. The absence of data from this package means that you can update it with impunity, since the size of the package will be very small. (I've found this also helps speed up development.)rnaturalearthData
or the like, and it should also be on CRAN. It would contain nothing but the most essential data: as much as you can fit while keeping the package under the 5MB limit set by CRAN. It will also contain the documentation for the datasets. This package will be imported by rnaturalearth
, so users will install it whenever they install rnaturalearth
. But since this data will changed infrequently, users will seldom need to update this package.rnaturalearthHiRes
or the like. This package would reside solely in the rOpenSci drat repository. It would be a suggested package of rnaturalearth
, using the AdditionalRepos
field in the DESCRIPTION. When users call a function that would use this pre-downloaded data, the rnaturalearth
package would check if rnaturalearthHiRes
is installed. If it is not installed, then it will ask to install the package from the rOpenSci drat repo. Since this would be a non-CRAN package, you can include as much data as you want. Whether it would ever make sense to break this high resolution package into several different packages would be up to you.A compromise solution would be to put minimal data into rnaturalearth
, and have all the other data in a single rOpenSci drat repository package. But if you're going to do this, I think you may as well get as well break it up into one further package as suggested above.
Now that is a lot of breaking up into multiple packages what will remain, as far as users are concerned, a single user-facing package. But this is the accepted practice for packages which depend on a lot of data. For instance, the openNLP
package installs very large packages containing trained models for different langauges from Datacube. I've had to do this in gender
and genderdata
, even though the data there is much smaller than the geographic data you are working with. And I'm in the process of redoing USAboundaries
into exactly the same package structure that I'm suggesting here. If it is a helpful model, you might take a look at gender and genderdata. Here are the relevant functions.
When you do this, you should feel free to give the data packages the CC0 license, but to give the rnaturalearth
package which contains only your own work whatever open source license you prefer.
@sckott, @karthik, @cboettig: To be able to test these changes effectively, @AndySouth will need to use the rOpenSci drat repository. I'd suggest that the package be provisionally accepted and moved into rOpenSci GitHub and drat repositories.
README.md
should be converted to README.Rmd
via devtools::use_readme_rmd()
. As a bonus, this will let the README include images easily.devtools::use_appveyor()
.rgdal
, httr
, and knitr
.All the tests pass. I'll check coverage more carefully once the package structure has stabilized. In the meantime, all of the tests which download files should have skip_on_cran()
.
The vignette should not need download anything from Natural Earth. The code block at the end with two downloads should be set to eval = FALSE
.
There are places where the code style can be improved. See the suggestions in this chapter of Hadley Wickham's R Packages book or the style chapter in his Advanced R book.
require(rnaturalearth)
use library(rnaturalearth)
.=
using in function definitions and calls. There should always be a single space on both sides of =
. For example, in the README vignette("rnaturalearth", package="rnaturalearth")
should be vignette("rnaturalearth", package = "rnaturalearth")
.# This is more legible
data-raw
. At the moment, they have a lot investigating the data. Now that you've settled on what to do, you can retain that information in comments but trim unnecessary code.match.args()
. An example of this is get_data()
and the type
parameter. For example, a function that looks like thismyfunc <- function(type = "a") {
if(type == "a") {
# ...
} else if (type =="b") {
} else {
stop("Error.")
}
}
Should be changed to this:
myfunc <- function(type = c("a", "b")) {
type <- match.arg(type)
# Logic checking type here.
}
get_data()
should be renamed to something less generic, like ne_get_data()
. Or, can you avoid exporting it?1. Maps in the package
. (I'd suggest the same for comments as well, but that's entirely up to you.)contains pre-downloaded vector maps for countries: ne_countries(), states: ne_states() and coastline: ne_coastline()
Thanks @lmullen for a very helpful review and for giving me clear guidelines to follow. I knew some of these needed doing, now I know how. I'll wait for feedback from others and then get stuck in.
On 14 December 2015 at 22:31, Lincoln Mullen notifications@github.com wrote:
First, let me say that this package is very useful. It will be a very helpful addition to the mapping packages on CRAN and in rOpenSci. I already have plans to use it for my own work and in assignments for students. Well done, @AndySouth https://github.com/AndySouth.
It takes about 2 minutes to install the dev version from GitHub and the installed version is 26 MB. In addition to the fact that CRAN won't accept the package until the data size is smaller, the weight of this package also hinders you as the developer from doing updates as necessary, and it makes it difficult for users to install the package. To solve this it will be necessary to break this package into two or more other packages as detailed below. For the purposes of this review I've focused mainly on package-level issues. Once you've solved the size problem and the package design has stabilized, I'll review the code more granularly. Structural changes
This package should be at least two packages, and probably three. Here is a suggestion for how you should break this up:
- The main package would be rnaturalearth and it would be on CRAN. It would be the only package that a user would ever install or load directly. It should contain nothing but the code to download NaturalEarth files, functions to access the data from the other packages, and the vignettes and package level documentation. The absence of data from this package means that you can update it with impunity, since the size of the package will be very small. (I've found this also helps speed up development.)
- The second package would be rnaturalearthData or the like, and it should also be on CRAN. It would contain nothing but the most essential data: as much as you can fit while keeping the package under the 5MB limit set by CRAN. It will also contain the documentation for the datasets. This package will be imported by rnaturalearth, so users will install it whenever they install rnaturalearth. But since this data will changed infrequently, users will seldom need to update this package.
- The third package will be rnaturalearthHiRes or the like. This package would reside solely in the rOpenSci drat repository http://packages.ropensci.org/. It would be a suggested package of rnaturalearth, using the AdditionalRepos field in the DESCRIPTION. When users call a function that would use this pre-downloaded data, the rnaturalearth package would check if rnaturalearthHiRes is installed. If it is not installed, then it will ask to install the package from the rOpenSci drat repo. Since this would be a non-CRAN package, you can include as much data as you want. Whether it would ever make sense to break this high resolution package into several different packages would be up to you.
A compromise solution would be to put minimal data into rnaturalearth, and have all the other data in a single rOpenSci drat repository package. But if you're going to do this, I think you may as well get as well break it up into one further package as suggested above.
Now that is a lot of breaking up into multiple packages what will remain, as far as users are concerned, a single user-facing package. But this is the accepted practice for packages which depend on a lot of data. For instance, the openNLP package installs very large packages containing trained models for different langauges from Datacube. I've had to do this in gender and genderdata, even though the data there is much smaller than the geographic data you are working with. And I'm in the process of redoing USAboundaries into exactly the same package structure that I'm suggesting here. If it is a helpful model, you might take a look at gender https://github.com/ropensci/gender and genderdata https://github.com/ropensci/genderdata. Here are the relevant functions https://github.com/ropensci/gender/blob/master/R/install-genderdata-package.R .
When you do this, you should feel free to give the data packages the CC0 license, but to give the rnaturalearth package which contains only your own work whatever open source license you prefer.
@sckott https://github.com/sckott, @karthik https://github.com/karthik, @cboettig https://github.com/cboettig: To be able to test these changes effectively, @AndySouth https://github.com/AndySouth will need to use the rOpenSci drat repository. I'd suggest that the package be provisionally accepted and moved into rOpenSci GitHub and drat repositories. Tooling
- The README.md should be converted to README.Rmd via devtools::use_readme_rmd(). As a bonus, this will let the README include images easily.
- The package should use Appveyor's CI for Windows via devtools::use_appveyor().
DESCRIPTION
- Specify a version number for rgdal, httr, and knitr.
Tests
All the tests pass. I'll check coverage more carefully once the package structure has stabilized. In the meantime, all of the tests which download files should have skip_on_cran(). Vignette
The vignette should not need download anything from Natural Earth. The code block at the end with two downloads should be set to eval = FALSE. Code style
There are places where the code style can be improved. See the suggestions in this chapter of Hadley Wickham's R Packages book http://r-pkgs.had.co.nz/r.html or the style chapter http://adv-r.had.co.nz/Style.html in his Advanced R book.
- In the README and vignettes, in place of require(rnaturalearth) use library(rnaturalearth).
- Spacing is often inconsistent around = using in function definitions and calls. There should always be a single space on both sides of =. For example, in the README vignette("rnaturalearth", package="rnaturalearth") should be vignette("rnaturalearth", package = "rnaturalearth").
- Comments at the start of the line should be followed by a space. (I prefer to user sentence capitalization; but that's just a preference.) E.g.,
This is more legible
Most important, all code which has been commented out needs to be
deleted.
The same goes for the data download scripts in data-raw. At the moment, they have a lot investigating the data. Now that you've settled on what to do, you can retain that information in comments but trim
unnecessary code.
Functions which have limited categorical options should specify all the options in the function signature and then use match.args(). An example of this is get_data() and the type parameter. For example, a function that looks like this
myfunc <- function(type = "a") { if(type == "a") {
...
} else if (type =="b") {
} else { stop("Error.") } }
Should be changed to this:
myfunc <- function(type = c("a", "b")) { type <- match.arg(type)
Logic checking type here.
}
- get_data() should be renamed to something less generic, like ne_get_data(). Or, can you avoid exporting it?
Human language style
- In the headings of the vignettes, prefer sentence-style capitalization. E.g. 1. Maps in the package. (I'd suggest the same for comments as well, but that's entirely up to you.)
- This line in the vignette should be revised, perhaps into a bulleted list, since a sentence can only contain one colon. contains pre-downloaded vector maps for countries: ne_countries(), states: ne_states() and coastline: ne_coastline()
— Reply to this email directly or view it on GitHub https://github.com/ropensci/onboarding/issues/22#issuecomment-164580884.
A number of packages provide geographic data, but none has yet made the high quality datasets within the naturalearth project easily available to R users. rnaturalearth does this with a user-friendly command-line interface, which encourages exploration of what the collection has to offer, in addition to simply providing country and 'state' outlines at various level of geographical resolution. The package will very useful for anyone wanting to make simple yet accurate maps quickly.
ROpenSci, with its emphasis on data access, is an ideal community space to host this package.
Overall I found the package intuitive and easy to use. The preface of ne_
is useful for quickly accessing the package's functions (in a similar way that stringr functions begin with st_
) and I think the order and description of the function arguments makes sense, given the structure of the data on the naturalearth data repository.
The main vignette succintly describes the core purpose and functions of the package, although I think more on what extra can be downloaded via ne_download()
would be interesting here. The what-is-a-country
vignette provides a useful discussion of the subtleties surrounding up how to divide up geographical space along national, state and regional borders.
Minor comments/suggestions:
@param
type
in ne_download
could be better at conveying to the user what datasets are available. It is not clear from the function documentation, for example, that lakes can be downloaded with ne_download(type = 'lakes', category = 'physical')
. To resolve this issue I suggest that a link to the dataset types is make more explicit (e.g. "see naturalearthdata.com/downloads/110m-cultural-vectors/ for the types of data that can be downloaded for cultural 110 m resolution data.")ne_download
could usefully be split into ne_download10
, ne_download50
and ne_download110
, while keeping the generic ne_download
function intact. The data available varies depending on scale, so ne_download10
, for example, could provide descriptions of the very useful roads and railways datasets. It was not immediately obvious that these very interesting datasets were made available with - this could usefully be better documented.roads <- ne_download(scale = 10, type = "roads", category = "cultural")
railroads <- ne_download(scale = 10, type = "railroads", category = "cultural")
sp
is an import but I think the package would benefit from making it a dependency (as it is with stplanr after some debate): all the data loaded by the package that I've seen is in an sp S4 object. Yet these will not work (e.g. plot) properly unless sp is loaded, as illustrated in the vignette. This would also make the plot commands in the vignette simpler.scale = 10
isn't listed as an argument of ne_countries
("scale of map to return, one of 110, 50, 'small', 'medium'") but it works so should be added.category
is a needed argument. airports <- ne_download(scale = 10, type = "airports")
, for example, works the same as airports <- ne_download(scale = 10, type = "airports")
- there are no types I can see that duplicate names between categories.ne_download(scale = 10, type = "antarctic_ice_shelves", category = "physical")
or ne_download(scale = 10, type = "antarctic ice shelves", category = "physical")
to work but neither seemed to...In-line with the previous reviewer I think the size of the package could be problematic. I think the suggestion of splitting it into other packages is a good one. An alternative would be to provide a default destination within the package for data download, not host any data on the package itself directly, and simply use the package to access the data hosted on rnaturalearth. Other than package install requirements, the clear advantage of that is that the data won't have to be updated. The disadvantage is that the user would need internet access the first time they accessed many datasets and possible complexities surrounding fresh installs wiping previously downloaded data.
if(file.exists(...))
could be used to decide whether the data should be downloaded or not. Below is an illustrative example for what the call could look like something like this, for ne_countries()
:
ne_countries_10 <- function(){
file_path <- file.path(system.file("data", package = "rnaturalearth"), "countries10.rda")
if(file.exists(file_path)){
x <- readRDS(file_path)
}else{
x <- ne_download()
saveRDS(x, file_path)
}
x
}
Note that data
may need to change to exdata
in the above code and that this is just a preliminary idea - would be interested to hear thoughts from others. In any case I can see a disadvantage of hosting raw data on CRAN or ROpenSci in terms of the DRY principle.
I'm not sure about specifying providing the same data with data(countries110)
is necessary given t countries10 <- ne_countries(110)
does the same job.
I'd like to point out that I'm not an expert in this area, think the previous reviewers' suggestions in relation to package size is also another viable option and would be interested in hearing views from others, especially in relation to the DRY principle as it applies to data.
Thanks @Robinlovelace https://github.com/Robinlovelace for a considered and useful review.
On 17 December 2015 at 00:15, Robin notifications@github.com wrote:
Review
A number of packages provide geographic data, but none has yet made the high quality datasets within the naturalearth project easily available to R users. rnaturalearth does this with a user-friendly command-line interface, which encourages exploration of what the collection has to offer, in addition to 'simply' providing country and 'state' outlines at various level of geographical resolution. The package will very useful for anyone wanting to make simple yet accurate maps quickly.
ROpenSci, with its emphasis on data access, is an ideal community space to host this package. Useability of the functions
Overall I found the package intuitive and easy to use. The preface of ne is useful for quickly accessing the package's functions (in a similar way that stringr functions begin with st) and I think the order and description of the function arguments makes sense, given the structure of the data on the naturalearth data repository.
The main vignette https://github.com/AndySouth/rnaturalearth/blob/master/vignettes/rnaturalearth.Rmd succintly describes the core purpose and functions of the package, although I think more on what extra can be downloaded via ne_download() would be interesting here. The what-is-a-country vignette provides a useful discussion of the subtleties surrounding up how to divide up geographical space along national, state and regional borders.
Minor comments/suggestions:
-
I think the description of the @param type in ne_download could be better at conveying to the user what datasets are available. It is not clear from the function documentation, for example, that lakes can be downloaded with ne_download(type = 'lakes', category = 'physical'). To resolve this issue I suggest that a link to the dataset types is make more explicit (e.g. "see naturalearthdata.com/downloads/110m-cultural-vectors/ http://www.naturalearthdata.com/downloads/110m-cultural-vectors/ for the types of data that can be downloaded for cultural 110 m resolution data.")
To assist with the 'making it clear what data is available' issue, I think that ne_download could usefully be split into ne_download10, ne_download50 and ne_download110, while keeping the generic ne_download function intact. The data available varies depending on scale, so ne_download10, for example, could provide descriptions of the very useful roads and railways datasets. It was not immediately obvious that these very interesting datasets were made available with - this could usefully be better documented.
roads <- ne_download(scale = 10, type = "railroads", category = "cultural")railroads <- ne_download(scale = 10, type = "railroads", category = "cultural")
-
sp is a dependency but I think the package would benefit from making it a dependency: all the data loaded by the package that I've seen is in an sp S4 object. Yet these will not work (e.g. plot) properly unless sp is loaded, as illustrated in the vignette. This would also make the plot commands in the vignette simpler.
scale = 10 isn't listed as an argument of ne_countries("scale of map to return, one of 110, 50, 'small', 'medium'") but it works so should be added.
It is not clear that category is a needed argument. airports <- ne_download(scale = 10, type = "airports"), for example, works the same as airports <- ne_download(scale = 10, type = "airports") - there are no types I can see that duplicate names between categories.
I couldn't see how to download ice shelf data succesfully. I was expecting ne_download(scale = 10, type = "antarctic_ice_shelves", category = "physical") or ne_download(scale = 10, type = "antarctic ice shelves", category = "physical") to work but neither seemed to...
Links to other data sources such as the SRTM (some of which can be accessed through the raster package) would be useful, perhaps in the vignette.
Thoughts on reducing the package size
In-line with the previous reviewer I think the size of the package could be problematic. I think the suggestion of splitting it into other packages is a good one. An alternative would be to provide a default destination within the package for data download, not host any data on the package itself directly, and simply use the package to access the data hosted on rnaturalearth. Other than package install requirements, the clear advantage of that is that the data won't have to be updated. The disadvantage is that the user would need internet access the first time they accessed many datasets and possible complexities surrounding fresh installs wiping previously downloaded data.
if(file.exists(...)) could be used to decide whether the data should be downloaded or not. Below is an illustrative example for what the call could look like something like this, for ne_countries():
ne_countries_10 <- function(){ file_path <- file.path(system.file("data", package = "rnaturalearth"), "countries10.rda") if(file.exists(file_path)){ x <- readRDS(file_path) }else{ x <- ne_download() saveRDS(x, file_path) } x }
Note that data may need to change to exdata in the above code and that this is just a preliminary idea - would be interested to hear thoughts from others. In any case I can see a disadvantage of hosting raw data on CRAN or ROpenSci in terms of the DRY principle.
I'm not sure about specifying a function to generate data ne_countries() while providing the same data with data(countries110), and the type argument is necessary, especially given the suggestion above of a way to reduce the package's size whilst maintaining the downloaded data in a default and accessible location.
I'd like to point out that I'm not an expert in this area, think the previous reviewers' suggestions in relation to package size is also another viable option and would be interested in hearing views from others, especially in relation to the DRY principle as it applies to data.
— Reply to this email directly or view it on GitHub https://github.com/ropensci/onboarding/issues/22#issuecomment-165296951.
Another option for reducing package size (though experimental and could delay CRAN submission).
In another context, I'm developing a framework for using GitHub to distribute evolving data sources without putting them in a package; see early vignette, and beginnings of manuscript which include a little about where we're thinking of taking this.
In short, the data don't go in the package, or even in the repository but live in "github releases" and are fetched as they're used. This decouples the development of the code of the package from the data source which avoids the issues of duplicating the data over and over on CRAN and makes for fast package downloads.
@richfitz: datastorr looks really great. I'm looking forward to the release.
hey @AndySouth let us know if there's anything we can do to help...
Thanks @sckott https://github.com/sckott. I've been meaning to get back to this. I have a plan to start by following parts of the suggestions of both reviewers. I'll try to get something to you in the next few weeks.
On 11 February 2016 at 19:34, Scott Chamberlain notifications@github.com wrote:
hey @AndySouth https://github.com/AndySouth let us know if there's anything we can do to help...
— Reply to this email directly or view it on GitHub https://github.com/ropensci/onboarding/issues/22#issuecomment-183026131.
thanks! sounds good
Thanks for your patience, finally found spare time to finish addressing these useful recomendations from the reviewers.
It fails travis checks currently due to the relationships between the 3 packages, but passes checks locally.
done. converted to 3 packages rnaturalearth, rnaturalearthdata & rnaturalearthhires. The first 2 are aimed at CRAN and the 3rd the rOpenSci drat repo)
I'm ready for that stage now if you are.
done
done
I don't know what version number is needed for these packages. Feel like I would just be making one up. Where might I look ?
done
done
done
I recognise this, but unless it's essential would prefer to keep my sometimes idiosyncratic use of spaces which I use to enhance my readability (and out of habit too).
done
done (mostly)
I have kept some commented code as these are the areas where it is most likely I will need to adapt old code in repsonse to potential future data changes, and least likely that users need to access.
myfunc <- function(type = "a") { if(type == "a") {
} else if (type =="b") {
} else { stop("Error.") } } Should be changed to this:
myfunc <- function(type = c("a", "b")) { type <- match.arg(type)
}
Done, except where I wanted to give the user more guidance if they submitted an incorrect option.
no longer exported
done in vignettes
done
started. More could definitely be added, and I will but would be good to get out first.
Thanks for pointing this out. I've added a table to the start of the man page for ne_download() which outlines the main maps available at different scales
I think that the improved documentation in ne_download() makes splitting it into the 3 functions unnecessary.
I'm open to considering making sp a dependency, how do others feel ? I wanted to keep this as light on dependencies as possible. If a user was using fortify with ggplot2 they wouldn't need sp, right ?
done
Whilst you are correct, category is needed to construct the download url. If I were to dispense with it I would need to create a list of all the maps and which category they are in. I wanted to keep the link to the Natural Earth data as simple as possible so that if it changes I don't have too much work to do.
ne_download(scale=50, type="antarctic_ice_shelves_polys", category="physical") I am following natural earth filenaming directly to try to keep this sustainable.
I added a link to seealso for ne_download()
I've added support for raster downloads and tiny_countries points.
Thanks @richfitz for the data storage suggestion. I chose the path more well travelled for now, but am interested in that approach for the future.
@AndySouth Glad to see the next version of this package.
@karthik and @sckott: I can review this package in more detail now (not anticipating any problems) but perhaps it would be best to have the high resolution data in drat so I can check that too.
@AndySouth: The version numbers for rgdal etc should be whatever the current version is, since that is what you are developing against. You can get the current versions from packageVersion()
or better yet from looking at CRAN.
hmm, I don't know the process for adding repos to our drat, asked over there https://github.com/ropensci/drat/issues/10
@lmullen so we just need rnaturalearthhires
in drat for now?
Yes, I think that should work, right @AndySouth? The other two will be on CRAN, so I can install them as if they were on CRAN. Then test with and without the high res package.
On Tue, Mar 22, 2016 at 2:38 PM, Scott Chamberlain <notifications@github.com
wrote:
@lmullen https://github.com/lmullen so we just need rnaturalearthhires in drat for now?
— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub https://github.com/ropensci/onboarding/issues/22#issuecomment-199955243
Lincoln Mullen, http://lincolnmullen.com Assistant Professor, Department of History & Art History George Mason University
@lmullen @sckott Yes, thanks that sounds good.
Working on it now
hmm, i can't get drat to work for rnaturalearthhires
- @cboettig not sure what's going on, would it help if we move any pkgs into ropensci org account?
@sckott um, not your fault I think; it loks like CircleCI is failing to build drat, so the package repo on amazon isn't getting updated. Circle CI seems to be having some issues reading the environmental variables all of a sudden, they have a note about the issue up, hope it is resolved soon. May need to run drat locally once to avoid timeout on the S3 download logs too
Okay, hopefully it will resolve soon
@sckott looks like rnaturalearthhires is up on drat now:
http://packages.ropensci.org/src/contrib/PACKAGES shows:
Package: rnaturalearthhires
Version: 0.0.0.9000
Depends: R (>= 3.1.1)
Imports: sp (>= 1.0.15)
Suggests: knitr, testthat (>= 0.9.1)
License: CC0
MD5sum: 4b7c1c00c6e4a8ce553f4d0867cf3e2a
NeedsCompilation: no
sweet!
@AndySouth @lmullen rnaturalearthhires
is working on ropensci drat now. Do we need to add the other pkgs as well, or one of them?
@sckott @lmullen Thanks Scott. Will you want the other packages there if they are accepted to rOpensci ? More broadly, my thought was that I would wait for further comments that Lincoln offered and, when I have time, submit the two smaller packages to Cran.
@sckott Pretty sure I just need rnaturalearthhires
on drat if the other
two are intended for CRAN.
@AndySouth: I'll take another look at the packages as soon as I can. (I'm teaching mapping on Wednesday: maybe I'll even use them in class!)
On Tue, Mar 29, 2016 at 5:29 AM, Andy South notifications@github.com wrote:
@sckott https://github.com/sckott @lmullen https://github.com/lmullen Thanks Scott. Will you want the other packages there if they are accepted to rOpensci ? More broadly, my thought was that I would wait for further comments that Lincoln offered and, when I have time, submit the two smaller packages to Cran.
— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub https://github.com/ropensci/onboarding/issues/22#issuecomment-202801909
Lincoln Mullen, http://lincolnmullen.com Assistant Professor, Department of History & Art History George Mason University
Will you want the other packages there if they are accepted to rOpensci ?
by there @AndySouth do you mean drat? if they are accepted in our suite, they'll be added to our drat
@sckott thanks, yes I did mean drat.
On 29 March 2016 at 16:55, Scott Chamberlain notifications@github.com wrote:
Will you want the other packages there if they are accepted to rOpensci ?
by there @AndySouth https://github.com/AndySouth do you mean drat? if they are accepted in our suite, they'll be added to our drat
— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub https://github.com/ropensci/onboarding/issues/22#issuecomment-202971988
@lmullen @Robinlovelace would you like to be added to authors for the package ? (if that's allowed). I think your contributions warrant it and I would like to keep it collaborative. You'd be welcome to make any or no further contributions in future. Up to you either way.
@AndySouth That's a generous offer. But I think that for the integrity of the peer review process, it would be better to maintain the distinction between author and reviewer, even though the strength of this form of review is that it is collaborative rather than adversarial. (I don't want to speak for @Robinlovelace, of course, who may disagree.) And in my own case, I haven't contributed any actual code as of yet.
@lmullen thanks I can see your point, I guess I wasn't thinking of this as peer review in the usual sense. I'm happy to go with what you think best. I do think of you as having contributed code as I copy/pasted parts of your code that you pointed me towards and modified them only minorly.
Agree with @lmullen that I've not contributed any code so am not a package author. There's an ongoing debate about how to give credit for the important work of peer review, so perhaps saying somewhere "Reviewed by ..." would be good. Interestingly R provides many types of package contributors, including thesis advisor! (see ?person
).
Delving further I tried the following:
> person("Lincoln", "Mullen", role = c("rev"))
[1] "Lincoln Mullen [rev]"
> person("Lincoln", "Mullen", role = c("reviewer"))
[1] "Lincoln Mullen"
Warning message:
In .canonicalize_person_role(role) :
Invalid role specification: ‘reviewer’.
So rev
works, guess it means 'package reviewer' (although does not say so in the help) and I'd be stoked to mentioned in that way. What are your thoughts on this way of doing in @lmullen and @sckott (as a potential template for the future). Either way it's been a great experience and illustrates how much peer review of code is beneficial.
@Robinlovelace thanks, crediting reviewers like this sounds like a good idea.
@lmullen @sckott any news ?
@AndySouth looking back through comments, was I supposed to do something? this does work now install.packages("rnaturalearthhires", repos = c("http://packages.ropensci.org", "http://cran.rstudio.com"))
@sckott @AndySouth: Sorry I've been slow. Got hung up with the end of the semester. I'll review the revised package soon.
@lmullen @sckott Thanks and no worries, I look forward to it.
@AndySouth: Nice work on this package. I think it is very close to being done and accepted into rOpenSci, then on its way to CRAN. Here are a few minor fixes or questions I found.
The separation of packages seems to work fine. I was a little surprised that the package installs the data packages using devtools instead of using the rOpenSci repository. What is the reason for that choice? I'd prefer that the rOpenSci repository was used if possible.
The DESCRIPTION will need this line added:
Additional_repositories: http://packages.ropensci.org
I'd still like to see package version numbers for the dependencies in the DESCRIPTION.
I get the following note, which should be easily corrected:
* checking R code for possible problems ... NOTE
ne_download: no visible global function definition for ‘download.file’
ne_download: no visible global function definition for ‘unzip’
Undefined global functions or variables:
download.file unzip
Consider adding
importFrom("utils", "download.file", "unzip")
to your NAMESPACE file.
In addition, R CMD check fails for me when the rnaturalearthhires package is not available. This is because it can't build the vignette.
The vignette fails to build if rnaturalearthhires is not available. Since that package won't be CRAN, I don't think this is going to pass CRAN checks. You should probably add eval = FALSE
to the chunks in the vignette that use the high resolution data. It is sufficient to show what the code looks like without actually running it.
I get a warning message when running the code in the examples in ne_download()
. When I run
spdf_world2 <- ne_load( scale = 110, type = 'countries' )
I get this warning message which does not seem relevant to the command run.
Warning message:
In if (category == "raster") { :
the condition has length > 1 and only the first element will be used
The reason is that the default argument for category
is a vector of three choices. This function should probably have the line
category <- match.arg(category)
I noticed that the Travis and Appveyor tests are failing. At least the Travis tests should be passing. It looks like the problem is that the installation of rgdal fails because you don't have the rgdal dev package installed on Ubuntu. I'm not sure why the binary package as specified in the Travis file is not working. But you should be able to install rgdal from source if you install the apt package libgdal-dev
. As for Appveyor, it looks like the problem has to do with package installation as well.
Not a requirement, but you might consider adding code coverage with covr.
I'd suggest changing the title of the introductory vignette to "Introduction to rnaturalearth."
The package is inconsistent in how it handles spaces around =
in function arguments. All user-facing documentation (i.e., examples and vignettes) should be made consistent in having a single space on both sides of the =
. There are a few places with inconsistent spacing around parentheses as well.
That's about it. Nice work!
Many thanks @lmullen for the thorough 2nd review. I'll sort these issues over the next couple of weeks.
@sckott I've addressed all issues except one. There's a remaining problem with Travis rgdal install that I've fought with and failed, be good to get some input as it seems you may have had similar with rnoaa. Appveyor is now building successfully.
Done for rnaturalearthhires, but rnaturalearthdata is not yet on rOpenSCi or CRAN, I have put in the code ready to change once rnaturalearthdata is on rOpenSci.
#utils::install.packages("rnaturalearthdata",
# repos = c("http://packages.ropensci.org", "http://cran.rstudio.com"),
# type = "source")
Done
done
done
done
Now build success on appveyor
For Travis I've tried various fixes but still can't get rgdal to install despite the binary installation seemingly working for travis on similar packages e.g. https://github.com/Hackout2/repijson/blob/master/.travis.yml
I'd appreciate any input on this.
added a github issue to look at covr for future release
done
done
asking about rgdal, i've been using old language: c
setup, which works, but means you don't get the faster builds, caching, container based infrastucture, etc. - asking jim hester
jim pointed me to this issue on travis-ci repo https://github.com/travis-ci/travis-ci/issues/5852
So Edzer uses this setup https://github.com/edzer/gstat/blob/master/.travis.yml to install gdal/geos from source
worth trying dist: trusty
in your travis file e.g. https://github.com/jhollist/quickmapr/blob/master/.travis.yml#L3
@AndySouth I looked at the Travis logs and your comments in your Travis config file. Two ideas:
First, Travis does install the Ubuntu r-cran-rgdal
package as it should, but then it also tries to install rgdal
from source. That seems to me like a bug in Travis.
Second, I couldn't exactly tell which combination of options you used at any given time. I just tried it on my local version of Ubuntu (16.04) and all that I needed to install was libgdal-dev
to compile rgdal
from source. It looks like one of your option has conflicting versions of libgdal. Maybe try it with both the binary version r-cran-rgdal
as you have now plus installing just libgdal-dev
with before_install:
?
@AndySouth I think you might also want to use install_github:
to install rnaturalearthdata
.
Thanks @lmullen @sckott I've tried both of your suggestions and various other permutations except for the c setup .
I don't fully understand how travis works. There are 2 setups (1-using binary and 2-installing from source ) that apparently should work and seem to work for other packages (1-https://github.com/Hackout2/repijson/blob/master/.travis.yml, 2-https://github.com/jhollist/quickmapr/blob/master/.travis.yml#L3) but don't work for mine. I'm reluctant to try the 3rd potential c solution.
Currently I've got it set seemingly the same as in quickmapr (2) but it is still failing with The following packages have unmet dependencies: libgdal-dev : Depends: libhdf5-serial-dev
The comment on the end of here https://github.com/travis-ci/travis-ci/issues/5215 prompted me to try removing the version dependency on rgdal in DESCRIPTION, however that seemed not to solve it either.
Am I missing anything obvious ?
I've forked to my account to see if I can sort it out, will report back if successful
@sckott thankyou thankyou
rnaturalearth provides :
devtools::check()
produce any errors or warnings? If so paste them below. One NOTE