Closed kshtzgupta1 closed 5 years ago
thanks for your submission @kshtzgupta1 We are discussing now and will get back to you soon
sorry for delay on this @kshtzgupta1 - we have some conflicts of interest we're thinking through ...
@kshtzgupta1: Thanks for submitting a very interesting package. This is the first time I'm serving as an editor for ROpenSci, so I will be learning some parts of the editorial process as I edit your submission. Please let me know if you have any questions throughout.
I have a bit of feedback based on an initial look at the submission and some initial editorial checks.
In your submission summary, could you please:
devtools::spell_check()
:There seem to be two legitimate spelling errors in the package code:
WORD FOUND IN
Assesment load_ramlegacy.Rd:18
availabl ram_dir.Rd:11
covr::package_coverage()
ramlegacy Coverage: 79.33%
R/zzz.R: 23.33%
R/download_ramlegacy.R: 83.02%
R/utils.R: 88.41%
R/load_ramlegacy.R: 88.46%
R/read_ramlegacy.R: 100.00%
R/style.R: 100.00%
goodpractice::gp()
── GP ramlegacy ──────────────
It is good practice to
✖ write unit tests for all functions, and all package code
in general. 80% of code lines are covered by test cases.
R/download_ramlegacy.R:58:NA
R/download_ramlegacy.R:59:NA
R/download_ramlegacy.R:69:NA
R/download_ramlegacy.R:70:NA
R/download_ramlegacy.R:71:NA
... and 35 more lines
✖ avoid long code lines, it is bad for readability. Also,
many people prefer editor windows that are about 80 characters
wide. Try make your lines shorter than 80 characters
R/download_ramlegacy.R:106:1
devtools::check()
In the meantime, I will begin the process of assigning reviewers.
Reviewer: @boshek Reviewer: @jafflerbach Due date: January 25, 2018
@geanders Thanks for your response. I have updated the submission as per your suggestions.
Thank you, @kshtzgupta1!
Could I also ask you to add an rOpenSci review badge to the README file for your package? The full link for this should be:
[![](https://badges.ropensci.org/264_status.svg)](https://github.com/ropensci/software-review/issues/264)
@geanders I have added the badge.
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
[x] Installation instructions: for the development version of package and any non-standard dependencies in README
The vignette can also be viewed by calling vignette(package = "ramlegacy")
Until the installation instruction are for the CRAN version. Alternatively, one could install by removing --no-build-vignettes
from the build_opts
in install_github
library(ramlegacy)
Note that this function does not support vectorization so please don't pass in a vector of version numbers
URL
, BugReports
and Maintainer
(which may be autogenerated via Authors@R
).
Estimated hours spent reviewing: 5
I think this is a great package that provides a clear set of instruction on how to download the ramlegacy database. The biggest issue in the package I see is the reliance of the library
call to load data. The rest of the items identified below are relatively easy to address. Please let me know if anything is unclear or there is any disagreement.
I think that you are adding too much onto the library
call. Consider any loading of data should be left solely to the load_ramlegacy
function. I think this constrains future development of the package as future may entail functionality outside the database. This is also outside what a user would expect from a typical library
call. We are into stylistic territory here but I personally think objects should not be loaded with the library call.
I wonder if load_ramlegacy
should gain an argument where a user can specify a vector that controls which dataset are loaded. This might be a useful feature for production code, particularly a shiny app.
Consider not even exposing a raw_url
nor ram_path
argument here as described here:
https://github.com/kshtzgupta1/ramlegacy/blob/8c26a196cc573a03574ed084a7e6c9d0b951e7a2/R/download_ramlegacy.R#L23-L25
If a user is never meant to change it, then perhaps the argument could be omitted abstracting that possible confusion away from the user. What is the utility of those arguments if they are never meant to change?
I would strongly consider a NEWS.md file to track changes between versions. rOpenSci provides a great template
I think the package lacks some connection with the actual data. Right now a bunch of dataframes are loaded but a new user is expected to have prior knowledge of the ram database. I think this package could provide some path for users to learn about the data itself. Options include:
Use some of the documentation that is downloaded in the .zip file to clarify (perhaps via internal data) what each table means. Many of the column names aren't terribly descriptive. Somehow ramlegacy should provide a path for users to figure that out.
Perhaps the vignette/README could link to documentation. As it stands, the current documentation is very clear to load some tables in the global environment but this only takes the user partially up the learning curve. ramlegacy should provide users a path to take the next steps.
the vignette to include a very brief sample analysis where two tables are joined. A trivial example would be something like this:
library(ramlegacy)
#> * Multiple versions found including the latest one: 4.3 . Loading the latest version.
#> * Loading version 4.3 ...
#> v Version 4.3 has been successfully loaded.
TC_with_names <- merge(
timeseries_values_views_v4.3[!is.na(timeseries_values_views_v4.3$TC), c("stockid", "TC")],
stock_v4.3[stock_v4.3$region == "West Africa",],
by = "stockid"
)
hist(TC_with_names$TC)
Created on 2019-01-08 by the reprex package (v0.2.1)
It isn't clear to me why you need different versions of the database. This functionality is highlighted quite prominently but it's utility is not clear.
RE: the alternate download location on github. I would consider if you have acknowledged and licensed this data appropriately in your assets repo. I would contact the RAM maintainers directly and seek direction on this. Fully reproducing the data might cause some concern with those maintainers.
The docs directory, _pkgdown.yml
and _config.yml
all needed to be added to .Rbuildignore
for R CMD check to pass on my machine.
I am wondering if this is intentional behaviour. I think this is a carryover from readxl::read_excel
but
given the class of the data object it prints either as a dataframe or as a tibble depending on whether tibble is loaded or not. One thing to consider is even just adding tibble to Suggests for nicer printing in a vignette:
library(ramlegacy)
#> * Multiple versions found including the latest one: 4.3 . Loading the latest version.
#> * Loading version 4.3 ...
#> v Version 4.3 has been successfully loaded.
class(stock_v4.3)
head(stock_v4.3, 10)
library(tibble)
stock_v4.3
<sup>Created on 2019-01-07 by the [reprex package](https://reprex.tidyverse.org) (v0.2.1)</sup>
- I get these errors when I run `goodpractice::gp`
✖ fix this R CMD check WARNING: LaTeX errors when creating PDF version. This typically indicates Rd problems. ✖ fix this R CMD check ERROR: Re-running with no redirection of stdout/stderr. Hmm ... looks like a package You may want to clean up by 'rm -Rf C:/Users/salbers/AppData/Local/Temp/Rtmp4yG4PY/Rd2pdf43d827bd526'
- I think some slight improvement could be made by running `styler` on the package though it is marginal.
#### Possible future work
I wonder if it would be useful to engage Sean Anderson to see if he would be willing to submit a pull request with his code that converts the Access database into an sqlite database. I think there would be an appetite for the sqlite option which could exist alongside the Excel version. His process requires an additional utility outside of R but maybe a R-only solution is available.
Thank you for the very thorough review, @boshek! @kshtzgupta1, there will be a second review coming, as well, so you may want to wait for that before you begin addressing reviewer comments.
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
URL
, BugReports
and Maintainer
(which may be autogenerated via Authors@R
).When running goodpractice::gp()
I got the following output:
✖ write unit tests for all functions, and all package code in general. 69% of code lines are covered by test
cases.
R/download_ramlegacy.R:53:NA
R/download_ramlegacy.R:58:NA
R/download_ramlegacy.R:59:NA
R/download_ramlegacy.R:68:NA
R/download_ramlegacy.R:69:NA
... and 58 more lines
✖ fix this R CMD check WARNING: LaTeX errors when creating PDF version. This typically indicates Rd
problems.
Estimated hours spent reviewing: 4
There is only one version of the RAM database now available on ramlegacy.org, version 4.4. I believe the timing of this package going out for review and the recent website update were around the same time (December 20). That being said, there must be a reason the maintainers of the website no longer host the older versions. I encourage you to reach out to them to understand why. I know them and can make the connection if needed. As it is now, everyone using this package to download older versions will be retrieving the backups on GitHub. I believe this package should only allow retrieval of data that is also available on the site.
Sean Anderson has a package of the same name and same basic functionality at https://github.com/seananderson/ramlegacy. I have used this package for my own work in the past although in recent years I’ve been receiving updated RAM data directly via a DropBox link from the maintainers. I wholeheartedly support development of an R package to access the data such as this one and the one from Sean because the current data distribution methods are unreliable and opaque. That being said, I strongly encourage the developers of this package to work with Sean and the maintainers of the RAM legacy database to get this package functional. Again, I can help facilitate the connection as I’ve worked with Sean closely over the past few years.
As a frequent user of the RAM data, I know where to go to find information about what each of the loaded tables are. I understand the comment from the other reviewer about providing more information, but if this package is purely intended to just retrieve the RAM data from the website, I think these functions are good as is. I can come up with a myriad of other functions I would like to see from this R package based on the common wrangling tasks I do with this database, but I believe those could be added through feature requests and aren’t a reason to hold this package up.
I first tried download_ramlegacy()
when the site was down (1/17/2019). I downloaded from the backup location on github. This worked well. A couple of hours later, the website was backup and when I tried to download the only version now available on the site (4.4) I got an error message
> download_ramlegacy(version = "4.4")
Error: Invalid version number. Available versions are 1.0, 2.0, 2.5, 3.0, 4.3,
The maintainers of this package will have to be monitoring ramlegacy.org carefully to know when a new version is released to then update this package.
I can forsee some user issues with the ram_url()
argument in download_ramlegacy()
:
download_ramlegacy(version = NULL, ram_path = NULL,
ram_url = "https://depts.washington.edu/ramlegac/wordpress/databaseVersions")
First, it looks like there is a typo (ramlegac
instead of ramlegacy
) so if the download_ramlegacy()
function isn't acting as the user would expect (whether it's their fault or not) they might see the URL in the Help document and manually try to fix the typo. This may be an actual typo, I don't know. Either way, if you don't want the user to ever touch it I would hardcode that ram_url
into the function itself rather than having it as an argument.
Also the path
argument in load_ramlegacy()
is again one that should not be touched by the user. I tried to mess with this path argument and when I entered an incorrect path I received this:
> load_ramlegacy(version = 4.3, path = "~/github")
★ Loading version 4.3 ...
Error in readRDS(path) : error reading from connection
In addition: Warning message:
In readRDS(path) : error reading the file
I suggest changing the function so that if something is entered in as a path argument, it returns a message reminding the user to not change the path argument. Or, remove this as an argument entirely.
I just wanted to follow up and it seems within the past week, the maintainers of ramlegacy.org have changed where the database is kept. It is now hosted on Zenodo, which will require some changes to this package for accessing the data.
Thank you to @boshek and @jafflerbach for your very thoughtful reviews!
@kshtzgupta1 : We now have both reviews in, so if you haven't already, you can begin in responding to the reviewers' comments and suggestions. In particular, it sounds like some changes to ram legacy.org, brought up by @jafflerbach, will require some package changes.
Please let me know if you have any questions as you prepare your response.
Also worth noting that the authors are now including (or maybe always had) an .RData file will and an R script to load the database into a users session much like what this package does albeit in a less refined way.
@boshek : Just to clarify, you're referring to the "DBdata.RData" and "loadDBdata (assessment data only).r" files listed here, right?
Thanks everyone for great reviews and discussion, this is very helpful. @kshtzgupta1 and I will need a bit more time to really work through and address all of these properly, but meanwhile will just make a few quick observations relevant to the immediate discussion.
@jafflerbach Thanks! Yes, we've been in discussion with Mike Melnychuk since September about moving the data to Zenodo to avoid the problems created by down-time and re-organization of the ramlegacy.org website, so we were very happy to hear from them about this development and look forward to using those endpoints for the package. While they have plans to upload older versions to Zenodo as well, these aren't currently linked up as related versions (a la https://blog.zenodo.org/2017/05/30/doi-versioning-launched/), hopefully they will be able to do that to provide a source for the original versions. So hopefully soon the Zenodo hosting will provide a much more stable source for any versions of the data but may be a bit of a moving target for a little longer. (This switch to the versioning approach would also allow automated resolution of the most recent version).
@geanders 👍
@kshtzgupta1 : It's no problem that it will take a bit longer than for your response than we usually ask for, especially given some of the changes in ramlegacy.org that affect the package. I was wondering, though, if you have a rough timeline to share with the reviewers? Just a rough idea would be fine.
@geanders : Apologies for replying so late. We are working to incorporate some of the changes suggested by the reviewers as well as improve the package to better deal with the migration of the latest versions to Zenodo. I will be meeting with Prof. Boettiger this Thursday to finalize those changes after which I will submit the pull request addressing each of the reviewers' comments.
@kshtzgupta1 : No problem at all! Thanks for the update.
@geanders @boshek @jafflerbach Thanks for these very helpful and detailed reviews! I believe Prof. Boettiger and I have addressed most of these issues now. Among other things, we have removed the loading behavior of library(ramlegacy)
leaving the loading solely to load_ramlegacy
which instead of assigning all the tables to global environment now just returns a list of the tables. Additionally, if the user wants specific tables from a version of the database load_ramlegacy
now supports that behavior. We have also modified download_ramlegacy and load_ramlegacy so that while the default is still to download and read in the dataframes from the rappdirs directory the functions now also support downloading to a location chosen by the user and reading from that location.
As you may know the RAM maintainers have moved the database to Zenodo where they have released three new versions 4.40, 4.41, and 4.44. Unfortunately, they still haven't transferred the older versions (1.0, 2.0, 2.5, 3.0, 4.3) to Zenodo and so for now the package is going to have to make these versions available to the user through the github repo. We will change this of course as soon as the maintainers start hosting the older versions on Zenodo. Finally, the test coverage might have become lower with the latest commit as many of the old tests have become obsolete but as soon as these new changes are confirmed I will write new tests to bring up the test coverage.
I've made all these edits and more in a separate branch and created a PR, kshtzgupta1/ramlegacy#1 which I will leave un-merged for now, hopefully that provides an easy way to see the differences. I've also summarized my replies to each of your point-by-point reviews (with a few clarifying questions and comments) in the two issues linked in that PR.
Nice work @kshtzgupta1. @geanders I won't be able to look at this until late next week but can prioritize it then.
@kshtzgupta1 : Thanks so much for these revisions! Could you insert a link in your comment to the Issues where you give point-by-point reviews? That would help @boshek and @jafflerbach find them more easily.
@boshek : Thanks for letting me know your timeline. That works fine.
Actually, @kshtzgupta1, could I ask you to copy your point-by-point responses over to this thread? It would be great to keep the full conversation on a single thread.
@boshek Thank you for a detailed review! We have made many changes following your suggestions. I will update the README and vignette to reflect those changes after they are approved.
- Though this is likely to end up on CRAN, the current installation instructions don't automatically build the vignette so accessing the vignette locally fails. I'd just suggest remove the line:
The vignette can also be viewed by calling vignette(package = "ramlegacy")
Until the installation instruction are for the CRAN version. Alternatively, one could install by removing --no-build-vignettes from the
build_opts
ininstall_github
Fixed that line!
Vignette(s)** demonstrating major functionality that runs successfully locally
- Use quoting markdown for RAM legacy description
- This line: https://github.com/kshtzgupta1/ramlegacy/blob/8c26a196cc573a03574ed084a7e6c9d0b951e7a2/vignettes/ramlegacy.Rmd#L37 I believe is a typo and should read
library(ramlegacy)
Fixed!
Function Documentation: for all exported functions in R help
Note that this function does not support vectorization so please don't pass in a vector of version numbers
- Consider revising the above since you are unable to do so anyways
Revised!
Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with
URL
,BugReports
andMaintainer
(which may be autogenerated viaAuthors@R
).
- Consider adding either suggested reference or the maintainer of ramlegacy to the DESCRIPTION file in Authors@R
- Consider adding a CODE OF CONDUCT and CONTRIBUTING file directly to the repo.
Added!
- I think that you are adding too much onto the
library
call. Consider any loading of data should be left solely to theload_ramlegacy
function. I think this constrains future development of the package as future may entail functionality outside the database. This is also outside what a user would expect from a typicallibrary
call. We are into stylistic territory here but I personally think objects should not be loaded with the library call.
Agreed! We have removed the loading behavior of library(ramlegacy)
and all loading is now solely done
by load_ramlegacy
- I wonder if
load_ramlegacy
should gain an argument where a user can specify a vector that controls which dataset are loaded. This might be a useful feature for production code, particularly a shiny app.
Excellent suggestion! We have added a dfs
argument in load_ramlegacy
to which the user can pass a vector of dataframes to load.
- Consider not even exposing a
raw_url
norram_path
argument here as described here: https://github.com/kshtzgupta1/ramlegacy/blob/8c26a196cc573a03574ed084a7e6c9d0b951e7a2/R/download_ramlegacy.R#L23-L25 If a user is never meant to change it, then perhaps the argument could be omitted abstracting that possible confusion away from the user. What is the utility of those arguments if they are never meant to change?
Having ram_url
as an argument helps us write tests in which download_ramlegacy
deal with internet connection issues. Although the package downloads and caches the database in the user's rappdirs directory by default we have now decided that having ram_path
as an argument is useful in download_ramlegacy
and load_ramlegacy
because it allows the user to download the database to a location of their choice and read from that location if they wish to do so. I think we want to give the user that flexibility.
- I would strongly consider a NEWS.md file to track changes between versions. rOpenSci provides a great template
Added!
- I think the package lacks some connection with the actual data. Right now a bunch of dataframes are loaded but a new user is expected to have prior knowledge of the ram database. I think this package could provide some path for users to learn about the data itself.
We will definitely be open to including more educating vignettes if the maintainers of the database wanted to do that and certainly be willing to link more informative documentation in the vignette but at the same time we want to avoid writing things that are out-of-date or not coming from the maintainers. Also, in our opinion educating a new user in the database might be a bit outside the scope of the package which was primarily to improve access to the data.
- It isn't clear to me why you need different versions of the database. This functionality is highlighted quite prominently but it's utility is not clear.
We believe providing access to older versions can be really useful for users trying to reproduce older research papers and studies.
- RE: the alternate download location on github. I would consider if you have acknowledged and licensed this data appropriately in your assets repo. I would contact the RAM maintainers directly and seek direction on this. Fully reproducing the data might cause some concern with those maintainers.
Prof. Boettiger was in touch with the RAM maintainers regarding that. While the maintainers have moved the latest versions (4.40, 4.41, 4.44) to Zenodo they still have to do the same for the older versions. So till that happens the package will have to use the github repo to make the older versions available to the users.
- The docs directory,
_pkgdown.yml
and_config.yml
all needed to be added to.Rbuildignore
for R CMD check to pass on my machine.
Added to .Rbuildignore
!
- I am wondering if this is intentional behaviour. I think this is a carryover from
readxl::read_excel
but given the class of the data object it prints either as a dataframe or as a tibble depending on whether tibble is loaded or not. One thing to consider is even just adding tibble to Suggests for nicer printing in a vignette
It was intentional. We wanted to give the user the option to choose whether they wanted the tables as tibbles or dataframe. If for some reason a user prefers data.frame
to tibble
then we certainly don't want to impose tibbles on them.
- I get these errors when I run
goodpractice::gp
✖ fix this R CMD check WARNING: LaTeX errors when creating PDF
version. This typically indicates Rd problems.
✖ fix this R CMD check ERROR: Re-running with no redirection of
stdout/stderr. Hmm ... looks like a package You may want to clean
up by 'rm -Rf
C:/Users/salbers/AppData/Local/Temp/Rtmp4yG4PY/Rd2pdf43d827bd526'
We couldn't reproduce these on our machines. I think they might be specific to your machine and may be occurring because you don't have Tex installed.
I think some slight improvement could be made by running
styler
on the package though it is marginal.
Agreed! I will run it after all the changes have been approved.
I wonder if it would be useful to engage Sean Anderson to see if he would be willing to submit a pull request with his code that converts the Access database into an sqlite database. I think there would be an appetite for the sqlite option which could exist alongside the Excel version. His process requires an additional utility outside of R but maybe a R-only solution is available.
We are definitely open to working with Sean Anderson. I think Prof. Boettiger has pinged him.
@jafflerbach Thank you for a detailed review! We have made many changes following your suggestions. I will update the README and vignette to reflect those changes after they are approved.
Vignette(s) demonstrating major functionality that runs successfully locally The Vignette is good except it does not explain how to install the package. You can only find installation instructions in the README
Fixed!
- When running
goodpractice::gp()
I got the following output:
✖ write unit tests for all functions, and all package code in general. 69% of code lines are covered by test
cases.
R/download_ramlegacy.R:53:NA
R/download_ramlegacy.R:58:NA
R/download_ramlegacy.R:59:NA
R/download_ramlegacy.R:68:NA
R/download_ramlegacy.R:69:NA
... and 58 more lines
✖ fix this R CMD check WARNING: LaTeX errors when creating PDF version. This typically indicates Rd
problems.
We couldn't reproduce that warning on our machines. I think it might be specific to your machine and may be occurring because you don't have Tex installed.
That being said, I strongly encourage the developers of this package to work with Sean and the maintainers of the RAM legacy database to get this package functional.
I believe Prof. Boettiger has reached out to Sean regarding that.
I first tried
download_ramlegacy()
when the site was down (1/17/2019). I downloaded from the backup location on github. This worked well. A couple of hours later, the website was backup and when I tried to download the only version now available on the site (4.4) I got an error message
> download_ramlegacy(version = "4.4")
Error: Invalid version number. Available versions are 1.0, 2.0, 2.5, 3.0, 4.3,
This should be resolved now!
I can forsee some user issues with the
ram_url()
argument indownload_ramlegacy()
:
download_ramlegacy(version = NULL, ram_path = NULL,
ram_url = "https://depts.washington.edu/ramlegac/wordpress/databaseVersions")
First, it looks like there is a typo (
ramlegac
instead oframlegacy
) so if thedownload_ramlegacy()
function isn't acting as the user would expect (whether it's their fault or not) they might see the URL in the Help document and manually try to fix the typo. This may be an actual typo, I don't know. Either way, if you don't want the user to ever touch it I would hardcode thatram_url
into the function itself rather than having it as an argument.
It was actually ramlegac
in the url. But that is no longer relevant since we are now passing in the zenodo doi url to ram_url
. Having ram_url
as an argument helps us test for intended behavior of download_ramlegacy
in face of internet connection issues and network problems.
Also the
path
argument inload_ramlegacy()
is again one that should not be touched by the user. I tried to mess with this path argument and when I entered an incorrect path I received this:
> load_ramlegacy(version = 4.3, path = "~/github")
★ Loading version 4.3 ...
Error in readRDS(path) : error reading from connection
In addition: Warning message:
In readRDS(path) : error reading the file
I suggest changing the function so that if something is entered in as a path argument, it returns a message reminding the user to not change the path argument. Or, remove this as an argument entirely.
Although the package downloads and caches the database in the user's rappdirs directory by default we have now decided that having ram_path
as an argument is useful in download_ramlegacy
and load_ramlegacy
because it allows the user to download the database to a location of their choice and read from that location if they wish to do so. I think we want to give the user that flexibility.
Great job @kshtzgupta1! Thanks for such a great package and great job addressing issues raised in this review. You have sufficiently addressed any issues I raised. @geanders I recommend this package for acceptance. I do want to raise some final considerations:
It was intentional. We wanted to give the user the option to choose whether they wanted the tables as tibbles or dataframe. If for some reason a user prefers
data.frame
totibble
then we certainly don't want to impose tibbles on them.
I am really of two minds on this. I employ this same trick on a couple packages of mine but I also wonder if this introduces some weird ambiguity that relies on the side effect of having a package loaded. Though I don't think the package acceptance is contingent on this, I would in this case recommend imposing a tibble on the user.
Prof. Boettiger was in touch with the RAM maintainers regarding that. While the maintainers have moved the latest versions (4.40, 4.41, 4.44) to Zenodo they still have to do the same for the older versions. So till that happens the package will have to use the github repo to make the older versions available to the users.
I still don't love this solution. To me, this places too much of the package functionality at the mercy of that repo being available rather having a direct line to the data. I think that it is worth the sacrifice of having fewer versions available to drop this solution. That said, I similarly don't think this is sufficient to hold up acceptance.
Hi Sam @boshek , thanks for the excellent review and the kinds words! I don't mean to jump in here but just wanted to add a little context to the decision about accessing older versions. I agree that it would be much more desirable to have all versions on Zenodo, and based on our discussions with them, I think the RAM Legacy team will eventually post those, but it is hard to know exactly when. Meanwhile, I do believe access to the old versions is critical, all the more so for them not being available anywhere else now. Among other reasons, there are dozens of high profile papers based on these older versions; just today there is yet another appearing in Science that uses the version 3 data.
Great context @cboettig . This rationale makes perfect sense.
@jafflerbach : I wanted to check in with you to see if you had any thoughts on the response to your initial review for this package?
Also just a note that we did just hear back from Sean Anderson who was very gracious and positive about the package. He gave us a few suggestions (data tables no longer append the version name on the table, which was no longer necesary anyway with new approach of explicitly calling load_ramlegacy()
with a version number instead of trying to load automatically on library()
. (Also note that as Sean's package no longer works in it's current state, as it still points to the Wordpress URLs which are now defunct and would need to be replaced with the Zenodo DOIs)
Thanks Sam, Jamie & Brooke for your feedback so far! @kshtzgupta1 summarizes the edits in reply to Jamie above, these are waiting on an PR from an ropensci-review branch in the package meanwhile.
Thanks @geanders, @kshtzgupta1 and @cboettig. Good hear you've been in touch with Sean Anderson as well as the RAM maintainers. No further requests from me on this package now.
Approved! @kshtzgupta1 : Thanks so much for these thoughtful revisions. Both reviewers agree that the package should be accepted. Further, based on there reviews, there are a few potential changes you might want to consider in future revisions (e.g., removing the reliance on the GitHub repos of older database versions once this is possible, thinking some more about whether you want different behavior based on whether the user has tibble
loaded) or through other avenues (e.g., posting some examples of using the downloaded data from the package, perhaps as a blog post, to help new users understand how this package can fit into a pipeline of analysis). Thanks to both @boshek and @jafflerbach for very helpful input and suggestions.
There are some things we'll need to do for the final processing of this package. I'm including the standard to-dos as a check list here. I'd like to add the caveat that this will be my first time doing the editor-side part of this process, so please bear with me as I figure out my end of things for the process!
To-dos:
[![ropensci_footer](https://ropensci.org/public_images/ropensci_footer.png)](https://ropensci.org)
"[![AppVeyor Build Status](https://ci.appveyor.com/api/projects/status/github/ropensci/pkgname?branch=master&svg=true)](https://ci.appveyor.com/project/individualaccount/pkgname)
.Should you want to acknowledge your reviewers in your package DESCRIPTION, you can do so by making them "rev"
-type contributors in the Authors@R
field (with their consent). More info on this here.
Welcome aboard! We'd also love a blog post about your package, either a short-form intro to it (https://ropensci.org/tech-notes/) or long-form post with more narrative about its development. (https://ropensci.org/blog/). If you are interested, @stefaniebutland will be in touch about content and timing.
We've started putting together a gitbook with our best practice and tips, this chapter starts the 3d section that's about guidance for after onboarding. Please tell us what could be improved, the corresponding repo is here.
Also, since you are planning to submit to CRAN, you may find this list of CRAN gotchas helpful, and I'd be happy to provide support through that process.
Hello @kshtzgupta1 I'm rOpenSci's Community Manager, here to say we would love to feature a post about ramlegacy
on our blog. This might bring your work to a larger audience.
This link will give you many examples of blog posts by authors of peer-reviewed packages so you can get an idea of the style and length you prefer: https://ropensci.org/tags/software-peer-review/. Shorter tech notes are here: https://ropensci.org/technotes/
Here are some technical and editorial guidelines for contributing a blog post: https://github.com/ropensci/roweb2#contributing-a-blog-post.
Please let me know if you're interested and we can discuss a deadline. Happy to answer any questions.
My apologies for such a late reply @geanders. I was ill and needed to have a surgery. I have finished all the to-dos. Thank you for being such a helpful editor!
@boshek Thank you for being such a great reviewer! Is it fine if I add you as a "rev"-type contributor in the Authors@R field? If so, can I put in your email as sam.albers@gmail.com and your orcid id as https://orcid.org/0000-0002-9270-7884 ?
@jafflerbach Thank you very much for reviewing the package! I would like to acknowledge you as a "rev"-type contributor in the Authors@R field if that's okay with you. Can I put in your email as afflerbach@nceas.ucsb.edu and orcid id as https://orcid.org/0000-0002-5215-9342 ?
@stefaniebutland Thank you so much for reaching out. My sincere apologies for the late reply. I think Prof. Boettiger and I would definitely be interested in authoring a blog post about ramlegacy. I'll discuss the content of the post with Prof. Boettiger and let you know about a timeline very soon.
Yep both of those are correct.
Jamie Afflerbach Marine Data Scientist National Center for Ecological Analysis and Synthesis (NCEAS https://www.nceas.ucsb.edu/) University of California, Santa Barbara website http://jamieafflerbach.com - github https://github.com/jafflerbach- twitter https://twitter.com/jafflerbach
On Wed, Apr 3, 2019 at 1:39 PM Kshitiz Gupta notifications@github.com wrote:
@jafflerbach https://github.com/jafflerbach Thank you very much for reviewing the package! I would like to acknowledge you as a "rev"-type contributor in the Authors@R field if that's okay with you. Can I put in your email as afflerbach@nceas.ucsb.edu and orcid id as https://orcid.org/0000-0002-5215-9342 ?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ropensci/software-review/issues/264#issuecomment-479649853, or mute the thread https://github.com/notifications/unsubscribe-auth/AGhojjYhjkc8-ulVxEPQIbg29n4-vmWDks5vdRF4gaJpZM4Ysexu .
@kshtzgupta1 👍
@kshtzgupta1 : No worries at all! Alright, I will work through the next steps on my end. Again, since this is the first package I've edited to reach this point, it might take me a bit longer than usual to make sure I've gotten through any required steps. Congrats on a very nice package!
@kshtzgupta1 Please look for an email from me to follow up about a blog post
@kshtzgupta1 : I think we are all set with our post-approval steps, so I'm going to close this issue. Congrats on a really nice package!
Summary
ramlegacy
helps users access the excel version of the RAM Legacy Stock Assessment Database from www.ramlegacy.org. Data is freely available from the databse website, but downloading and reading in the data by hand can be time-consuming.ramlegacy
is capable of downloading, reading and caching all the available versions of the database.URL for the package (the development repository, not a stylized html page):
www.github.com/kshtzgupta1/ramlegacy
Please indicate which category or categories from our package fit policies this package falls under *and why(? (e.g., data retrieval, reproducibility. If you are unsure, we suggest you make a pre-submission inquiry.):
data retrieval:
ramlegacy
downloads and reads in multiple versions of the excel version of the RAM Legacy Stock Assessment Database from the database's website.Any one who needs access to RAM Legacy Stock Assessment Database: Fisheries Biologists, Conservationists, Students, Teachers, etc.
Sean Anderson has a namesake package not published to CRAN and it appears to be a stalled project on GitHub (last updated 9 months ago). However, unlike this package which supports downloading and reading in the Excel version of the database, Sean Anderson's project downloads the Microsoft Access copy of the database and converts it to a local sqlite3 database.
There is also
RAMlegacyr
, an older package last updated in 2015. Similar to Sean Anderson's project, the package seems to be an R interface only for the Microsoft Access version of the RAM Legacy Stock Assessment Database and provides a set of functions using RPostgreSQL to connect to the database.Requirements
Confirm each of the following by checking the box. This package:
Publication options
paper.md
matching JOSS's requirements with a high-level description in the package root or ininst/
.Detail
R CMD check
(ordevtools::check()
) succeed? Paste and describe any errors or warnings:Please note that running R CMD check on ramlegacy may result in the following warning if the user has not yet downloaded a version of the RAM Legacy Stock Assessment Database.
No version of the database has yet been downloaded. Use function download_ramlegacy() to download a version now.
Also note that R CMD check on this package will result in the following error in the testing suite if the user is not online when checking the package.
Error: Could not connect to the internet. Please check your connection settings and try again.
[x] Does the package conform to rOpenSci packaging guidelines? Please describe any exceptions:
If this is a resubmission following rejection, please explain the change in circumstances:
If possible, please provide recommendations of reviewers - those with experience with similar packages and/or likely users of your package - and their GitHub user names: