ropensci / software-review

rOpenSci Software Peer Review.
291 stars 104 forks source link

Submission: ramlegacy #264

Closed kshtzgupta1 closed 5 years ago

kshtzgupta1 commented 5 years ago

Summary

ramlegacy helps users access the excel version of the RAM Legacy Stock Assessment Database from www.ramlegacy.org. Data is freely available from the databse website, but downloading and reading in the data by hand can be time-consuming. ramlegacy is capable of downloading, reading and caching all the available versions of the database.

Type: Package
Package: ramlegacy
Title: Download and Read RAM Legacy Stock Assessment DataBase
Version: 0.1.0
Authors@R: 
    c(person(given = "Kshitiz",
             family = "Gupta",
             role = c("aut", "cre", "cph"),
             email = "kshtzgupta1@berkeley.edu"),
      person(given = "Carl",
             family = "Boettiger",
             role = c("aut", "cph"),
             email = "cboettig@gmail.com", comment="http://orcid.org/0000-0002-1642-628X"))
Description: Contains functions to download, cache and read in Excel version of
    the RAM Legacy Stock Assessment Data Base, an online compilation of
    stock assessment results for commercially exploited marine populations
    from around the world. More information about the database can be
    found at <http://ramlegacy.org/>.
License: MIT + file LICENSE
URL: https://github.com/kshtzgupta1/ramlegacy
BugReports: https://github.com/kshtzgupta1/ramlegacy/issues
Depends: 
    R (>= 2.10)
Imports: 
    cli (>= 1.0.0),
    crayon (>= 1.3.4),
    httr (>= 1.3.1),
    rappdirs (>= 0.3.1),
    readxl (>= 1.1.0)
Suggests:
    covr,
    testthat,
    httptest,
    knitr,
    rmarkdown
VignetteBuilder: 
    knitr
Encoding: UTF-8
LazyData: true
RoxygenNote: 6.1.0
X-schema.org-keywords: ramlegacy, marine, database, assessment, RAM, Legacy, download, read, cache, ropensci

data retrieval: ramlegacy downloads and reads in multiple versions of the excel version of the RAM Legacy Stock Assessment Database from the database's website.

Any one who needs access to RAM Legacy Stock Assessment Database: Fisheries Biologists, Conservationists, Students, Teachers, etc.

Sean Anderson has a namesake package not published to CRAN and it appears to be a stalled project on GitHub (last updated 9 months ago). However, unlike this package which supports downloading and reading in the Excel version of the database, Sean Anderson's project downloads the Microsoft Access copy of the database and converts it to a local sqlite3 database.

There is also RAMlegacyr, an older package last updated in 2015. Similar to Sean Anderson's project, the package seems to be an R interface only for the Microsoft Access version of the RAM Legacy Stock Assessment Database and provides a set of functions using RPostgreSQL to connect to the database.

Requirements

Confirm each of the following by checking the box. This package:

Publication options

Detail

Please note that running R CMD check on ramlegacy may result in the following warning if the user has not yet downloaded a version of the RAM Legacy Stock Assessment Database.

No version of the database has yet been downloaded. Use function download_ramlegacy() to download a version now.

Also note that R CMD check on this package will result in the following error in the testing suite if the user is not online when checking the package.

Error: Could not connect to the internet. Please check your connection settings and try again.

sckott commented 5 years ago

thanks for your submission @kshtzgupta1 We are discussing now and will get back to you soon

sckott commented 5 years ago

sorry for delay on this @kshtzgupta1 - we have some conflicts of interest we're thinking through ...

geanders commented 5 years ago

Editor checks:


Editor comments

@kshtzgupta1: Thanks for submitting a very interesting package. This is the first time I'm serving as an editor for ROpenSci, so I will be learning some parts of the editorial process as I edit your submission. Please let me know if you have any questions throughout.

I have a bit of feedback based on an initial look at the submission and some initial editorial checks.

Submission summary:

In your submission summary, could you please:

Checks from devtools::spell_check():

There seem to be two legitimate spelling errors in the package code:

  WORD            FOUND IN
Assesment       load_ramlegacy.Rd:18
availabl        ram_dir.Rd:11

Checks from covr::package_coverage()

ramlegacy Coverage: 79.33%
R/zzz.R: 23.33%
R/download_ramlegacy.R: 83.02%
R/utils.R: 88.41%
R/load_ramlegacy.R: 88.46%
R/read_ramlegacy.R: 100.00%
R/style.R: 100.00%

Checks from goodpractice::gp()

── GP ramlegacy ──────────────

It is good practice to

  ✖ write unit tests for all functions, and all package code
    in general. 80% of code lines are covered by test cases.

    R/download_ramlegacy.R:58:NA
    R/download_ramlegacy.R:59:NA
    R/download_ramlegacy.R:69:NA
    R/download_ramlegacy.R:70:NA
    R/download_ramlegacy.R:71:NA
    ... and 35 more lines

  ✖ avoid long code lines, it is bad for readability. Also,
    many people prefer editor windows that are about 80 characters
    wide. Try make your lines shorter than 80 characters

    R/download_ramlegacy.R:106:1

Checks from devtools::check()

Other

In the meantime, I will begin the process of assigning reviewers.

Reviewer: @boshek Reviewer: @jafflerbach Due date: January 25, 2018

kshtzgupta1 commented 5 years ago

@geanders Thanks for your response. I have updated the submission as per your suggestions.

geanders commented 5 years ago

Thank you, @kshtzgupta1!

Could I also ask you to add an rOpenSci review badge to the README file for your package? The full link for this should be:

[![](https://badges.ropensci.org/264_status.svg)](https://github.com/ropensci/software-review/issues/264)
kshtzgupta1 commented 5 years ago

@geanders I have added the badge.

boshek commented 5 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

Functionality

Final approval (post-review)

Estimated hours spent reviewing: 5


Review Comments

Summary

I think this is a great package that provides a clear set of instruction on how to download the ramlegacy database. The biggest issue in the package I see is the reliance of the library call to load data. The rest of the items identified below are relatively easy to address. Please let me know if anything is unclear or there is any disagreement.

library(ramlegacy)
#> * Multiple versions found including the latest one: 4.3 . Loading the latest version.
#> * Loading version 4.3 ...
#> v Version 4.3 has been successfully loaded.

TC_with_names <- merge(
  timeseries_values_views_v4.3[!is.na(timeseries_values_views_v4.3$TC), c("stockid", "TC")],
  stock_v4.3[stock_v4.3$region == "West Africa",],
  by = "stockid"
)

hist(TC_with_names$TC)

Created on 2019-01-08 by the reprex package (v0.2.1)

class(stock_v4.3)

> [1] "tbl_df" "tbl" "data.frame"

head(stock_v4.3, 10)

> stockid tsn scientificname commonname

> 1 ACADRED2J3K 166774 Sebastes fasciatus Acadian redfish

> 2 ACADRED3LNO-UT12 166774 Sebastes fasciatus Acadian redfish

> 3 ACADREDGOMGB 166774 Sebastes fasciatus Acadian redfish

> 4 ACADREDUT3 166774 Sebastes fasciatus Acadian redfish

> 5 ACMACKSARG 172413 Scomber colias Argentine chub mackerel

> 6 AFLONCH 166156 Beryx splendens Alfonsino

> 7 ALBAIO 172419 Thunnus alalunga albacore tuna

> 8 ALBAMED 172419 Thunnus alalunga albacore tuna

> 9 ALBANATL 172419 Thunnus alalunga Albacore tuna

> 10 ALBANPAC 172419 Thunnus alalunga Albacore tuna

> areaid stocklong

> 1 Canada-DFO-2J3K Acadian redfish NAFO-2J3K

> 2 Canada-DFO-3LNO-UT12 Acadian redfish Units 1-2 and NAFO-3LNO

> 3 USA-NMFS-5YZ Acadian redfish Gulf of Maine / Georges Bank

> 4 Canada-DFO-UT3 Acadian redfish Unit 3

> 5 Argentina-CFP-ARG-S Argentine chub mackerel Southern Argentina

> 6 multinational-SPRFMO-CH Alfonsino Chile

> 7 multinational-IOTC-IO Albacore tuna Indian Ocean

> 8 multinational-ICCAT-MED Albacore tuna Mediterranean

> 9 multinational-ICCAT-NATL Albacore tuna North Atlantic

> 10 Multinational-ISC-NPAC Albacore tuna North Pacific

> region inmyersdb myersstockid

> 1 Canada East Coast 0

> 2 Canada East Coast 0

> 3 US East Coast 0

> 4 Canada East Coast 0

> 5 South America 0

> 6 South America 0

> 7 Indian Ocean 0

> 8 Mediterranean-Black Sea 0

> 9 Atlantic Ocean 0

> 10 US West Coast 0

library(tibble)

stock_v4.3

> # A tibble: 1,294 x 9

> stockid tsn scientificname commonname areaid stocklong region

>

> 1 ACADRE~ 166774 Sebastes fasc~ Acadian r~ Canad~ Acadian ~ Canad~

> 2 ACADRE~ 166774 Sebastes fasc~ Acadian r~ Canad~ Acadian ~ Canad~

> 3 ACADRE~ 166774 Sebastes fasc~ Acadian r~ USA-N~ Acadian ~ US Ea~

> 4 ACADRE~ 166774 Sebastes fasc~ Acadian r~ Canad~ Acadian ~ Canad~

> 5 ACMACK~ 172413 Scomber colias Argentine~ Argen~ Argentin~ South~

> 6 AFLONCH 166156 Beryx splende~ Alfonsino multi~ Alfonsin~ South~

> 7 ALBAIO 172419 Thunnus alalu~ albacore ~ multi~ Albacore~ India~

> 8 ALBAMED 172419 Thunnus alalu~ albacore ~ multi~ Albacore~ Medit~

> 9 ALBANA~ 172419 Thunnus alalu~ Albacore ~ multi~ Albacore~ Atlan~

> 10 ALBANP~ 172419 Thunnus alalu~ Albacore ~ Multi~ Albacore~ US We~

> # ... with 1,284 more rows, and 2 more variables: inmyersdb ,

> # myersstockid


<sup>Created on 2019-01-07 by the [reprex package](https://reprex.tidyverse.org) (v0.2.1)</sup>

- I get these errors when I run `goodpractice::gp`

✖ fix this R CMD check WARNING: LaTeX errors when creating PDF version. This typically indicates Rd problems. ✖ fix this R CMD check ERROR: Re-running with no redirection of stdout/stderr. Hmm ... looks like a package You may want to clean up by 'rm -Rf C:/Users/salbers/AppData/Local/Temp/Rtmp4yG4PY/Rd2pdf43d827bd526'



- I think some slight improvement could be made by running `styler` on the package though it is marginal. 

#### Possible future work
I wonder if it would be useful to engage Sean Anderson to see if he would be willing to submit a pull request with his code that converts the Access database into an sqlite database. I think there would be an appetite for the sqlite option which could exist alongside the Excel version. His process requires an additional utility outside of R but maybe a R-only solution is available. 
geanders commented 5 years ago

Thank you for the very thorough review, @boshek! @kshtzgupta1, there will be a second review coming, as well, so you may want to wait for that before you begin addressing reviewer comments.

jamiecmontgomery commented 5 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

Functionality

Final approval (post-review)

Estimated hours spent reviewing: 4


General Comments

There is only one version of the RAM database now available on ramlegacy.org, version 4.4. I believe the timing of this package going out for review and the recent website update were around the same time (December 20). That being said, there must be a reason the maintainers of the website no longer host the older versions. I encourage you to reach out to them to understand why. I know them and can make the connection if needed. As it is now, everyone using this package to download older versions will be retrieving the backups on GitHub. I believe this package should only allow retrieval of data that is also available on the site.

Sean Anderson has a package of the same name and same basic functionality at https://github.com/seananderson/ramlegacy. I have used this package for my own work in the past although in recent years I’ve been receiving updated RAM data directly via a DropBox link from the maintainers. I wholeheartedly support development of an R package to access the data such as this one and the one from Sean because the current data distribution methods are unreliable and opaque. That being said, I strongly encourage the developers of this package to work with Sean and the maintainers of the RAM legacy database to get this package functional. Again, I can help facilitate the connection as I’ve worked with Sean closely over the past few years.

As a frequent user of the RAM data, I know where to go to find information about what each of the loaded tables are. I understand the comment from the other reviewer about providing more information, but if this package is purely intended to just retrieve the RAM data from the website, I think these functions are good as is. I can come up with a myriad of other functions I would like to see from this R package based on the common wrangling tasks I do with this database, but I believe those could be added through feature requests and aren’t a reason to hold this package up.

More specific comments

I first tried download_ramlegacy() when the site was down (1/17/2019). I downloaded from the backup location on github. This worked well. A couple of hours later, the website was backup and when I tried to download the only version now available on the site (4.4) I got an error message

> download_ramlegacy(version = "4.4")
Error: Invalid version number. Available versions are 1.0, 2.0, 2.5, 3.0, 4.3,

The maintainers of this package will have to be monitoring ramlegacy.org carefully to know when a new version is released to then update this package.

I can forsee some user issues with the ram_url() argument in download_ramlegacy():

download_ramlegacy(version = NULL, ram_path = NULL,
  ram_url = "https://depts.washington.edu/ramlegac/wordpress/databaseVersions")

First, it looks like there is a typo (ramlegac instead of ramlegacy) so if the download_ramlegacy() function isn't acting as the user would expect (whether it's their fault or not) they might see the URL in the Help document and manually try to fix the typo. This may be an actual typo, I don't know. Either way, if you don't want the user to ever touch it I would hardcode that ram_url into the function itself rather than having it as an argument.

Also the path argument in load_ramlegacy() is again one that should not be touched by the user. I tried to mess with this path argument and when I entered an incorrect path I received this:

> load_ramlegacy(version = 4.3, path = "~/github")
★ Loading version 4.3 ...
Error in readRDS(path) : error reading from connection
In addition: Warning message:
In readRDS(path) : error reading the file

I suggest changing the function so that if something is entered in as a path argument, it returns a message reminding the user to not change the path argument. Or, remove this as an argument entirely.

jamiecmontgomery commented 5 years ago

I just wanted to follow up and it seems within the past week, the maintainers of ramlegacy.org have changed where the database is kept. It is now hosted on Zenodo, which will require some changes to this package for accessing the data.

geanders commented 5 years ago

Thank you to @boshek and @jafflerbach for your very thoughtful reviews!

@kshtzgupta1 : We now have both reviews in, so if you haven't already, you can begin in responding to the reviewers' comments and suggestions. In particular, it sounds like some changes to ram legacy.org, brought up by @jafflerbach, will require some package changes.

Please let me know if you have any questions as you prepare your response.

boshek commented 5 years ago

Also worth noting that the authors are now including (or maybe always had) an .RData file will and an R script to load the database into a users session much like what this package does albeit in a less refined way.

geanders commented 5 years ago

@boshek : Just to clarify, you're referring to the "DBdata.RData" and "loadDBdata (assessment data only).r" files listed here, right?

cboettig commented 5 years ago

Thanks everyone for great reviews and discussion, this is very helpful. @kshtzgupta1 and I will need a bit more time to really work through and address all of these properly, but meanwhile will just make a few quick observations relevant to the immediate discussion.

@jafflerbach Thanks! Yes, we've been in discussion with Mike Melnychuk since September about moving the data to Zenodo to avoid the problems created by down-time and re-organization of the ramlegacy.org website, so we were very happy to hear from them about this development and look forward to using those endpoints for the package. While they have plans to upload older versions to Zenodo as well, these aren't currently linked up as related versions (a la https://blog.zenodo.org/2017/05/30/doi-versioning-launched/), hopefully they will be able to do that to provide a source for the original versions. So hopefully soon the Zenodo hosting will provide a much more stable source for any versions of the data but may be a bit of a moving target for a little longer. (This switch to the versioning approach would also allow automated resolution of the most recent version).

boshek commented 5 years ago

@geanders 👍

geanders commented 5 years ago

@kshtzgupta1 : It's no problem that it will take a bit longer than for your response than we usually ask for, especially given some of the changes in ramlegacy.org that affect the package. I was wondering, though, if you have a rough timeline to share with the reviewers? Just a rough idea would be fine.

kshtzgupta1 commented 5 years ago

@geanders : Apologies for replying so late. We are working to incorporate some of the changes suggested by the reviewers as well as improve the package to better deal with the migration of the latest versions to Zenodo. I will be meeting with Prof. Boettiger this Thursday to finalize those changes after which I will submit the pull request addressing each of the reviewers' comments.

geanders commented 5 years ago

@kshtzgupta1 : No problem at all! Thanks for the update.

kshtzgupta1 commented 5 years ago

@geanders @boshek @jafflerbach Thanks for these very helpful and detailed reviews! I believe Prof. Boettiger and I have addressed most of these issues now. Among other things, we have removed the loading behavior of library(ramlegacy) leaving the loading solely to load_ramlegacy which instead of assigning all the tables to global environment now just returns a list of the tables. Additionally, if the user wants specific tables from a version of the database load_ramlegacy now supports that behavior. We have also modified download_ramlegacy and load_ramlegacy so that while the default is still to download and read in the dataframes from the rappdirs directory the functions now also support downloading to a location chosen by the user and reading from that location.

As you may know the RAM maintainers have moved the database to Zenodo where they have released three new versions 4.40, 4.41, and 4.44. Unfortunately, they still haven't transferred the older versions (1.0, 2.0, 2.5, 3.0, 4.3) to Zenodo and so for now the package is going to have to make these versions available to the user through the github repo. We will change this of course as soon as the maintainers start hosting the older versions on Zenodo. Finally, the test coverage might have become lower with the latest commit as many of the old tests have become obsolete but as soon as these new changes are confirmed I will write new tests to bring up the test coverage.

I've made all these edits and more in a separate branch and created a PR, kshtzgupta1/ramlegacy#1 which I will leave un-merged for now, hopefully that provides an easy way to see the differences. I've also summarized my replies to each of your point-by-point reviews (with a few clarifying questions and comments) in the two issues linked in that PR.

boshek commented 5 years ago

Nice work @kshtzgupta1. @geanders I won't be able to look at this until late next week but can prioritize it then.

geanders commented 5 years ago

@kshtzgupta1 : Thanks so much for these revisions! Could you insert a link in your comment to the Issues where you give point-by-point reviews? That would help @boshek and @jafflerbach find them more easily.

@boshek : Thanks for letting me know your timeline. That works fine.

geanders commented 5 years ago

Actually, @kshtzgupta1, could I ask you to copy your point-by-point responses over to this thread? It would be great to keep the full conversation on a single thread.

kshtzgupta1 commented 5 years ago

@boshek Thank you for a detailed review! We have made many changes following your suggestions. I will update the README and vignette to reflect those changes after they are approved.

  • Though this is likely to end up on CRAN, the current installation instructions don't automatically build the vignette so accessing the vignette locally fails. I'd just suggest remove the line:

The vignette can also be viewed by calling vignette(package = "ramlegacy")

Until the installation instruction are for the CRAN version. Alternatively, one could install by removing --no-build-vignettes from the build_opts in install_github

Fixed that line!

Vignette(s)** demonstrating major functionality that runs successfully locally

Fixed!

Function Documentation: for all exported functions in R help

Revised!

Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

  • Consider adding either suggested reference or the maintainer of ramlegacy to the DESCRIPTION file in Authors@R
  • Consider adding a CODE OF CONDUCT and CONTRIBUTING file directly to the repo.

Added!

  • I think that you are adding too much onto the library call. Consider any loading of data should be left solely to the load_ramlegacy function. I think this constrains future development of the package as future may entail functionality outside the database. This is also outside what a user would expect from a typical library call. We are into stylistic territory here but I personally think objects should not be loaded with the library call.

Agreed! We have removed the loading behavior of library(ramlegacy) and all loading is now solely done by load_ramlegacy

  • I wonder if load_ramlegacy should gain an argument where a user can specify a vector that controls which dataset are loaded. This might be a useful feature for production code, particularly a shiny app.

Excellent suggestion! We have added a dfs argument in load_ramlegacy to which the user can pass a vector of dataframes to load.

Having ram_url as an argument helps us write tests in which download_ramlegacy deal with internet connection issues. Although the package downloads and caches the database in the user's rappdirs directory by default we have now decided that having ram_path as an argument is useful in download_ramlegacy and load_ramlegacy because it allows the user to download the database to a location of their choice and read from that location if they wish to do so. I think we want to give the user that flexibility.

  • I would strongly consider a NEWS.md file to track changes between versions. rOpenSci provides a great template

Added!

  • I think the package lacks some connection with the actual data. Right now a bunch of dataframes are loaded but a new user is expected to have prior knowledge of the ram database. I think this package could provide some path for users to learn about the data itself.

We will definitely be open to including more educating vignettes if the maintainers of the database wanted to do that and certainly be willing to link more informative documentation in the vignette but at the same time we want to avoid writing things that are out-of-date or not coming from the maintainers. Also, in our opinion educating a new user in the database might be a bit outside the scope of the package which was primarily to improve access to the data.

  • It isn't clear to me why you need different versions of the database. This functionality is highlighted quite prominently but it's utility is not clear.

We believe providing access to older versions can be really useful for users trying to reproduce older research papers and studies.

  • RE: the alternate download location on github. I would consider if you have acknowledged and licensed this data appropriately in your assets repo. I would contact the RAM maintainers directly and seek direction on this. Fully reproducing the data might cause some concern with those maintainers.

Prof. Boettiger was in touch with the RAM maintainers regarding that. While the maintainers have moved the latest versions (4.40, 4.41, 4.44) to Zenodo they still have to do the same for the older versions. So till that happens the package will have to use the github repo to make the older versions available to the users.

  • The docs directory, _pkgdown.yml and _config.yml all needed to be added to .Rbuildignore for R CMD check to pass on my machine.

Added to .Rbuildignore!

  • I am wondering if this is intentional behaviour. I think this is a carryover from readxl::read_excel but given the class of the data object it prints either as a dataframe or as a tibble depending on whether tibble is loaded or not. One thing to consider is even just adding tibble to Suggests for nicer printing in a vignette

It was intentional. We wanted to give the user the option to choose whether they wanted the tables as tibbles or dataframe. If for some reason a user prefers data.frame to tibble then we certainly don't want to impose tibbles on them.

  • I get these errors when I run goodpractice::gp
  ✖ fix this R CMD check WARNING: LaTeX errors when creating PDF
    version. This typically indicates Rd problems.
  ✖ fix this R CMD check ERROR: Re-running with no redirection of
    stdout/stderr. Hmm ... looks like a package You may want to clean
    up by 'rm -Rf
    C:/Users/salbers/AppData/Local/Temp/Rtmp4yG4PY/Rd2pdf43d827bd526'

We couldn't reproduce these on our machines. I think they might be specific to your machine and may be occurring because you don't have Tex installed.

I think some slight improvement could be made by running styler on the package though it is marginal.

Agreed! I will run it after all the changes have been approved.

I wonder if it would be useful to engage Sean Anderson to see if he would be willing to submit a pull request with his code that converts the Access database into an sqlite database. I think there would be an appetite for the sqlite option which could exist alongside the Excel version. His process requires an additional utility outside of R but maybe a R-only solution is available.

We are definitely open to working with Sean Anderson. I think Prof. Boettiger has pinged him.

kshtzgupta1 commented 5 years ago

@jafflerbach Thank you for a detailed review! We have made many changes following your suggestions. I will update the README and vignette to reflect those changes after they are approved.

Vignette(s) demonstrating major functionality that runs successfully locally The Vignette is good except it does not explain how to install the package. You can only find installation instructions in the README

Fixed!

  • When running goodpractice::gp() I got the following output:
  ✖ write unit tests for all functions, and all package code in general. 69% of code lines are covered by test
    cases.

    R/download_ramlegacy.R:53:NA
    R/download_ramlegacy.R:58:NA
    R/download_ramlegacy.R:59:NA
    R/download_ramlegacy.R:68:NA
    R/download_ramlegacy.R:69:NA
    ... and 58 more lines

  ✖ fix this R CMD check WARNING: LaTeX errors when creating PDF version. This typically indicates Rd
    problems.

We couldn't reproduce that warning on our machines. I think it might be specific to your machine and may be occurring because you don't have Tex installed.

That being said, I strongly encourage the developers of this package to work with Sean and the maintainers of the RAM legacy database to get this package functional.

I believe Prof. Boettiger has reached out to Sean regarding that.

I first tried download_ramlegacy() when the site was down (1/17/2019). I downloaded from the backup location on github. This worked well. A couple of hours later, the website was backup and when I tried to download the only version now available on the site (4.4) I got an error message

> download_ramlegacy(version = "4.4")
Error: Invalid version number. Available versions are 1.0, 2.0, 2.5, 3.0, 4.3,

This should be resolved now!

I can forsee some user issues with the ram_url() argument in download_ramlegacy():

download_ramlegacy(version = NULL, ram_path = NULL,
  ram_url = "https://depts.washington.edu/ramlegac/wordpress/databaseVersions")

First, it looks like there is a typo (ramlegac instead of ramlegacy) so if the download_ramlegacy() function isn't acting as the user would expect (whether it's their fault or not) they might see the URL in the Help document and manually try to fix the typo. This may be an actual typo, I don't know. Either way, if you don't want the user to ever touch it I would hardcode that ram_url into the function itself rather than having it as an argument.

It was actually ramlegac in the url. But that is no longer relevant since we are now passing in the zenodo doi url to ram_url. Having ram_url as an argument helps us test for intended behavior of download_ramlegacy in face of internet connection issues and network problems.

Also the path argument in load_ramlegacy() is again one that should not be touched by the user. I tried to mess with this path argument and when I entered an incorrect path I received this:

> load_ramlegacy(version = 4.3, path = "~/github")
★ Loading version 4.3 ...
Error in readRDS(path) : error reading from connection
In addition: Warning message:
In readRDS(path) : error reading the file

I suggest changing the function so that if something is entered in as a path argument, it returns a message reminding the user to not change the path argument. Or, remove this as an argument entirely.

Although the package downloads and caches the database in the user's rappdirs directory by default we have now decided that having ram_path as an argument is useful in download_ramlegacy and load_ramlegacy because it allows the user to download the database to a location of their choice and read from that location if they wish to do so. I think we want to give the user that flexibility.

boshek commented 5 years ago

Great job @kshtzgupta1! Thanks for such a great package and great job addressing issues raised in this review. You have sufficiently addressed any issues I raised. @geanders I recommend this package for acceptance. I do want to raise some final considerations:

It was intentional. We wanted to give the user the option to choose whether they wanted the tables as tibbles or dataframe. If for some reason a user prefers data.frame to tibble then we certainly don't want to impose tibbles on them.

I am really of two minds on this. I employ this same trick on a couple packages of mine but I also wonder if this introduces some weird ambiguity that relies on the side effect of having a package loaded. Though I don't think the package acceptance is contingent on this, I would in this case recommend imposing a tibble on the user.

Prof. Boettiger was in touch with the RAM maintainers regarding that. While the maintainers have moved the latest versions (4.40, 4.41, 4.44) to Zenodo they still have to do the same for the older versions. So till that happens the package will have to use the github repo to make the older versions available to the users.

I still don't love this solution. To me, this places too much of the package functionality at the mercy of that repo being available rather having a direct line to the data. I think that it is worth the sacrifice of having fewer versions available to drop this solution. That said, I similarly don't think this is sufficient to hold up acceptance.

cboettig commented 5 years ago

Hi Sam @boshek , thanks for the excellent review and the kinds words! I don't mean to jump in here but just wanted to add a little context to the decision about accessing older versions. I agree that it would be much more desirable to have all versions on Zenodo, and based on our discussions with them, I think the RAM Legacy team will eventually post those, but it is hard to know exactly when. Meanwhile, I do believe access to the old versions is critical, all the more so for them not being available anywhere else now. Among other reasons, there are dozens of high profile papers based on these older versions; just today there is yet another appearing in Science that uses the version 3 data.

boshek commented 5 years ago

Great context @cboettig . This rationale makes perfect sense.

geanders commented 5 years ago

@jafflerbach : I wanted to check in with you to see if you had any thoughts on the response to your initial review for this package?

cboettig commented 5 years ago

Also just a note that we did just hear back from Sean Anderson who was very gracious and positive about the package. He gave us a few suggestions (data tables no longer append the version name on the table, which was no longer necesary anyway with new approach of explicitly calling load_ramlegacy() with a version number instead of trying to load automatically on library(). (Also note that as Sean's package no longer works in it's current state, as it still points to the Wordpress URLs which are now defunct and would need to be replaced with the Zenodo DOIs)

Thanks Sam, Jamie & Brooke for your feedback so far! @kshtzgupta1 summarizes the edits in reply to Jamie above, these are waiting on an PR from an ropensci-review branch in the package meanwhile.

jamiecmontgomery commented 5 years ago

Thanks @geanders, @kshtzgupta1 and @cboettig. Good hear you've been in touch with Sean Anderson as well as the RAM maintainers. No further requests from me on this package now.

geanders commented 5 years ago

Approved! @kshtzgupta1 : Thanks so much for these thoughtful revisions. Both reviewers agree that the package should be accepted. Further, based on there reviews, there are a few potential changes you might want to consider in future revisions (e.g., removing the reliance on the GitHub repos of older database versions once this is possible, thinking some more about whether you want different behavior based on whether the user has tibble loaded) or through other avenues (e.g., posting some examples of using the downloaded data from the package, perhaps as a blog post, to help new users understand how this package can fit into a pipeline of analysis). Thanks to both @boshek and @jafflerbach for very helpful input and suggestions.

There are some things we'll need to do for the final processing of this package. I'm including the standard to-dos as a check list here. I'd like to add the caveat that this will be my first time doing the editor-side part of this process, so please bear with me as I figure out my end of things for the process!

To-dos:

Should you want to acknowledge your reviewers in your package DESCRIPTION, you can do so by making them "rev"-type contributors in the Authors@R field (with their consent). More info on this here.

Welcome aboard! We'd also love a blog post about your package, either a short-form intro to it (https://ropensci.org/tech-notes/) or long-form post with more narrative about its development. (https://ropensci.org/blog/). If you are interested, @stefaniebutland will be in touch about content and timing.

We've started putting together a gitbook with our best practice and tips, this chapter starts the 3d section that's about guidance for after onboarding. Please tell us what could be improved, the corresponding repo is here.

geanders commented 5 years ago

Also, since you are planning to submit to CRAN, you may find this list of CRAN gotchas helpful, and I'd be happy to provide support through that process.

stefaniebutland commented 5 years ago

Hello @kshtzgupta1 I'm rOpenSci's Community Manager, here to say we would love to feature a post about ramlegacy on our blog. This might bring your work to a larger audience.

This link will give you many examples of blog posts by authors of peer-reviewed packages so you can get an idea of the style and length you prefer: https://ropensci.org/tags/software-peer-review/. Shorter tech notes are here: https://ropensci.org/technotes/

Here are some technical and editorial guidelines for contributing a blog post: https://github.com/ropensci/roweb2#contributing-a-blog-post.

Please let me know if you're interested and we can discuss a deadline. Happy to answer any questions.

kshtzgupta1 commented 5 years ago

My apologies for such a late reply @geanders. I was ill and needed to have a surgery. I have finished all the to-dos. Thank you for being such a helpful editor!

kshtzgupta1 commented 5 years ago

@boshek Thank you for being such a great reviewer! Is it fine if I add you as a "rev"-type contributor in the Authors@R field? If so, can I put in your email as sam.albers@gmail.com and your orcid id as https://orcid.org/0000-0002-9270-7884 ?

kshtzgupta1 commented 5 years ago

@jafflerbach Thank you very much for reviewing the package! I would like to acknowledge you as a "rev"-type contributor in the Authors@R field if that's okay with you. Can I put in your email as afflerbach@nceas.ucsb.edu and orcid id as https://orcid.org/0000-0002-5215-9342 ?

kshtzgupta1 commented 5 years ago

@stefaniebutland Thank you so much for reaching out. My sincere apologies for the late reply. I think Prof. Boettiger and I would definitely be interested in authoring a blog post about ramlegacy. I'll discuss the content of the post with Prof. Boettiger and let you know about a timeline very soon.

jamiecmontgomery commented 5 years ago

Yep both of those are correct.

Jamie Afflerbach Marine Data Scientist National Center for Ecological Analysis and Synthesis (NCEAS https://www.nceas.ucsb.edu/) University of California, Santa Barbara website http://jamieafflerbach.com - github https://github.com/jafflerbach- twitter https://twitter.com/jafflerbach

On Wed, Apr 3, 2019 at 1:39 PM Kshitiz Gupta notifications@github.com wrote:

@jafflerbach https://github.com/jafflerbach Thank you very much for reviewing the package! I would like to acknowledge you as a "rev"-type contributor in the Authors@R field if that's okay with you. Can I put in your email as afflerbach@nceas.ucsb.edu and orcid id as https://orcid.org/0000-0002-5215-9342 ?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ropensci/software-review/issues/264#issuecomment-479649853, or mute the thread https://github.com/notifications/unsubscribe-auth/AGhojjYhjkc8-ulVxEPQIbg29n4-vmWDks5vdRF4gaJpZM4Ysexu .

boshek commented 5 years ago

@kshtzgupta1 👍

geanders commented 5 years ago

@kshtzgupta1 : No worries at all! Alright, I will work through the next steps on my end. Again, since this is the first package I've edited to reach this point, it might take me a bit longer than usual to make sure I've gotten through any required steps. Congrats on a very nice package!

stefaniebutland commented 5 years ago

@kshtzgupta1 Please look for an email from me to follow up about a blog post

geanders commented 5 years ago

@kshtzgupta1 : I think we are all set with our post-approval steps, so I'm going to close this issue. Congrats on a really nice package!