rrricanes - Githubissues

timtrice commented 7 years ago

Summary

What does this package do? (explain in 50 words or less): Scrapes and parses tropical cyclone advisories and GIS data from the National Hurricane Center. Can access real-time or archived data for storms since 1998.
Paste the full DESCRIPTION file inside a code block below:

Package: rrricanes
Type: Package
Title: Web scraper for real-time and archived advisory products for Atlantic and 
    east Pacific hurricanes and tropical storms
Version: 0.1.0.1
Authors@R: person("Tim", "Trice", email = "tim.trice@gmail.com",
                  role = c("aut", "cre"))
Depends:
  R (>= 3.3.3)
Description: Get archived data of past and current hurricanes and tropical 
  storms for the Atlantic and eastern Pacific oceans. Data is available for 
  storms since 1998.
URL: http://www.timtrice.net
BugReports: https://github.com/timtrice/rrricanes/issues
License: MIT + file LICENSE
LazyData: TRUE
Imports:
  data.table (>= 1.9.6), 
  dplyr (>= 0.5.0), 
  httr (>= 1.2.1), 
  lubridate (>= 1.6.0), 
  magrittr (>= 1.5), 
  purrr (>= 0.2.2.2), 
  readr (>= 1.1.0), 
  rvest (>= 0.3.2), 
  stringr (>= 1.2.0), 
  tibble (>= 1.3.1), 
  tidyr (>= 0.6.2), 
  xml2 (>= 1.1.1)
Suggests:
  devtools, 
  knitr, 
  rmarkdown,
  testthat,
RoxygenNote: 6.0.1
VignetteBuilder: knitr

URL for the package (the development repository, not a stylized html page): https://github.com/timtrice/rrricanes
Please indicate which category or categories from our package fit policies this package falls under and why? (e.g., data retrieval, reproducibility. If you are unsure, we suggest you make a pre-submission inquiry.): data extraction, data visualization, data munging, data packages
Who is the target audience?
Those with an interest in studying tropical meteorology, tracking storm developments and comparing the forecast abilities of the National Hurricane Center storm-to-storm, year-to-year.
Are there other R packages that accomplish the same thing? If so, what is different about yours?
I have searched (I believe) extensively and have not found one at all. To my knowledge the only other hurricane package in CRAN or GitHub is HURDAT - and I wrote that one, too :)

Requirements

R 3.3.3 Most tidyverse packages. rnaturalearthdata and rnaturalerathhires will be required for release 0.1.1.

Confirm each of the following by checking the box. This package:

[x] has a CRAN and OSI accepted license.
[x] contains a README with instructions for installing the development version.
[x] does not violate the Terms of Service of any service it interacts with.
[x] includes documentation with examples for all functions.
[x] contains a vignette with examples of its essential functions and uses.
[x] has a test suite.
[x] has continuous integration, including reporting of test coverage, using services such as Travis CI, Coeveralls and/or CodeCov.
[x] I agree to abide by ROpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.

Publication options

[x] Do you intend for this package to go on CRAN?
[ ] Do you wish to automatically submit to the Journal of Open Source Software? If so:
- [ ] The package contains a paper.md with a high-level description in the package root or in inst/.
- [ ] The package is deposited in a long-term repository with the DOI:
- (Do not submit your package separately to JOSS)

Detail

[x] Does R CMD check (or devtools::check()) succeed? Paste and describe any errors or warnings: In release 0.1.0.1, only 2 notes about global variables. This has largely been resolved in unreleased 0.1.1.
[x] Does the package conform to rOpenSci packaging guidelines? Please describe any exceptions: I used a CONTRIBUTORS.md file instead of CODE_OF_CONDUCT.md. I use @keywords internal for internal functions
If this is a resubmission following rejection, please explain the change in circumstances:
If possible, please provide recommendations of reviewers - those with experience with similar packages and/or likely users of your package - and their GitHub user names:

maelle commented 7 years ago

Thanks for your submission @timtrice ! :smile_cat:

I have two questions

I see you have Appveyor CI set up but in the logs I see it happens with these R CHECK options --no-tests --no-manual --no-build-vignettes, could you try a build on Appveyor not skipping tests and building the vignettes? I get errors when doing it on my Windows PC (R 3.3.1 though, I'll try again on another PC later but maybe the R version is not the issue, hence the Appveyor suggestion :wink:)

Tests results

> devtools::test()
Loading rrricanes
Loading required package: testthat

Attaching package: ‘testthat’

The following object is masked from ‘package:dplyr’:

    matches

Testing rrricanes
Test base functions.: .........................
Storm Discussions (discus): ...........................12
|====================================================================================|100% ~0 s remaining     ....
Forecast/Advisory Products (fstadv): 3
Test getting storm data.: .
|====================================================================================|100% ~0 s remaining     ..
Position Estimates (posest): .................
|====================================================================================|100% ~0 s remaining     4
|====================================================================================|100% ~0 s remaining     .....5
Scrapers: .................
Update (update): .................
|====================================================================================|100% ~0 s remaining     .

Failed -------------------------------------------------------------------------------------------------------
1. Error: Test discus() (@test_discus.R#54) ------------------------------------------------------------------
character string is not in a standard unambiguous format
1: discus(link = url) at C:\Users\msalmon.ISGLOBAL\Documents\rrricanes/tests/testthat/test_discus.R:54
2: scrape_header(contents, ret = "date") at C:\Users\msalmon.ISGLOBAL\Documents\rrricanes/R/discus.R:71
3: scrape_date(header) at C:\Users\msalmon.ISGLOBAL\Documents\rrricanes/R/scrapers.R:234
4: as.POSIXct(x, tz = "Etc/GMT+4") at C:\Users\msalmon.ISGLOBAL\Documents\rrricanes/R/scrapers.R:162
5: as.POSIXct.default(x, tz = "Etc/GMT+4")
6: as.POSIXct(as.POSIXlt(x, tz, ...), tz, ...)
7: as.POSIXlt(x, tz, ...)
8: as.POSIXlt.character(x, tz, ...)
9: stop("character string is not in a standard unambiguous format")

2. Error: Test get_discus() (@test_discus.R#71) --------------------------------------------------------------
character string is not in a standard unambiguous format
1: get_discus(link = url) at C:\Users\msalmon.ISGLOBAL\Documents\rrricanes/tests/testthat/test_discus.R:71
2: purrr::map(filter_discus(products), discus) at C:\Users\msalmon.ISGLOBAL\Documents\rrricanes/R/discus.R:41
3: .f(.x[[i]], ...)
4: scrape_header(contents, ret = "date") at C:\Users\msalmon.ISGLOBAL\Documents\rrricanes/R/discus.R:71
5: scrape_date(header) at C:\Users\msalmon.ISGLOBAL\Documents\rrricanes/R/scrapers.R:234
6: as.POSIXct(x, tz = "Etc/GMT+4") at C:\Users\msalmon.ISGLOBAL\Documents\rrricanes/R/scrapers.R:162
7: as.POSIXct.default(x, tz = "Etc/GMT+4")
8: as.POSIXct(as.POSIXlt(x, tz, ...), tz, ...)
9: as.POSIXlt(x, tz, ...)
10: as.POSIXlt.character(x, tz, ...)
11: stop("character string is not in a standard unambiguous format")

3. Error: (unknown) (@test_fstadv.R#8) -----------------------------------------------------------------------
character string is not in a standard unambiguous format
1: get_fstadv(link = url.al011998) at C:\Users\msalmon.ISGLOBAL\Documents\rrricanes/tests/testthat/test_fstadv.R:8
2: purrr::map(filter_fstadv(products), fstadv) at C:\Users\msalmon.ISGLOBAL\Documents\rrricanes/R/fstadv.R:157
3: .f(.x[[i]], ...)
4: scrape_header(contents, ret = "date") at C:\Users\msalmon.ISGLOBAL\Documents\rrricanes/R/fstadv.R:210
5: scrape_date(header) at C:\Users\msalmon.ISGLOBAL\Documents\rrricanes/R/scrapers.R:234
6: as.POSIXct(x, tz = "UTC") at C:\Users\msalmon.ISGLOBAL\Documents\rrricanes/R/scrapers.R:152
7: as.POSIXct.default(x, tz = "UTC")
8: as.POSIXct(as.POSIXlt(x, tz, ...), tz, ...)
9: as.POSIXlt(x, tz, ...)
10: as.POSIXlt.character(x, tz, ...)
11: stop("character string is not in a standard unambiguous format")

4. Error: (unknown) (@test_prblty.R#9) -----------------------------------------------------------------------
character string is not in a standard unambiguous format
1: get_prblty(al1998[1]) at C:\Users\msalmon.ISGLOBAL\Documents\rrricanes/tests/testthat/test_prblty.R:9
2: purrr::map(filter_prblty(products), prblty) at C:\Users\msalmon.ISGLOBAL\Documents\rrricanes/R/prblty.R:25
3: .f(.x[[i]], ...)
4: scrape_header(contents, ret = "date") at C:\Users\msalmon.ISGLOBAL\Documents\rrricanes/R/prblty.R:52
5: scrape_date(header) at C:\Users\msalmon.ISGLOBAL\Documents\rrricanes/R/scrapers.R:234
6: as.POSIXct(x, tz = "Etc/GMT+4") at C:\Users\msalmon.ISGLOBAL\Documents\rrricanes/R/scrapers.R:156
7: as.POSIXct.default(x, tz = "Etc/GMT+4")
8: as.POSIXct(as.POSIXlt(x, tz, ...), tz, ...)
9: as.POSIXlt(x, tz, ...)
10: as.POSIXlt.character(x, tz, ...)
11: stop("character string is not in a standard unambiguous format")

5. Error: 1998, Tropical Storm Alex, Advisory 26 (@test_public.R#38) -----------------------------------------
character string is not in a standard unambiguous format
1: expect_identical(public(al.1998.alex.products.public[25]) %>% dplyr::select(Status) %>% dplyr::first(), "Tropical Disturbance") at C:\Users\msalmon.ISGLOBAL\Documents\rrricanes/tests/testthat/test_public.R:38
2: identical(object, expected)
3: public(al.1998.alex.products.public[25]) %>% dplyr::select(Status) %>% dplyr::first()
4: eval(lhs, parent, parent)
5: eval(expr, envir, enclos)
6: public(al.1998.alex.products.public[25])
7: scrape_header(contents, ret = "date") at C:\Users\msalmon.ISGLOBAL\Documents\rrricanes/R/public.R:70
8: scrape_date(header) at C:\Users\msalmon.ISGLOBAL\Documents\rrricanes/R/scrapers.R:234
9: as.POSIXct(x, tz = "Etc/GMT+4") at C:\Users\msalmon.ISGLOBAL\Documents\rrricanes/R/scrapers.R:156
10: as.POSIXct.default(x, tz = "Etc/GMT+4")
11: as.POSIXct(as.POSIXlt(x, tz, ...), tz, ...)
12: as.POSIXlt(x, tz, ...)
13: as.POSIXlt.character(x, tz, ...)
14: stop("character string is not in a standard unambiguous format")

DONE =========================================================================================================
Don't worry, you'll get it.

In the following (uncommented) code, why do you use assignments to the global environment?

    s <- sapply(x, function(z, n = names, l = link, m = msg){
        f <- paste("get", z, sep = "_")
        res <- do.call(f, args = list("link" = l, "msg" = m))
        if (!is.null(n[[z]])) {
            assign(n[z][[1]], res, envir = .GlobalEnv)
        } else {
            assign(z, res, envir = .GlobalEnv)
        }
    })

timtrice commented 7 years ago

@maelle , thank you for taking the time to review my project!

Regarding the code in question, it has been replaced with the following:

    ds <- purrr::map(products, .f = function(x) {
        sprintf("get_%s", x) %>%
            purrr::invoke_map(.x = list(link = link)) %>%
            purrr::flatten_df()})
    names(ds) <- products
    return(ds)

This is in branch develop-0.2.0. I believe I have modified all of the assign calls; they do not show up in R CMD checks (0 notes).

Regarding the tests; I apologize for them not being enabled. At one point earlier this week I had to disable them due to the NHC archives being down for some annual archive pages. This seems to have been resolved now. Tests have been added to both Travis and Appveyor.

A side note to that last point; I have noticed there are timeouts when accessing the NHC archives. This has been verified on numerous occasions with Travis and Appveyor during tests. I have a feature request pending to add handling for this. Additionally, in 0.2.0, load_storm_data has been added that accesses some already-processed datasets through the GitHub repo rrricanesdata. So I'm trying to develop some workaround to ensure users have multiple ways to get quick access to data they need.

maelle commented 7 years ago

@timtrice thanks, nice work! For review (and my editor checks :-) ) will master be updated?

Great on having already processed datasets, but I see that repo has no licensing information. I'm not experienced at all in licensing issues but I guess it'd be at least nice to add where this data comes from etc.

maelle commented 7 years ago

Now that I could check the package (on the right R version) here is something more formalized.

Editor checks:

[ x] Fit: The package meets criteria for fit and overlap
[ x] Automated tests: Package has a testing suite and is tested via Travis-CI or another CI service.
[ x] License: The package has a CRAN or OSI accepted license
[ x] Repository: The repository link resolves correctly
[ ] Archive (JOSS only, may be post-review): The repository DOI resolves correctly
[ ] Version (JOSS only, may be post-review): Does the release version given match the GitHub release (v1.0.0)?

Editor comments

Note that lazyeval should soon be replaced by rlang so it might be worth to have a look at https://github.com/tidyverse/rlang
2 long lines identified by goodpractice::gp()

── GP rrricanes ───────────────────────────────────────────────────────────────────────────────────────────────

It is good practice to

  ✖ write unit tests for all functions, and all package code in general. 93% of code lines are
    covered by test cases.

    R/base.R:74:NA
    R/base.R:75:NA
    R/base.R:76:NA
    R/base.R:77:NA
    R/base.R:78:NA
    ... and 64 more lines

  ✖ avoid long code lines, it is bad for readability. Also, many people prefer editor windows that
    are about 80 characters wide. Try make your lines shorter than 80 characters

    R\base.R:84:1
    R\scrapers.R:83:1
    tests\testthat\test_scrapers.R:46:1

Typos identified via devtools::spell_check()


currentlby      get_fstadv.Rd:18
dataframej      build_archive_df.Rd:20

Reviewers: @jsta @robinsones Due date: 2017-07-10

timtrice commented 7 years ago

@maelle, thank you for the compliment! Very much appreciated.

The master will be updated with develop-0.2.0 when I finish the GIS products. They probably could be released now but I'm trying to think over things and make sure there's not a better way to handle them (consolidating functions, for example). Plus one of the products is raster format so I'm looking over ways to view them with ggplot, if even possible, what the alternatives are, etc.

In hindsight I should have incorporated bug fixes into master. It was one of those things that I had already made new changes and started just hammering out bugs - before I knew it I lost track of where I was. This is the first project I'm trying to incorporate R package rules and github best practices along with what I know needs to be done. Some of it (as your example above) is just a sloppy "get it working" attempt before I get back to making the code cleaner.

Regarding rrricanesdata, I can certainly add a license and the script to add new datasets and update existing ones as need be. I'll add a README as well today.

maelle commented 7 years ago

@timtrice sorry to have posted so many comments in a row. Should we put the review on hold until you've merged everything with master? This way you'd get a review on the final product. :-)

timtrice commented 7 years ago

@maelle ,

No problem regarding the comments.

Let me push the fixes to 0.1.2; I'll hold back the GIS products until I can determine the best direction. I'll post an update later today or tomorrow morning when it's complete.

The tests do take forever. This is because of the scraping; particularly the forecast/advisory products. There are minor variations in some of the advisories so I have to make sure that one regex pattern change here doesn't affect other advisories. That is another issue I'm also trying to determine better methods.

Edit: to be clear those notes should no longer exist in this new release.

Thank you

maelle commented 7 years ago

Ok so I'll re-do the editor checks once you've updated this thread! :-)

maelle commented 7 years ago

@timtrice I added the results of goodpractice::gp() in the comment with other editor checks. Please have a look before the next version, goodpractice gives some good suggestions (and some you might prefer to ignore which is fine).

timtrice commented 7 years ago

I have released 0.1.2 into the master and develop branch.

Local checks and tests pass. I'm having issues passing them on Appveyor. Travis has passed. Appveyor wants to fail 2 or 3 tests in because of timeout issues. I hoped to have resolved this by adding functionality to reattempt data collection on timeout but it apparently isn't enough. At least, for testing. I'm not sure what else I can do to resolve those type of issues. I'll keep re-building the commit if it continues to fail but hopefully it'll catch a good run and finally go through.

(Edit to note all tests in Appveyor finally passed)

Thank you again for taking the time to review my project and I look forward to hearing back from the community about it.

Tim

maelle commented 7 years ago

@timtrice awesome, and could you identify why they previously failed?

maelle commented 7 years ago

@timtrice Best loading message ever! 😁 I updated the editor checks.

I'm now looking for reviewers.

timtrice commented 7 years ago

@maelle ,

In these instances (and very common), when the tests are checked or vignettes are built the functions are scraping the National Hurricane Center archives. Very often a timeout is reached though I have gotten other odd errors I haven't researched yet.

In Release 0.1.2, I wrapped the initial extraction function in a check:

get_url_contents <- function(link) {
    # Try to establish connection three times with timeout of 3 seconds
    max_attempts <- getOption("rrricanes.http_attempts")
    if (max_attempts > 5) max_attempts <- 5
    for (i in seq(1, max_attempts)) {
        safe_GET <- purrr::safely(httr::GET)
        contents <- safe_GET(url = link,
                             httr::timeout(getOption("rrricanes.http_timeout")))
        if (!is.null(contents$result))
            return(xml2::read_html(x = contents$result))
    }
    stop(contents$error$message, call. = TRUE)
}

So, there are two options user can set; rrricanes.http_attempts which defaults to 3 and rrricanes.http_timeout which defaults to 1. Attempts cannot be more than 5 (arbitrary number). The tests are run with these options set to 5 and 1, respectively. Now I"m thinking I should have added a pause in there, as well...

The constant issues with the archives is why I built load_storm_data which I would consider now the primary option. That functions ignores the NHC archives and goes straight to rrricanesdata. This puts more responsibility on me to make sure those archives are up-to-date and I'm looking at methods to automate this. However, I also emphasize within the project this function may not return the most up-to-date data and to use the other functions as a fallback.

I'm open to any alternatives or ideas you and others may have. It is very annoying and frustrating, especially with Appveyor that may get through the first four tests successfully but fail on the last; in which I have to rebuild the entire thing.

I am also looking at implementing data.world in some way to make it easier. As the plans are to add GIS data, reconnaissance data and others, I know I need to develop a good solid data plan going into future releases.

Thank you,

Tim Trice

maelle commented 7 years ago

@timtrice In this code I make the waiting time a bit longer with each attempt. I should have used purrr::safely as well. 😉

Have you told the owner of the website you're scraping about your package by the way? It'd also be good to know how ok it is for you to have a copy of their data.

timtrice commented 7 years ago

Thanks @maelle, I'll take a look at it. I'm trying a simple Sys.sleep right now.

Here is the latest example of a fail in Travis: https://travis-ci.org/timtrice/rrricanes/jobs/241505560

Quitting from lines 175-179 (getting_started.Rmd) 
Error: processing vignette 'getting_started.Rmd' failed with diagnostics:
Timeout was reached
Execution halted
The command "R CMD build  ." failed and exited with 1 during .
Your build has been stopped.

Just constant...

The NHC webmaster is aware, yes. Addtionally, the data is considered public domain. The National Weather Service states:

Use of NOAA/NWS Data and Products

The information on National Weather Service (NWS) Web pages are in the public domain, 
unless specifically noted otherwise, and may be used without charge for any lawful purpose 
so long as you do not: 1) claim it is your own (e.g., by claiming copyright for NWS 
information -- see below), 2) use it in a manner that implies an endorsement or affiliation 
with NOAA/NWS, or 3) modify its content and then present it as official government material. 
You also cannot present information of your own in a way that makes it appear to be official 
government information.

I've tried to document thoroughly where the data originates and, to the best of my knowledge, have made no implicit or explicit claims the data is mine or that it is restricted.

maelle commented 7 years ago

Reg. the licensing stuff I'd suggest we wait for the reviewers (that I'm currently looking for) whether they have suggestions, and if needed we can ask the community at large what's best practice. :-)

timtrice commented 7 years ago

Apologies if I keep moving the goal posts, so-to-speak, but I found errors in some of the tidy functions and had to rush a patch. Release 0.1.3 has been pushed to master.

https://github.com/timtrice/rrricanes/releases/tag/v0.1.3

maelle commented 7 years ago

One reviewer found, seeking another one. @timtrice

timtrice commented 7 years ago

Can someone advise how I should handle new releases? I expect to be able to push a beta release of 0.2.0 today into master but I want to make sure this will not cause any issues with the reviewing stage.

maelle commented 7 years ago

The reviews have not started so you can do it 😉

timtrice commented 7 years ago

Release 0.2.0-1 is now in the master branch. There are no major changes scheduled anytime soon.

maelle commented 7 years ago

Still searching a second reviewer, thanks for your patience @timtrice !

maelle commented 7 years ago

Many thanks @jsta @robinsones for accepting to review this package! :smile_cat: Your reviews are due on 2017-06-19.

As a reminder here is a link to the reviewing guide and here is one to the review template.

jsta commented 7 years ago

@maelle 2017-06-19 is yesterday....

maelle commented 7 years ago

Oops, thanks @jsta! :see_no_evil: 2017-07-10 sounds better right? :smile_cat:

maelle commented 7 years ago

I can even see it was right in the comment where I assigned you (after the editor checks) so I was only 50% stupid. :relieved:

jsta commented 7 years ago

@timtrice I am having trouble building after packrat seems to have activated itself. I wonder if the issue that is that packrat is tracking a lot of packages. Consider the output of packrat::status():

> The following packages are tracked by packrat, but are no longer available in the local library nor present in your code: _ HURDAT 0.1.0 PKI 0.1-3 RgoogleMaps 1.4.1 assertr 2.0.2.2 cellranger 1.1.0 debugme 1.0.2 forcats 0.2.0 gganimate 0.1.0.9000 ggmap 2.6.1 gh 1.0.0 haven 1.0.0 modelr 0.1.0 pkgdown 0.1.0.9000 processx 2.0.0 proto 1.0.0 rematch 1.0.1 rsconnect 0.8 servr 0.6 usethis 0.0.0.9000 viridisLite 0.2.0 > You can call packrat::snapshot() to remove these packages from the lockfile, or if you intend to use these packages, use packrat::restore() to restore them to your private library. > The following packages are used in your code, tracked by packrat, but no longer present in your library: from to RCurl 1.95-4.8 NA bindrcpp 0.1 NA broom 0.4.2 NA caTools 1.17.1 NA desc 1.1.0 NA devtools 1.13.1 NA dplyr 0.7.0 NA dwapi 0.1.1 NA evaluate 0.10 NA geosphere 1.5-5 NA ggplot2 2.2.1 NA htmltools 0.3.6 NA httpuv 1.3.3 NA httr 1.2.1 NA knitr 1.16 NA latticeExtra 0.6-28 NA lubridate 1.6.0 NA magick 0.4 NA mapproj 1.2-5 NA maptools 0.9-2 NA markdown 0.8 NA memoise 1.1.0 NA munsell 0.4.3 NA pander 0.6.0 NA pkgbuild 0.0.0.9000 NA pkgload 0.0.0.9000 NA plyr 1.8.4 NA png 0.1-7 NA praise 1.0.0 NA psych 1.7.5 NA purrr 0.2.2.2 NA raster 2.5-8 NA rasterVis 0.41 NA readr 1.1.1 NA readxl 1.0.0 NA reshape2 1.4.2 NA rgdal 1.2-7 NA rgeos 0.3-23 NA rjson 0.2.15 NA rlang 0.1.1.9000 NA rmarkdown 1.5 NA rnaturalearthdata 0.1.0 NA roxygen2 6.0.1 NA rprojroot 1.2 NA rstudioapi 0.6 NA rvest 0.3.2 NA scales 0.4.1 NA selectr 0.3-1 NA sp 1.2-4 NA stringi 1.1.5 NA stringr 1.2.0 NA testthat 1.0.2 NA tibble 1.3.3 NA tidyr 0.6.3 NA whisker 0.3-2 NA withr 1.0.2 NA xml2 1.1.1 NA yaml 2.1.14 NA zoo 1.8-0 NA > Use packrat::restore() to restore these libraries.

Maybe you could consider cleaning these up to a minimal set?

timtrice commented 7 years ago

@jsta, thank you. I'm about getting tired of packrat myself but never noticed outsides with installation. Only use it for testing (such as the new dplyr).

Does it really even need to be in the repo? I have no problems removing it from remote and just keeping it on my setup.

jsta commented 7 years ago

Seems to be most useful for archiving rarely updated packages rather than tracking a lot of dependencies in flux but I'm not very familiar with it myself so I can't say how much it is worth the hassle.

timtrice commented 7 years ago

You have the .Rbuildignore file as well, correct? It shouldn't even pay attention to packrat if I understood correctly.

jsta commented 7 years ago

I do have .Rbuildignore. I ended up running packrat::disable() and now everything is OK.

I do think packrat is a little out of it's element in this case and you might consider removing it.

jsta commented 7 years ago

Package Review

[x] As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

[x] A statement of need clearly stating problems the software is designed to solve and its target audience in README
[x] Installation instructions: for the development version of package and any non-standard dependencies in README
[x] Vignette(s) demonstrating major functionality that runs successfully locally
- I am having timeout issues with building the getting_started vignette. I wonder if there is a particular year with very few hurricanes that would solve the timeout problem.
- I cannot build the data.world vignette. Probably because I don't have an API key set up. You may want to consider setting the code chunks to eval=FALSE.
- You might want to consider numbering your vignettes (https://twitter.com/fmic_/status/875385557819555841) so that the data.world vignette is listed last.
[x] Function Documentation: for all exported functions in R help
[x] Examples for all exported functions in R Help that run successfully locally
- There are no examples for knots_to_mph, mb_to_in, status_abbr_to_str, get_discus, get_fstadv, tidy_fstadv, tidy_wr, tidy_fcst. Maybe some can be changed to non-exported?
[x] Community guidelines including contribution guidelines in the README or CONTRIBUTING, and URL, Maintainer and BugReports fields in DESCRIPTION
- Missing a maintainer field in the DESCRIPTION
- What does the Additional_repositories field signify?

Functionality

[x] Installation: Installation succeeds as documented.
[x] Functionality: Any functional claims of the software been confirmed.
- I really like the tidy_ functions. I wonder if it would make it easier on the end-user to have the get_ functions return tidy results by default with an optional argument to return "messy" results.
[x] Performance: Any performance claims of the software been confirmed.
[x] Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
- Wow, this package has a lot of tests! Could you speed up package checking by consolidating some of these? I notice that some of the lines are re-downloading data from earlier lines.
- Also, you might consider using the skip_on_cran function for lines that call an external download as recommended by the rOpenSci packaging guide.
[x] Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Final approval (post-review)

[x] The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 3

Review Comments

This package is VERY comprehensive! I am impressed with the breadth of information accessible from this package.

I get the following note from devtools::check() :

checking R code for possible problems ... NOTE

fstadv_forecasts: no visible global function definition for

  ‘packageVersion’

prblty: no visible global function definition for ‘packageVersion’

wndprb: no visible global function definition for ‘packageVersion’

Undefined global functions or variables:

  packageVersion

Consider adding

  importFrom("utils", "packageVersion")

to your NAMESPACE file.

timtrice commented 7 years ago

@jsta thank you very much for taking the time to review it!

Vignettes: a. Yes, timeouts have been a major headache that I have to find a way to resolve. This is an issue with the NHC website and occurs on Travis and Appveyor. I added options to control the timeout, number of attempts and a delay between attempts; they just do not seem to have made a difference. But I am looking at alternatives. b. The data.world article is not supposed to be a vignette but rather an article through pkgdown. I will revisit this and modify it accordingly.
Examples: I will add them. get_discus and get_fstadv along with the tidy functions will be \dontrun due to the timeout issues.
Community Guidelines a. Will update DESCRIPTION to list the maintainer b. Additional repositories slipped through; originally it was included to add the rnaturalearthhires package. However, that was removed as a dependent as I seemed to struggle with getting it through Travis and Appveyor. It should be documented that to use 10nm resolution on the tracking charts that it requires installation of rnaturalearthhires which is also mentioned in the README.
Functionality: I can certainly add a parameter for tidy or messy. If tidy then get_fstadv will have to return a list of dataframes which makes the expected output different from that of the other get functions; I didn't personally think that was a good option. If it's acceptable, I'm happy to add it.
Automated tests. a. Yes, tons of tests. And these bite me in more ways than you can image. Perfect example: the recent development of Bret and Cindy introduced a new header: "Potential Tropical Cyclone". This was a new type of advisory added this year (and the current release does not accurately capture the name and status of the cyclones for these advisories). In this instance, it was easy to modify this. However, these advisory packages are anything but consistent over the last 20 years; anything form formatting changes to typos. I do intend or rewriting them; just unsure the best way to do that at this point. b. I will review the skip_on_cran functionality and add it.
Will add packageVersion to imports.

timtrice commented 7 years ago

Vignettes have been modified to use /data
data_world pkgdown article moved to vignettes/articles
Examples added to previously mentioned functions
Add Maintainer to DESCRIPTION
Removed Additional_repositories field in DESCRIPTION
Slimmed down tests some and added skip_on_cran to all scraping calls.

I have not added the parameter to tidy the return of get_fstadv as discussed previously; would prefer to keep it consistent with the others. Is this necessary?

All local checks and tests pass. Travis and Appveyor passed on branch tests; currently running master and release. Any failures will be due to timeout; I'll restart in the morning if that occurs.

maelle commented 7 years ago

Thanks a lot @jsta for your review! :smile: and nice work @timtrice !

Reg. the maintainer field you don't need to have it in DESCRIPTION, e.g. in ropenaq this is how my DESCRIPTION looks like but when I build it the maintainer field is automatically added so on CRAN it's like this.

jsta commented 7 years ago

OK, I will wait for the review by @robinsones before I look at the changes.

maelle commented 7 years ago

@robinsones friendly reminder that your review is due on the 10th (monday) 😉

robinsones commented 7 years ago

Yes, thank you! In progress

robinsones commented 7 years ago

Hi @timtrice, your review is currently in progress! I'm running into one issue though. I see you've updated the README to have build_vignettes = TRUE, which helps one problem I was having yesterday, where the vignettes weren't available. However, now when I update the install with that, any function I try to get the help page for (e.g. ?tidy_wr) results in Error in fetch(key): lazy-load database./Library/Frameworks/R.framework/Versions/3.4/Resources/library/rrricanes/help/rrricanes.rdb' is corrupt. This was not happening yesterday, and I also replicated that it didn't happen on another computer when I did install_github("timtrice/rrricanes"). It only happened when I added thebuild_vignettes = TRUE` and reinstalled.

timtrice commented 7 years ago

Hi Emily, thank you for taking a look at it!

When you say you updated the package, did you remove the package then reinstall? I just tested on Debian and Win, R 3.3.3 and 3.4.0 and could not replicate the error. All vignettes appear correctly and I'm unable to find any issues with the help documentation.

If you can post your sessionInfo that may help me out.

Thank you!

Tim

Edit: I meant to include that I pushed a minor release yesterday; I have no idea if that would have anything to do with it but possibly, I suppose.

robinsones commented 7 years ago

Hi Tim,

What I did when I say update was devtools::install_github("timtrice/rrricanes", build_vignettes = TRUE, force = TRUE). I also tried removing the package with remove.packages and running devtools::install_github("timtrice/rrricanes"), but I continued to get the same error. I can try checking on a colleague's computer tomorrow they also have the same issue.

Here is my session info below:

R version 3.4.0 (2017-04-21) Platform: x86_64-apple-darwin15.6.0 (64-bit) Running under: OS X El Capitan 10.11.6

Matrix products: default BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages: [1] stats graphics grDevices utils datasets methods base

other attached packages: [1] rrricanes_0.2.0-5 bindrcpp_0.2 ggplot2_2.2.1 dplyr_0.7.1

And from devtools::sesssion_info():

Session info ------------------------------------------------------------------------------------------------- setting value
version R version 3.4.0 (2017-04-21) system x86_64, darwin15.6.0
ui RStudio (1.0.143)
language (EN)
collate en_US.UTF-8
tz America/New_York
date 2017-07-09

rrricanes * 0.2.0-5 2017-07-10 Github (timtrice/rrricanes@e99a73e)

timtrice commented 7 years ago

This may be related to a previous issue with devtools (since resolved in 1.10.0);

I'll look into it a bit more and see if there's been anything more recent.

jsta commented 7 years ago

@robinsones I'm not sure if this is your situation, but I recall seeing these errors when I install (and load) a package and then try to build and reload that same package "in place" (without restarting RStudio). In this case, restarting RStudio before (re)installing fixes the issue.

I did my review by git cloning the repo and installing locally from source. I'm glad you are checking install_github

robinsones commented 7 years ago

HI @jsta thank you! That seems to be exactly it. Sorry for the alarm @timtrice. When I restart R everything works fine. It also works fine if I remove rrricanes, restart R, then reinstall rrricanes, so no one else should have any problems.

timtrice commented 7 years ago

Glad to hear it! I was just trying again on my Win machine doing what @jsta was mentioning and now my help won't load (but no error message). Which is funny because it's done this to me before while developing but I figured it was Rstudio or packrat. Now we learned something new!

robinsones commented 7 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

[x] As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

[ ] A statement of need clearly stating problems the software is designed to solve and its target audience in README
- I think could you include more information in the README, especially about the source of the data. While I know from the vignette and this review process the data comes from the National Hurricane Center's (NHC) archives, it would be helpful to have that upfront. A link to the archive would also be welcome. This could be paired with a short sentence about how this package provides information previously unavailable to people unfamiliar with web-scraping and also functions that make understanding the data easier.
[x] Installation instructions: for the development version of package and any non-standard dependencies in README
[x] Vignette(s) demonstrating major functionality that runs successfully locally
[x] Function Documentation: for all exported functions in R help
[x] Examples for all exported functions in R Help that run successfully locally
[x] Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

Functionality

[x] Installation: Installation succeeds as documented.
[x] Functionality: Any functional claims of the software been confirmed.
[x] Performance: Any performance claims of the software been confirmed.
[x] Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
[x] Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Final approval (post-review)

[ ] The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 4 hours.

Review Comments

Really great work! I am impressed how comprehensive it is.

timtrice commented 7 years ago

Thank you, Emily! I've added some additional info to the README (a bit more than you were asking, probably). I haven't pushed it to master yet; if it fits what you were thinking of let me know and I'll sync it up with master.

https://github.com/timtrice/rrricanes/tree/hotfix/v0.2.0-5.1

jsta commented 7 years ago

Ok, the changes look good :+1: . I ran devtools::check one last time and saw:

a NOTE about not having the sp package listed in the DESCRIPTION file under Suggests.
The test titled "_Test getprblty()" is timing out for me during download. I see that you have it skipped on CRAN so maybe this is OK.

timtrice commented 7 years ago

Joseph,

sp is added to Suggests (apologies for the oversight).

And all tests are passing in Travis and Appveyor so I would assume what you saw was one of the known hiccups from the NHC.

maelle commented 7 years ago

Thanks a lot for your review @robinsones ! 😁

Can I assume you're both happy with the changes @robinsones @jsta ?

@timtrice do you still have the repo with data? might be worth to take a look at other approaches like the idea of having a data package on drat (cf our discussion on Slack), and to check license?

ropensci / software-review

rrricanes #118

Summary

Requirements

Publication options

Detail

Editor checks:

Editor comments

Package Review

Documentation

Functionality

Final approval (post-review)

Review Comments

Package Review

Documentation

Functionality

Final approval (post-review)

Review Comments