ropensci / software-review

rOpenSci Software Peer Review.
291 stars 104 forks source link

GSODR: Global Surface Summary Daily Weather Data in R #79

Closed adamhsparks closed 7 years ago

adamhsparks commented 7 years ago

Summary

Package: GSODR
Type: Package
Title: Global Summary Daily Weather Data in R
Version: 0.3
Date: 2016-10-16
Authors@R: c(person("Adam", "Sparks", role = c("aut", "cre"),
    email = "adamhsparks@gmail.com"),
    person("Tomislav", "Hengl", role = "aut",
    email = "tom.hengl@isric.org"),
    person("Andrew", "Nelson", role = "aut",
    email = "dr.andy.nelson@gmail.com"))
Maintainer: Adam Sparks <adamhsparks@gmail.com>
URL: https://github.com/adamhsparks/GSODR
BugReports: https://github.com/adamhsparks/GSODR/issues
Description:  The GSODR package offers automated downloading, parsing, cleaning
    and converting of Global Surface Summary of the Day (GSOD) weather data from
    the from the USA National Climatic Data Center (NCDC) into Comma Separated
    Values (CSV) or Geopackage (GPKG) files. The get_GSOD() function retrieves
    data from the NCDC's public FTP site and reformats it from United States
    Customary System (USCS) units to International System of Units (SI).
    Stations are individually checked for number of missing days as defined by
    the user to help ensure data quality. Stations files with too many missing
    observations are omitted from the final file. Only stations with valid
    reported latitude and longitude values are permitted in the final derived
    data set. A list of values for 200 metre buffered elevation data, derived
    from the CGIAR-CSI SRTM hole-filled 90 metre data set (Jarvis et al. 2008)
    are included for stations between -60 and 60 degrees latitude with valid
    latitude and longitude values. File output is returned as a comma-separated
    values (CSV) or GeoPackage (GPKG) file written to a local disk in a location
    selected by the user, which summarises each year by station. Additional
    useful variables, saturation vapour pressure (es), actual vapour pressure (ea)
    and relative humidity are calculated from the original data and included in
    the final data set. The resulting files can be as larger than 500mb
    depending on the user's stringency for missing data and geographic area of
    interest. Be sure to have sufficient RAM and disk space as well as a
    reasonably fast Internet connection to use this package to perform this
    operation for a global data set over many years. However, for much smaller
    and more manageable data sets, an individual country of interest may be
    selected as well as only stations falling between -60 and 60 degrees
    latitude for agroclimatology work or individual stations if the station ID
    is known. The resulting files include station data (e.g., station name,
    country, latitude, longitude, elevation) for use in a geographic
    information system (GIS). The function was largely based on T. Hengl's
    'getGSOD.R' script, available from
    <http://spatial-analyst.net/book/system/files/getGSOD.R> with enhancements
    to be cross-platform, more efficient with computing resources and more
    flexible. For information on the original data from NCDC, please see the
    GSOD readme.txt file available from,
    <http://www1.ncdc.noaa.gov/pub/data/gsod/readme.txt>.
Depends:
    R (>= 3.2.0)
License: GPL (>= 3)
Imports:
    data.table,
    doParallel,
    foreach,
    iterators,
    parallel,
    RCurl,
    readr,
    rgdal,
    sp,
    stats,
    utils
Suggests:
    chillR,
    dplyr,
    plotKML,
    spacetime,
    ggalt,
    ggplot2,
    knitr,
    lubridate,
    rgeos,
    rmarkdown,
    testthat,
    tibble,
    tidyr
RoxygenNote: 5.0.1
Encoding: UTF-8
NeedsCompilation: no
Repository: CRAN
LazyData: TRUE
ByteCompile: TRUE
VignetteBuilder: knitr

Confirm each of the following by checking the box. This package:

adamhsparks commented 7 years ago

@sckott, I've found a bug that I need to fix.

Not sure how it's crept in. I will report back when it's corrected.

adamhsparks commented 7 years ago

Well that was embarrassing. It's all fixed now, @sckott.

Glad I checked the package build again this morning.

sckott commented 7 years ago

Editor checks:


Editor comments

Thanks for your submission @adamhsparks - no worries about eth bug

Will be seeking reviewers, but in the meantime: I ran goodpractice::gp() on your package. Here are the results, which it may be helpful to get started with before review (you can run it yourself and dig into the results more):

It is good practice to

  ✖ write unit tests for all functions, and all package code
    in general. At least some lines are not covered in this package.

    R/get_GSOD.R:230:NA
    R/get_GSOD.R:231:NA
    R/get_GSOD.R:234:NA
    R/get_GSOD.R:235:NA
    R/get_GSOD.R:238:NA
    ... and 359 more lines

  ✖ omit "Date" in DESCRIPTION. It is not required and it
    gets invalid quite often. A build date will be added to the package
    when you perform `R CMD build` on it.
  ✖ avoid long code lines, it is bad for readability. Also,
    many people prefer editor windows that are about 80 characters
    wide. Try make your lines shorter than 80 characters

    R/get_GSOD.R:283:1
    R/get_GSOD.R:284:1
    R/get_GSOD.R:287:1
    R/get_GSOD.R:306:1
    R/get_GSOD.R:456:1
    ... and 9 more lines

Reviewers: @jeffreyhanson @deniederhut Due date: 2016-11-08

adamhsparks commented 7 years ago

Thanks for the valuable feedback. This is exactly the type of suggestions I was looking for here.

I'll get as much of this done as I can shortly.

adamhsparks commented 7 years ago

I've updated the DESCRIPTION file in the package and here in this issue as suggested.

adamhsparks commented 7 years ago

Vignette builds faster now, as suggested as well.

adamhsparks commented 7 years ago

Tomislav Hengl has contributed to the package and is now an author. I'll be merging his commits as I have time. However, I'll be on travel this week. So it may take me a while to address this.

sckott commented 7 years ago

Thanks for the updates @adamhsparks - seeking reviewers

jeffreyhanson commented 7 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

Paper (for packages co-submitting to JOSS)

The package contains a paper.md with:

Functionality

Final approval (post-review)

Estimated hours spent reviewing: 3


Review Comments

Overview

The Global Summary Of The Day dataset contains weather data collected from meteorological stations all over the planet. However, accessing and formatting this data for use in R can be rather tricky. The GSODR R package will make this a dream. This package will be incredibly useful for new R users who want easy access to this data set. It will also be useful for experienced R users who want to download data over large extents and do post-processing later in R. I very much look forward to using this package in the future.

I have several major concerns before I can recommend this package for release. First, the main function get_GSOD fails when the user wants to download data for an entire country. Personally, I think this is the biggest feature of this package. Second, the package could benefit from more automated testing. None of the examples are actually tested, and the unit tests only cover cases that are expected to fail. Third, the package only downloads data. It would be brilliant if the get_GSOD function returned data after completion. I have outlined these main issues and other issues below.

Major issues

  1. The get_GSOD function cannot download all data for an entire country. For instance, running the code get_GSOD(year=2015, country='NZ', dsn=tempdir()) returns this error:
Error in col[[i, exact = exact]] : 
 attempt to select more than one element in vectorIndex
  1. All examples are wrapped in \dontrun{} so they are not actually tested.
  2. The main function get_GSOD just saves data to disk and does not actually return data. It would be brilliant if this package returned GSOD data so the user does not have to then load it back into R. To that end, I would recommend the function returns a single tibble or SpatialPointsDataFrame object (perhaps depending on an argument?), and let the user then re-save the data where they want it stored.
  3. As far as I can tell, the vignette cannot be accessed within R. After installation, the following code vignette(package='GSODR') yields 'no vignettes found'. Additionally, the rmarkdown file for the vignette has an obscure name, so even if it could be opened within R, this would be tricky given its obscure name. I recommend naming the vignette after the package so the following code will open it vignette('GSODR').
  4. Unit testing covers cases where get_GSOD is expected to fail but it does not actually test cases that are expected to succeed.
  5. Missing data is represented using -9999. Why not use NA values? The main aim of this package is to load GSOD data into R.
  6. There is no help page for the package (ie. ?GSODR). It would be really useful to include this, even if it it just a copy-paste from the README. See here for more details.
  7. The nearest_station function will be incredibly useful for a user. It would be even more useful if--instead of printing the stations closest to a user-specified point--it returned the stations. Then this function could be used programmatically to find the closest stations to a series of points, and download data accordingly.
  8. The accompanying paper does not list authors in the html document.

Minor issues

  1. This package needs a rerun of devtools::document() before resubmission. When I ran devtools::run_examples(), this automatically re-compiled some Rd files.
  2. get_GSOD.R:20 & data.R:29: Both lines refer to a document available on the github repository. I recommend making this document an additional vignette.
  3. get_GSOD.R:279: I am unsure why foreach iterators are used here. It would be great to do offer some parallelisation and options for progress bars here. If you wrapped this in a plyr::**ply function, you could easily make it such that: (1) years can easily be processed in parallel, (2) if not processed in parallel, then a progress-bar can be shown.
  4. get_GSOD.R:322: ROpensci guidelines recommend using the httr package instead of the RCurl package.
  5. get_GSOD.R:392--393: Options are restored to the default state but this is not necessarily the previous state. These values could be cached at the beginning of the function. For example: use orginal_options <- options() at the start of the function, and at the end of the function use options(original_options).
  6. get_GSOD.R:345: the user should have explicit control over thread use to avoid accidently fork-bombing their own computer. I recommend providing a threads argument in the get_GSOD function and defaulting it to 1.
  7. get_GSOD.R:561: Here and elsewhere return(0) is used after a call to stop(). Please consider removing the return(0) as this is unnecessary .
  8. nearest_stations.R:28--29: This function changes options() and does not reset them after it finishes. This is against recommended CRAN practices. Please see previous comment on options for a suggestion on how to fix this.
  9. nearest_stations.R:34--44: This code could be rewritten for clarity and computational efficiency. I suggest something like this:
dists <- fields::rdist.earth(as.matrix(stations[c('LON', 'LAT')]), matrix(c(LON, LAT), ncol=2), miles=FALSE)
nearby <- which(dists[,1] < distance)

Typographic and stylistic issues

  1. get_GSOD.R:83--85: Documentation contains '&lt' which is not rendered as a less than symbol in the html documentation.
  2. get_GSOD.R:292: Instead of a call to utils::glob2rx, this could be used instead pattern='^.*\\.gz$'
  3. get_GOSD.R:631: This could rewritten be as !vs %in% stations[[12]] for clarity.
adamhsparks commented 7 years ago

Thanks, Jeff. I'm rather embarrassed that it didn't work for you.

I'll see what needs to be done to get it fixed and report back here when/as I do.

sckott commented 7 years ago

thanks for the review @jeffreyhanson ! a few notes:

sckott commented 7 years ago

2nd reviewer @deniederhut assigned

jeffreyhanson commented 7 years ago

@sckott Ok, thanks for correcting me on that. I just did a quick check and the nearest_stations function also has its example wrapped in a \dontrun{}. This function doesn't make web requests.

adamhsparks commented 7 years ago

@jeffreyhanson, I've fixed the bug with the country request in the devel branch. Good catch, THANK YOU!

Your comment about the nearest_stations is incorrect. It does make a web query with the .fetch_stations function, thus is wrapped in a \dontrun{}.

I'll work on some of the other issue's you've raised while I wait on my plane now.

adamhsparks commented 7 years ago

Add threads option to get_GSOD function for parallel processing … https://github.com/adamhsparks/GSODR/commit/d404cb4b43cd0fc40365472df5c7a95674e7aeb8

Fix Documentation contains '&lt' which is not rendered … https://github.com/adamhsparks/GSODR/commit/84c210f1079b9e10409d4536c537e993f443d265

Replace pattern call on line 292 … https://github.com/adamhsparks/GSODR/commit/5c622afaae19618d8213f32ec7aab338247231dc

Fix bug when ISO2c/ISO3c is set, query fails https://github.com/adamhsparks/GSODR/commit/434f71e26a5bef87335e145a2d631f329dd6688e

Thanks, @jeffreyhanson, boarding now.

adamhsparks commented 7 years ago

Update authors in paper.md https://github.com/adamhsparks/GSODR/commit/12f8683a9147b41aa7bbf76f88dffdfd90c3631c

adamhsparks commented 7 years ago

Edit code for clarity

get_GOSD.R:631: This could rewritten be as !vs %in% stations[[12]] for clarity.

https://github.com/adamhsparks/GSODR/commit/83728444ac456da8b021032696ed469b0b208ac7

adamhsparks commented 7 years ago

Check and reset original options()

get_GSOD.R:392--393: Options are restored to the default state but this is not necessarily the previous state. These values could be cached at the beginning of the function. For example: use orginal_options <- options() at the start of the function, and at the end of the function use options(original_options).

https://github.com/adamhsparks/GSODR/commit/960bb82e657b33a175cdac2e10d73fd755885bae

adamhsparks commented 7 years ago

Renamed vignette to "GSODR"

https://github.com/adamhsparks/GSODR/commit/a0713f6602ec31981773c7a3d675c0082b5412c7

Additionally, @jeffreyhanson, If you install from github using devtools, devtools::install(build_vignettes = TRUE), will build the vignette, so then using vignette(package="GSODR") a vignette is returned.

adamhsparks commented 7 years ago

Added GSODR.R to create helpfile for package

There is no help page for the package (ie. ?GSODR). It would be really useful to include this, even if it it just a copy-paste from the README.

https://github.com/adamhsparks/GSODR/commit/a4044943d284e0cd44f53670b21995c33ee4a1fb

adamhsparks commented 7 years ago

Removed return(0) after stop command

get_GSOD.R:561: Here and elsewhere return(0) is used after a call to stop(). Please consider removing the return(0) as this is unnecessary .

https://github.com/adamhsparks/GSODR/commit/9c4c24ea4d1221768b2083d86404fb29ce744959

adamhsparks commented 7 years ago

Updated nearest_stations function to return a list of stations for use in the get_GSOD function's stations parameter.

The nearest_station function will be incredibly useful for a user. It would be even more useful if--instead of printing the stations closest to a user-specified point--it returned the stations. Then this function could be used programmatically to find the closest stations to a series of points, and download data accordingly.

Also rewrote the function for computational efficiency.

nearest_stations.R:34--44: This code could be rewritten for clarity and computational efficiency.

https://github.com/adamhsparks/GSODR/commit/3a077575aaee54543ee4ee3aa93b9950c5355d80

deniederhut commented 7 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

Functionality

Final approval (post-review)

Estimated hours spent reviewing: Maybe 2? This was conducted in ~20 minute chunks over the last two weeks, which makes it a little hard to guesstimate.


Review Comments

> system.time(get_GSOD(years = 2010, country = "Philippines", dsn = "~/", filename = "PHL",
+          GPKG = TRUE, CSV = FALSE, threads = 2))
trying URL 'ftp://ftp.ncdc.noaa.gov/pub/data/gsod/2010/gsod_2010.tar'
Content type 'unknown' length 98222080 bytes (93.7 MB)
==================================================
downloaded 93.7 MB

Finished downloading file.

Parsing the indivdual station files now.

   user  system elapsed 
  4.391  11.714 147.535 
> 
> system.time(get_GSOD(years = 2010, country = "Philippines", dsn = "~/", filename = "PHL",
+          GPKG = TRUE, CSV = FALSE, threads = 1))
trying URL 'ftp://ftp.ncdc.noaa.gov/pub/data/gsod/2010/gsod_2010.tar'
Content type 'unknown' length 98222080 bytes (93.7 MB)
==================================================
downloaded 93.7 MB

Finished downloading file.

Parsing the indivdual station files now.

   user  system elapsed 
  4.350  18.334 147.915 
sckott commented 7 years ago

Thanks @jeffreyhanson and @deniederhut for your reviews! both reviews in now, @adamhsparks continue on, and let us know if you have any questions, want any feedback

adamhsparks commented 7 years ago

Thanks folks. Excellent reviews and points raised in them.

I've been working through the review of @jeffreyhanson and will work on @deniederhut's as well.

I think this process is well worth it, especially for someone like me that is just learning how to put a package together. This will make this a much better package than it otherwise would have been.

adamhsparks commented 7 years ago

@deniederhut, I did some checking on the parallelisation just today. It seems to be much faster in my testing. This is for two years, 2010:2011 and all stations in the data.

Using plyr

  system.time(GSOD_XY <- plyr::ldply(.data = GSOD_list, .fun = .process_gz,
                         stations = stations))

  user   system  elapsed
  1055.835   71.492 2921.688

In parallel using foreach dopar loop (4 cores):

  ity <- iterators::iter(GSOD_list)
  system.time(GSODY_XY2 <- foreach::foreach(i = ity) %dopar% {
    .process_gz(i, stations = stations)
  }
  )
   user  system elapsed 
 24.395   9.373 380.866 

This is strictly the data formatting, cleaning, etc. not the whole process, so I'll do more checking through the whole package, but for now I think it's useful.

adamhsparks commented 7 years ago

I've spent quite a bit of time working on this, I think it's ready for another round of review.

New features and enhancements included in this version, 1.0.0

I've decided to release this as 1.0.0 when the review is complete. I think it's finally a suitable product that meets the needs of researchers. Hopefully I've addressed the reviewers comments sufficiently.

In addition to addressing the reviewer comments, which I've addressed below:

Responses to reviewers

@jeffreyhanson's review

  1. All examples are wrapped in \dontrun{} so they are not actually tested.

Yes, since they require an active Internet connection and take longer to run than CRAN permits, they are wrapped in a \dontrun{} tag.

  1. The main function get_GSOD just saves data to disk and does not actually return data. It would be brilliant if this package returned GSOD data so the user does not have to then load it back into R. To that end, I would recommend the function returns a single tibble or SpatialPointsDataFrame object (perhaps depending on an argument?), and let the user then re-save the data where they want it stored.

A data.frame object is now returned as well as the option within the function to define a filename and where to store the file with two options: CSV and a spatial format GeoPackage file.

  1. Unit testing covers cases where get_GSOD is expected to fail but it does not actually test cases that are expected to succeed.

I've revised the unit testing to cover more of the functions within the package accordingly.

  1. There is no help page for the package (ie. ?GSODR). It would be really useful to include this, even if it it just a copy-paste from the README. See here for more details.

I've updated the documentation to include a help file that includes the main "description" and links to the GSODR functions and data documentation.

  1. The nearest_station function will be incredibly useful for a user. It would be even more useful if--instead of printing the stations closest to a user-specified point--it returned the stations. Then this function could be used programmatically to find the closest stations to a series of points, and download data accordingly.

I've updated this accordingly, I've included a more detailed vignette on how someone could use this function to find stations in a specific part of a country of interest.

  1. The accompanying paper does not list authors in the html document.

All of the author details are complete in the md file, however they don't appear in the HTML version when it is knit.

  1. This package needs a rerun of devtools::document() before resubmission. When I ran devtools::run_examples(), this automatically re-compiled some Rd files.

The documentation is updated and expanded.

  1. get_GSOD.R:20 & data.R:29: Both lines refer to a document available on the github repository. I recommend making this document an additional vignette.

There are two Rmd files that document data sources in this package. Technically this isn't any functionality offered by this package or related to using the package. It's documentation that is provided to detail how these elevation data were derived, so they are in the "data-raw" directory, per Hadley's R packages book/webpage. I do not feel that it is necessary to include these as vignettes since the end user does not need to run these functions and they are not functions provided by GSODR.

  1. As far as I can tell, the vignette cannot be accessed within R. After installation, the following code vignette(package='GSODR') yields 'no vignettes found'. Additionally, the rmarkdown file for the vignette has an obscure name, so even if it could be opened within R, this would be tricky given its obscure name. I recommend naming the vignette after the package so the following code will open it vignette('GSODR').

I've renamed the main vignette to "GSODR" and added two more, one detailing working with GeoPackage files and one with a more detailed look at how to find and download stations for a specific area of interest.

Additionally, when installing from GitHub use, devtools::install_github("adamhsparks/GSODR", build_vignettes = TRUE)

10a. get_GSOD.R:279: I am unsure why foreach iterators are used here. It would be great to do offer some parallelisation and options for progress bars here. If you wrapped this in a plyr::**ply function, you could easily make it such that: (1) years can easily be processed in parallel, (2) if not processed in parallel, then a progress-bar can be shown.

and

10b. get_GSOD.R:345: the user should have explicit control over thread use to avoid accidently fork-bombing their own computer. I recommend providing a threads argument in the get_GSOD function and defaulting it to 1.

Per @deniederhut's comment about parallelisation not improving the speed, I've tested and agree, parallelisation does not render the process significantly faster. I've removed this capability and wrapped these functions in plyr functions, which provides progress bars.

11a. get_GSOD.R:392--393: Options are restored to the default state but this is not necessarily the previous state. These values could be cached at the beginning of the function. For example: use orginal_options <- options() at the start of the function, and at the end of the function use options(original_options).

and

11b. nearest_stations.R:28--29: This function changes options() and does not reset them after it finishes. This is against recommended CRAN practices. Please see previous comment on options for a suggestion on how to fix this.

This has been fixed

  1. get_GSOD.R:322: ROpensci guidelines recommend using the httr package instead of the RCurl package.

I've updated this with assistance from @hrbrmstr, RCurl is no longer used.

  1. get_GSOD.R:561: Here and elsewhere return(0) is used after a call to stop(). Please consider removing the return(0) as this is unnecessary .

These have been removed

  1. nearest_stations.R:34--44: This code could be rewritten for clarity and computational efficiency. I suggest something like this:

This was rewritten as suggested

  1. get_GSOD.R:83--85: Documentation contains '&lt' which is not rendered as a less than symbol in the html documentation.

This has been corrected

  1. get_GSOD.R:292: Instead of a call to utils::glob2rx, this could be used instead pattern='^.*\.gz$'

This has been changed as suggested

  1. get_GOSD.R:631: This could rewritten be as !vs %in% stations[[12]] for clarity.

This has been changed as suggested

@deniederhut's review

  1. vignettes and browseVignettes function called on 'GSODR' returns no vignettes found

This has been fixed, when installing from GitHub use, devtools::install_github("adamhsparks/GSODR", build_vignettes = TRUE)

  1. it seems odd to me the enforce a missingness limit at download time -- I think I would either let the user do this in memory, or set the default limit to 0

This has been changed to not enforce any limit with the user having the option to set missing values as they wish.

  1. the test suite only seems to be concerned with handling bad input - am I missing the blocks where it is testing that data are returned correctly?

The unit tests have been expanded

  1. allowing multiple threads for calculation doesn't speed up the execution time as much as I would expect:

I tested this and found the same as well. Parallelisation has been removed to simplify the package dependencies and structure

Thanks to the reviewers for their excellent contributions and comments. It's helped to make this package a much more robust and useful package.

sckott commented 7 years ago

Thanks @adamhsparks for the changes and the responses to reviewers

@jeffreyhanson @deniederhut do either or both of you have time to take another look? if not, i'll do so. let me know

deniederhut commented 7 years ago

@sckott Yes, but not until after Christmas. Is that timeline okay?

On Fri, Dec 2, 2016, 03:22 Scott Chamberlain notifications@github.com wrote:

Thanks @adamhsparks https://github.com/adamhsparks for the changes and the responses to reviewers

@jeffreyhanson https://github.com/jeffreyhanson @deniederhut https://github.com/deniederhut do either or both of you have time to take another look? if not, i'll do so. let me know

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ropensci/onboarding/issues/79#issuecomment-264433061, or mute the thread https://github.com/notifications/unsubscribe-auth/AGjNrEGh_C7fEX9pYjukX5J7fAG6TSuKks5rD_9XgaJpZM4KWnlq .

adamhsparks commented 7 years ago

I'll probably not be available to do much until after Christmas either. So I'm in no hurry at this point.

sckott commented 7 years ago

thanks @deniederhut

jeffreyhanson commented 7 years ago

I'm pretty busy until Christmas too, so I'll take a look early in the new years.

sckott commented 7 years ago

thanks @jeffreyhanson

sckott commented 7 years ago

@jeffreyhanson @deniederhut if you get a chance soon have a look at submitter replies above and let us know if there's anything else you'd like changed/improved

deniederhut commented 7 years ago

@sckott It's on my to-do list for this weekend.

adamhsparks commented 7 years ago

Update. I've made a minor change to the package that does not affect the functionality.

Due to the size of the initially added climate data and the addition of yet more data by myself just this week, the package size was 5.1Mb. I've moved the additional climate data to its own R package that's available only from GitHub, https://github.com/adamhsparks/GSODR.data.

This is referenced in the README and a vignette.

sckott commented 7 years ago

okay, you do plan to put it on CRAN though since I imagine this will go to CRAN.

adamhsparks commented 7 years ago

GSODR is on CRAN and I'll update it once this review is done. The other package, GSODR.data is too large for CRAN unless it's split into two packages or I remove some of the data sets.

I'm happy to just leave it on GitHub since it's not essential for the use of GSODR.

Thoughts?

sckott commented 7 years ago

Ah, if not required for GSODR, then all good,

adamhsparks commented 7 years ago

I guess maybe I wasn't initially clear. It's four sources of climate data that are formatted for use with the data GSODR provides. You can link these data to GSOD data by station ID and LAT/LON.

I have one example in a vignette using a one of the data sets, but for normal use it's not required. It just makes this data more available for use with GSOD data output produced by GSODR.

So since it's mentioned in a vignette, does that require a CRAN version?

edit I could put just the dataset that's referenced in the GSODR package itself and leave the other three as the optional download from GitHub. Is that a good option?

deniederhut commented 7 years ago

Okay, vignettes look good! I think I would still like to see some explicit correctness check in the unit tests somewhere, perhaps with a cached result, but I am recommending this package for approval. A few bugs I ran into while testing:

  1. Call to get_GSOD(years = 2010, station = "955510-99999") raises namespace error -- Error: 'safely' is not an exported object from 'namespace:purrr' ... probably in line 490 or 507
  2. Not sure if this is GSODR or my machine, but tab completing the function names sometimes throws this:
Error in fetch(key) :
  lazy-load database '/Library/Frameworks/R.framework/Versions/3.2/Resources/library/GSODR/help/GSODR.rdb' is corrupt
  1. Not super important, but max_missing seems to accept negative values and NA
adamhsparks commented 7 years ago

Thanks, I have fixed # 3 easily. The other two I'm not so sure of, I've never run into either one with all my testing.

  1. I don't know, it's definitely in purrr 0.2.2. I've never had that issue. If someone has any suggestions?
  2. I ran to Google and did a quick search and found another RStudio user reporting this issue with all packages and someone on SO who had an issue with Java, but I can't find any fixes.
jeffreyhanson commented 7 years ago

Hi, thanks for the reminder. I'll take a look at this sometime this coming week.

sckott commented 7 years ago

its possible you have @deniederhut purrr v0.1, since safely is not in that version https://github.com/hadley/purrr/blob/v0.1.0/NAMESPACE - @adamhsparks plz add a minimum version of purrr to your DESCRIPTION file

sckott commented 7 years ago

lazy-load database ... is corrupt

this is solved by restarting R usually

sckott commented 7 years ago

thanks @jeffreyhanson

adamhsparks commented 7 years ago

@sckott, I've added the minimum version of purrr to DESCRIPTION. Thanks for the tip.

jeffreyhanson commented 7 years ago

I'm still on holiday in New Zealand and won't be back till the 18th. I was hoping that the local library's internet would be good enough to let me test the package, but that doesn't seem to be the case.

Would you mind if I submitted my review when I'm back home?

Also, I noticed that the DESCRIPTION is missing a Maintainer field.

adamhsparks commented 7 years ago

@jeffreyhanson, I'm following Hadley's preference, using Authors@R

You can also use separate Maintainer and Author fields. I prefer not to use these fields because Authors@R offers richer metadata.

jeffreyhanson commented 7 years ago

@adamhsparks Ok, sounds good.