Closed geanders closed 8 years ago
Yep, this whole family of functions is just ftp-ing to ".txt" files, so extra things like date restrictions or variable restrictions that you're trying to pass through I'm pretty sure are just being ignored. It's going to give you the same file no matter what you do. In our fork, I'll take those arguments out of the documentation for the ghcnd
family.
There's multiple functions in that file https://github.com/ropensci/rnoaa/blob/master/R/ghcnd.R - date_min
and date_max
are used in ghcnd_search
i'm pretty sure. the problem is that ghcnd()
has a ...
param to allow passing curl options to httr::GET
and so if you pass e.g., date_min
to ghcnd()
it is silently ignored
Got it-- I'd missed that looking through those functions. I've added those parameter definitions back into the documentation on our fork. Would it be okay, though, if I re-worked the documentation in that section to have separate documentation for each function? I think a lot of users could get confused with thinking that they can use some of those arguments with all the functions. Or would that be too much of a pain with maintenance of the package to have separate documentation for each function?
Also, the ghcnd_search
function is very cool. We hadn't figured out to use that as a way to pull data when we were working on stuff today, so I think some of our stuff was replicating some of that (we were working off of ghncd
rather than ghncd_search
). I might go back and look to see if I could mostly replace the clean_daily
function we came up with with ghncd_search
, and then just reformat a bit to get it into the dataframe structure, rather than list, that we were aiming for in our group today to make sure output "plays well" with ggplot and dplyr.
Would it be okay, though, if I re-worked the documentation in that section to have separate documentation for each function?
yes
I might go back and look to see if I could mostly replace the clean_daily function we came up with with ghncd_search ...
sounds good!
Okay, I have the clean_daily
function in our fork changed now so it's using ghncd_search
for most of the heavy lifting, and then converting the output into a large tidy dataframe to make it easier to work with the results with dplyr
, ggplot2
, etc. We've then got meteo_pull_monitors
that will take a whole list of site ids and pull and tidy all weather data for selected weather variables (var
argument) over a selected date range, with the option to either keep or discard all the extra flags that come in the original dataset for each weather variable. The the meteo_coverage
function checks through that dataset and summarizes coverage specifically for the date range you want (I think the "coverage" that you get back for a weather station from the NCDC API might be for the whole period the station's online, not just for the dates a user wants to pull).
I'm still working out a few bugs and working on documentation, but I'll let you know as we work that out. I think we're planning on continuing work (at least some of us) as part of the Australian ROpenSci coming up soon.
Would it be okay, though, if I re-worked the documentation in that section to have separate documentation for each function?
I've done this. I also just merged your master branch's changes into the ropenscilabs
branch I've been working on (earlier today). I have the ghcnd.R
file, where I've edited documentation, ready for you to look over and pull what you want back into the master branch. I almost have another file, helpers_ghcnd.R
, ready for you to consider pulling in, as well, and since in the documentation for a few functions in ghcnd.R
I have @seealso's for functions in helpers_ghcnd.R
, those two files should probably be looked at and, if you're happy with them, pulled together.
Other files we added in our fork are going to take me a bit more work to get ready. If I do a pull request, is it possible for you to just consider and pull new R scripts one at a time, or is it not worth me putting in a pull request until everything's ready?
According to the documentation,
ghcnd
can take the argumentsdate_min
anddate_max
to restrict what date range of data it pulls. However, this doesn't do anything-- regardless of what you put for this, you will always get the full available dataset for that station (i.e., observations for all available dates). Same thing with thevar
argument. Regardless of your choice, you always get back the full dataset. It looks like theGET
function inghcnd_GET
doesn't recognize any of those extra options that are being passed in. It may be clearer for the users if these were not presented as possible arguments.I'll check through the rest of the
ghcnd
functions and see if those arguments make it through for any of them. If not, maybe they could be removed.