ropensci / rnoaa

R interface to many NOAA data APIs
https://docs.ropensci.org/rnoaa
Other
330 stars 84 forks source link

Some arguments in `ghcnd` don't do anything #149

Closed geanders closed 8 years ago

geanders commented 8 years ago

According to the documentation, ghcnd can take the arguments date_min and date_max to restrict what date range of data it pulls. However, this doesn't do anything-- regardless of what you put for this, you will always get the full available dataset for that station (i.e., observations for all available dates). Same thing with the var argument. Regardless of your choice, you always get back the full dataset. It looks like the GET function in ghcnd_GET doesn't recognize any of those extra options that are being passed in. It may be clearer for the users if these were not presented as possible arguments.

I'll check through the rest of the ghcnd functions and see if those arguments make it through for any of them. If not, maybe they could be removed.

geanders commented 8 years ago

Yep, this whole family of functions is just ftp-ing to ".txt" files, so extra things like date restrictions or variable restrictions that you're trying to pass through I'm pretty sure are just being ignored. It's going to give you the same file no matter what you do. In our fork, I'll take those arguments out of the documentation for the ghcnd family.

sckott commented 8 years ago

There's multiple functions in that file https://github.com/ropensci/rnoaa/blob/master/R/ghcnd.R - date_min and date_max are used in ghcnd_search i'm pretty sure. the problem is that ghcnd() has a ... param to allow passing curl options to httr::GET and so if you pass e.g., date_min to ghcnd() it is silently ignored

geanders commented 8 years ago

Got it-- I'd missed that looking through those functions. I've added those parameter definitions back into the documentation on our fork. Would it be okay, though, if I re-worked the documentation in that section to have separate documentation for each function? I think a lot of users could get confused with thinking that they can use some of those arguments with all the functions. Or would that be too much of a pain with maintenance of the package to have separate documentation for each function?

Also, the ghcnd_search function is very cool. We hadn't figured out to use that as a way to pull data when we were working on stuff today, so I think some of our stuff was replicating some of that (we were working off of ghncd rather than ghncd_search). I might go back and look to see if I could mostly replace the clean_daily function we came up with with ghncd_search, and then just reformat a bit to get it into the dataframe structure, rather than list, that we were aiming for in our group today to make sure output "plays well" with ggplot and dplyr.

sckott commented 8 years ago

Would it be okay, though, if I re-worked the documentation in that section to have separate documentation for each function?

yes

I might go back and look to see if I could mostly replace the clean_daily function we came up with with ghncd_search ...

sounds good!

geanders commented 8 years ago

Okay, I have the clean_daily function in our fork changed now so it's using ghncd_search for most of the heavy lifting, and then converting the output into a large tidy dataframe to make it easier to work with the results with dplyr, ggplot2, etc. We've then got meteo_pull_monitors that will take a whole list of site ids and pull and tidy all weather data for selected weather variables (var argument) over a selected date range, with the option to either keep or discard all the extra flags that come in the original dataset for each weather variable. The the meteo_coverage function checks through that dataset and summarizes coverage specifically for the date range you want (I think the "coverage" that you get back for a weather station from the NCDC API might be for the whole period the station's online, not just for the dates a user wants to pull).

I'm still working out a few bugs and working on documentation, but I'll let you know as we work that out. I think we're planning on continuing work (at least some of us) as part of the Australian ROpenSci coming up soon.

geanders commented 8 years ago

Would it be okay, though, if I re-worked the documentation in that section to have separate documentation for each function?

I've done this. I also just merged your master branch's changes into the ropenscilabs branch I've been working on (earlier today). I have the ghcnd.R file, where I've edited documentation, ready for you to look over and pull what you want back into the master branch. I almost have another file, helpers_ghcnd.R, ready for you to consider pulling in, as well, and since in the documentation for a few functions in ghcnd.R I have @seealso's for functions in helpers_ghcnd.R, those two files should probably be looked at and, if you're happy with them, pulled together.

Other files we added in our fork are going to take me a bit more work to get ready. If I do a pull request, is it possible for you to just consider and pull new R scripts one at a time, or is it not worth me putting in a pull request until everything's ready?