ropensci / rnoaa

R interface to many NOAA data APIs
https://docs.ropensci.org/rnoaa
Other
328 stars 84 forks source link

Fix for issue #327, adjust data request date and fix column name #328

Closed potterzot closed 4 years ago

potterzot commented 4 years ago

Description

Two issues arose in the latest CRAN submission:

  1. The tests called GEFS data from the current date based on Sys.time() with a forecast time of '0000'. As a result if the tests were run before '1800', the data file wouldn't be available yet and some tests would fail. This PR fixes that by changing the GEFS request to query a forecast made two days ago, which should always be available regardless of what time the tests are run.
  2. Previous fixes had made large improvements to the indexing of the ens and time dimensions, with the result that the request returned a data.frame with time as a column, whereas previously the specific requested variable was returned (either time1 or time2). The test still tested for variables under those names instead of under the time name. Not sure how this passed the last round of tests but all tests now pass.

Related Issue

Fix #327

sckott commented 4 years ago

thanks @potterzot however, a few things:

potterzot commented 4 years ago

Thanks @sckott. The file connections were not being closed properly during error checks, leaving behind temporary files. I think all of that is fixed now, and I've added a test to check the list of files in the temporary directory before testing against the list afterward and fail if they are not equal. It works on my linux system, but I'm not sure about other systems. I think it should work since I base the temp directory on the output of tempdir().

For the other test, I am confused about the time, time1, and time2 variables, and I'm wondering if they actually change on the GEFS side of things, but I've reworked the tests to fetch the correct dimension name and test against it with a dimension that is never included. I think it should work on any system.

sckott commented 4 years ago

thanks - i'll have a look

sckott commented 4 years ago

thanks much @potterzot

Files are now cleaned up on function exit, thats great, thanks. however, still seems like CRAN team may not be happy where files are being written. they say

Packages should not write in the user’s home filespace (including clipboards), nor anywhere else on the file system apart from the R session’s temporary directory (or during installation in the location pointed to by TMPDIR

on my macos, files are being written to /private/tmp/. my tempdir() gives /var/folders/fc/n7g_vrvn0sx_st0p8lxb3ts40000gn/T//RtmpvxX8rj, so at least on macos gefs isn't using the appropriate tempdir. I assume ncdf4 controls where the files are going? is there any way to change the base directory? if you can control the path where files are written, you could then expose that as a parameter - and then users can use whatever path they want, persistent or not, and you can in examples and tests use a temp directory that cran will be happy with

potterzot commented 4 years ago

Hmm, I'm not sure what is going on here. ncdf4 is itself on CRAN, so it seems weird that it would do this. But it was updated on 10-23-2019, so maybe it just hasn't been caught yet? And previously the gefs functions in rnoaa didn't trigger this problem. I'll contact the package maintainer. The function to generate the connection (nc_open()) doesn't have an option to specify the temporary directory unfortunately.

sckott commented 4 years ago

Okay, yeah they're not consistent probably in checking.

We're probably okay as long as the files are not left anywhere on disk. if you don't get a response by monday or tuesday we should move on as we need to get it in by the end of next week

potterzot commented 4 years ago

I received an email from the ncdf4 package author](http://cirrus.ucsd.edu/~pierce/ncdf/). That package does not specify any creation of a temporary file and does not provide a flag to pass on to the underlying netcdf library. Moreover, the netcdf header file "netcdf.h" doesn't seem to allow that as a parameter to it's C function nc_open(). I sent an email to the netcdf email list at unidata, and will see what I get back from it. In the meantime though, it seems there's no way for us to prevent the creation of the temp file in the system temporary directory.

Hopefully this is acceptable by CRAN, since it's not an R package that is doing the file creation and if we're careful to close file connections so that the temporary files get deleted afterward. If not, we may have to remove gefs.R for the time being...

potterzot commented 4 years ago

Just to follow up for future/posterity, I received this email regarding how to specify the tempfile location:

That is correct. You need to create a .dodsrc file and insert HTTP.COOKIEJAR= into it. =Dennis Heimbigner Unidata

Since we'd have to create the .dodsrc file in the users directory, I think we're faced with either creating a non-standard config file to tell netcdf where to store the temporary file, or accepting the standard location of the temporary file based on system temporary files. Neither is great but both cases create files outside of the standard R tempdir(). I don't see a way around this at this point. If you think it's worth doing, we could note in the documentation that this happens and that the user can create their own .dodsrc file to control the location of the temporary file, something like:

library(rnoaa)

# Create the config file to tell netcdf where to store temporary files
cat(paste0("HTTP.COOKIEJAR=", tempdir()), file="~/.dodsrc")

# Make data request
d <- gefs(...)

# Remove the config file if desired
file.remove("~/.dodsrc")
sckott commented 4 years ago

thanks @potterzot for looking into the netcdf file location details!

Let's think about in future versions if we can do something about the temp files, but for now let's merge this, and submit to cran, and if they're still not happy then we can explore letting users change the location of the ncdf temp files, etc.