ropensci / rnoaa

R interface to many NOAA data APIs
https://docs.ropensci.org/rnoaa

coops_search() times missing from data$t for requests including DST spring-forward days (2017-03-12, 2016-03-13) #215

Closed tphilippi closed 5 years ago

tphilippi commented 7 years ago

Title should probably change to reflect DST spring-forward connection.

The CRAN version of coops_search() returns a data.frame with only dates, not dates & times, for requests to the NOAA Tides & Currents API that include 20170312 for station 9410230 (La Jolla CA SIO Pier). The same issue happens with 20160313 (last year's spring forward date). Compare (works):

chunk_temp <- coops_search(begin_date = 20170313, 
                              end_date = 20170313,
                              station_name = '9410230',
                              product = 'water_temperature', 
                              datum = 'MLLW',
                              units = 'metric', time_zone = 'gmt',
                              application = 'NPS-I&M')$data
str(chunk_temp)
'data.frame':   240 obs. of  3 variables:
 $ t: POSIXct, format: "2017-03-13 00:00:00" "2017-03-13 00:06:00" "2017-03-13 00:12:00" "2017-03-13 00:18:00" ...
 $ v: num  16.5 16.5 16.5 16.5 16.4 16.4 16.5 16.5 16.5 16.4 ...
 $ f: chr  "0,0,0" "0,0,0" "0,0,0" "0,0,0" ...

To (no times returned):

chunk_tide <- coops_search(begin_date = 20170312, 
                              end_date = 20170312,
                              station_name = '9410230',
                              product = 'water_level', 
                              datum = 'MLLW',
                              units = 'metric', time_zone = 'gmt',
                              application = 'NPS-I&M')$data
str(chunk_tide)
'data.frame':   240 obs. of  5 variables:
 $ t: POSIXct, format: "2017-03-12" "2017-03-12" "2017-03-12" "2017-03-12" ...
 $ v: num  0.065 0.083 0.114 0.139 0.17 0.21 0.232 0.272 0.305 0.343 ...
 $ s: chr  "0.065" "0.091" "0.066" "0.080" ...
 $ f: chr  "0,0,0,0" "0,0,0,0" "0,0,0,0" "0,0,0,0" ...
 $ q: chr  "v" "v" "v" "v" ...

In the second result, t has only date components, not times. Requests for multiple days of data are missing times for all days if 2017-03-12 is included in the interval requested. The same behavior occurs for station 9410170, and for all 6 stations (CA & FL) that I checked.

Hypothesis 1: Note that local time at these stations went to DST early on the morning of 2017-03-12, but the calls use 'gmt' to avoid that issue. Is there something in noaa_compact() that can be tripped up by DST even when using gmt? The same dropped times occur for the 2016 spring-forward date, 2016-03-13:

chunk_tide <- coops_search(begin_date = 20160313, 
                              end_date = 20160313,
                              station_name = '9410230',
                              product = 'water_level', 
                              datum = 'MLLW',
                              units = 'metric', time_zone = 'gmt',
                              application = 'NPS-I&M')$data
str(chunk_tide)
'data.frame':   240 obs. of  5 variables:
 $ t: POSIXct, format: "2016-03-13" "2016-03-13" "2016-03-13" "2016-03-13" ...
 $ v: num  0.307 0.324 0.297 0.284 0.253 0.238 0.231 0.247 0.222 0.22 ...
 $ s: chr  "0.292" "0.317" "0.333" "0.287" ...
 $ f: chr  "0,0,0,0" "0,0,0,0" "0,0,0,0" "0,0,0,0" ...
 $ q: chr  "v" "v" "v" "v" ...

Times come through fine for fall back 2016-11-06 (not shown).
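
One way Hypothesis 1 could play out, offered as a minimal sketch rather than a confirmed diagnosis: if timestamp strings are converted with as.POSIXct() in a time zone that observes DST, the local times 02:00 to 02:59 do not exist on a spring-forward day. When no format is supplied, as.POSIXct() only settles on a date-time format if every non-missing string parses with it, so a single unparseable time can make it fall back to the date-only format. The sample strings and the America/Los_Angeles zone below are assumptions for illustration, not what rnoaa does internally, and the result of the second call depends on platform and R version.

```r
# Hypothetical timestamps spanning the 2017 spring-forward gap in US Pacific time
x <- c("2017-03-12 01:54", "2017-03-12 02:00", "2017-03-12 03:00")

# Parsed as GMT, every string is a valid instant, so the times are kept
gmt <- as.POSIXct(x, tz = "GMT")
format(gmt, "%Y-%m-%d %H:%M")

# Parsed in a zone that observes DST, "02:00" does not exist on this date;
# depending on platform and R version the times may be dropped or shifted
local <- as.POSIXct(x, tz = "America/Los_Angeles")
format(local, "%Y-%m-%d %H:%M")
```

If the second call prints 00:00 for every element on a given machine, that would match the date-only t column shown above.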

Hypothesis 2: Perhaps the API is returning a non-printing character somewhere in the date, and a call to as.POSIXct() in noaa_compact() sees that character and fails to parse the time components. [This is less likely now that I've tested multiple sites and found the same behavior for 2016-03-13.] When I compose the URL to request the data in .csv format, I get data that reads fine in R:

tw2 <- read.csv('https://tidesandcurrents.noaa.gov/api/datagetter?begin_date=20170312&end_date=20170312&station=9410230&product=water_level&datum=mllw&units=metric&time_zone=gmt&application=web_services&format=csv', as.is=TRUE)
tw2$DT <- as.POSIXct(tw2$Date.Time,format='%Y-%m-%d %H:%M')
tw2$DT2 <- as.POSIXct(tw2$Date.Time,format='%Y-%m-%d %R')
str(tw2)
'data.frame':   240 obs. of  10 variables:
 $ Date.Time  : chr  "2017-03-12 00:00" "2017-03-12 00:06" "2017-03-12 00:12" "2017-03-12 00:18" ...
 $ Water.Level: num  0.065 0.083 0.114 0.139 0.17 0.21 0.232 0.272 0.305 0.343 ...
 $ Sigma      : num  0.065 0.091 0.066 0.08 0.081 0.073 0.075 0.093 0.108 0.116 ...
 $ O          : int  0 0 0 0 0 0 0 0 0 0 ...
 $ F          : int  0 0 0 0 0 0 0 0 0 0 ...
 $ R          : int  0 0 0 0 0 0 0 0 0 0 ...
 $ L          : int  0 0 0 0 0 0 0 0 0 0 ...
 $ Quality    : chr  "v" "v" "v" "v" ...
 $ DT         : POSIXct, format: "2017-03-12 00:00:00" "2017-03-12 00:06:00" "2017-03-12 00:12:00" "2017-03-12 00:18:00" ...
 $ DT2        : POSIXct, format: "2017-03-12 00:00:00" "2017-03-12 00:06:00" "2017-03-12 00:12:00" "2017-03-12 00:18:00" ...
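
Because the request uses time_zone=gmt, the Date.Time strings are in GMT, so a slightly safer variant of the conversion passes tz explicitly and then checks for values that failed to parse. This is a sketch on a hypothetical three-row stand-in for tw2 (the real data frame comes from the read.csv() call above, and the water levels here are made up):

```r
# Hypothetical stand-in for the tw2 data frame read from the CSV endpoint
tw2 <- data.frame(
  Date.Time = c("2017-03-12 01:54", "2017-03-12 02:00", "2017-03-12 02:06"),
  Water.Level = c(0.90, 0.91, 0.92),  # made-up values, for illustration only
  stringsAsFactors = FALSE
)

# Explicit format and tz: the result does not depend on the session time zone
tw2$DT <- as.POSIXct(tw2$Date.Time, format = "%Y-%m-%d %H:%M", tz = "GMT")

# Number of timestamps that failed to parse
n_failed <- sum(is.na(tw2$DT))
n_failed

# Spacing between consecutive readings, in minutes
step_min <- diff(as.numeric(tw2$DT)) / 60
step_min
```

Without tz = "GMT", the same call interprets the strings in the local time zone, where the 02:00 to 02:54 readings on a spring-forward day may come back as NA.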

I can't do much more on this for another week or two.
I haven't figured out where noaa_compact() is defined and, more critically, I can't manually test the NOAA API by reading JSON or XML results directly into R from work, because our firewall mangles certificates.

> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] jsonlite_1.4         ggplot2_2.2.1        lubridate_1.6.0      plyr_1.8.4           rnoaa_0.6.6          devtools_1.12.0     
[7] httr_1.2.1           foreign_0.8-67       BiocInstaller_1.26.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.10     xml2_1.1.1       knitr_1.15.1     magrittr_1.5     rappdirs_0.3.1   munsell_0.4.3    colorspace_1.3-2 R6_2.2.0        
 [9] stringr_1.2.0    dplyr_0.5.0      tools_3.4.0      grid_3.4.0       gtable_0.2.0     DBI_0.6-1        git2r_0.18.0     withr_1.0.2     
[17] lazyeval_0.2.0   assertthat_0.2.0 digest_0.6.12    tibble_1.3.0     gridExtra_2.2.1  tidyr_0.6.1      curl_2.5         memoise_1.1.0   
[25] stringi_1.1.5    compiler_3.4.0   scales_0.4.1     XML_3.98-1.6    

sckott commented 7 years ago

> Title should probably change to reflect DST spring-forward connection.

want me to change the title?

tphilippi commented 7 years ago

Please do. If there is a way for me to change it, I don't know what it is. I've also tried time_zone = "GMT" instead of lower case, but that didn't help.

I might be able to get to this next week if you haven't figured out the problem before then.

sckott commented 7 years ago

tell me what title you want and i'll put it in, though i think you should be able to hit the Edit button above and do it (if you open an issue you should have the ability to change the title)

sckott commented 7 years ago

thanks, having a look soon

sckott commented 5 years ago

@tphilippi sorry about delay on this:

noaa_compact() is defined as function(l) Filter(Negate(is.null), l) - it's only used to filter out empty/zero length arguments given by the user. So it shouldn't affect the actual inputs themselves.
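
to illustrate, a small sketch using the definition quoted above (the argument list is made up for the example, not taken from the coops_search() internals): NULL entries are dropped and everything else passes through untouched.

```r
# Definition as quoted above
noaa_compact <- function(l) Filter(Negate(is.null), l)

# Hypothetical argument list with one value left unset
args <- list(begin_date = 20170312, end_date = 20170312, datum = NULL)

cleaned <- noaa_compact(args)

# Only the non-NULL arguments remain
names(cleaned)

# Remaining values are not modified in any way
cleaned$begin_date
```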

these don't seem to be a problem anymore - i can't replicate the issue.

your 1st eg returns no data

chunk_temp <- coops_search(begin_date = 20170313, 
                              end_date = 20170313,
                              station_name = '9410230',
                              product = 'water_temperature', 
                              datum = 'MLLW',
                              units = 'metric', time_zone = 'gmt',
                              application = 'NPS-I&M')$data

your 2nd example returns times with the dates

head(coops_search(begin_date = 20170312, 
                              end_date = 20170312,
                              station_name = '9410230',
                              product = 'water_level', 
                              datum = 'MLLW',
                              units = 'metric', time_zone = 'gmt',
                              application = 'NPS-I&M')$data)
#>                     t     v     s       f q
#> 1 2017-03-12 00:00:00 0.058 0.069 0,0,0,0 v
#> 2 2017-03-12 00:06:00 0.079 0.089 0,0,0,0 v
#> 3 2017-03-12 00:12:00 0.112 0.069 0,0,0,0 v
#> 4 2017-03-12 00:18:00 0.145 0.075 0,0,0,0 v
#> 5 2017-03-12 00:24:00 0.163 0.074 0,0,0,0 v
#> 6 2017-03-12 00:30:00 0.198 0.075 0,0,0,0 v

as does the 3rd example.

sckott commented 5 years ago

closing - @tphilippi reopen if this is still an issue