ropensci / rnoaa

R interface to many NOAA data APIs
https://docs.ropensci.org/rnoaa

ncdc returns values that are too high #266

Closed sarahgrogan closed 5 years ago

sarahgrogan commented 6 years ago
Session Info

```r
sessionInfo()
R version 3.2.1 (2015-06-18)
Platform: x86_64-unknown-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux

locale:
 [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base

other attached packages:
 [1] bindrcpp_0.2 XML_3.98-1.4 raster_2.5-8 sp_1.2-5 doMC_1.3.4 iterators_1.0.8 foreach_1.4.3 scales_0.4.1
 [9] rnoaa_0.7.0 broom_0.4.2 asremlPlus_2.0-12 Amelia_1.7.4 Rcpp_0.12.10 tidyr_0.6.1 myf_1.0 plyr_1.8.4
[17] BCS.PhenoData_1.3.78 BCS.Base_1.5.27 BCS.Generics_1.0.21 BCS.Data_1.0.26 asreml_3.0-1 lattice_0.20-33 dplyr_0.7.4 stringr_1.2.0
[25] openxlsx_3.0.0 doBy_4.5-15 ggplot2_2.2.1 prettydoc_0.2.0

loaded via a namespace (and not attached):
 [1] httr_1.1.0 jsonlite_0.9.20 assertthat_0.1 triebeard_0.3.0 urltools_1.7.0 pillar_1.1.0 backports_1.0.5 glue_1.2.0 digest_0.6.12
[10] colorspace_1.3-2 htmltools_0.3.5 Matrix_1.2-6 psych_1.7.3.21 pkgconfig_2.0.1 httpcode_0.2.0 purrr_0.2.2 tibble_1.4.2 dae_2.7-20
[19] lazyeval_0.2.0 cli_1.0.0 mnormt_1.5-4 magrittr_1.5 crayon_1.3.4 mime_0.4 evaluate_0.10 nlme_3.1-128 MASS_7.3-45
[28] xml2_0.1.2 foreign_0.8-66 tools_3.2.1 munsell_0.4.3 compiler_3.2.1 rlang_0.1.6 grid_3.2.1 rstudioapi_0.5 rappdirs_0.3.1
[37] labeling_0.3 rmarkdown_1.4 gtable_0.2.0 codetools_0.2-14 curl_3.2 reshape2_1.4.2 R6_2.2.0 gridExtra_2.2.1 lubridate_1.6.0
[46] knitr_1.15.1 utf8_1.1.3 bindr_0.1 rprojroot_1.2 hoardr_0.2.0 stringi_1.1.5 crul_0.6.0
```

I suspect a bug with ncdc().

The values returned are much higher than expected or reasonable (e.g., temps > 200 for one station in Lubbock, TX). I also pulled the same data set using NOAA's manual query, which returned different, correct-looking data, with temps closer to 95F. I looked at TMIN, TMAX, and PRCP, and all three were inflated for three sample stations. Here is one station.

library(rnoaa)
library(ggplot2)
x2 <- ncdc(datasetid = "GHCND",
           stationid = "GHCND:USW00023042",
           datatypeid = c("TMIN", "TMAX", "PRCP"),
           startdate = "2018-01-01", enddate = "2018-07-15",
           limit = 1000)
ggplot(aes(x = date, y = value), data = x2$data) + geom_point(aes(col = datatype))

I haven't looked at your functions or dug into this much.

sckott commented 6 years ago

thanks very much for this @sarahgrogan !

we've had related questions, e.g. #233. in that issue, we found out that TMAX and TMIN are stored in tenths of a degree C, so for your example I'd do:

# fxn to convert C to F
c2f <- function(temp_celsius, round = 2) {
  temp_fahrenheit <- (9/5) * temp_celsius + 32
  round(temp_fahrenheit, digits = round)
}

the example

library(rnoaa)
library(ggplot2)
x2 <- ncdc(datasetid = "GHCND", 
           stationid = "GHCND:USW00023042", 
           datatypeid=c("TMIN", "TMAX"),
           startdate = "2018-01-01", enddate = "2018-07-15", 
           limit=1000)
x2$data$value <- c2f(x2$data$value / 10)
ggplot(aes(x=date, y=value), data=x2$data) + 
  geom_point(aes(col=datatype))
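The example above leaves out PRCP, which the original query also requested. As far as the GHCN-Daily documentation describes it, PRCP is stored in tenths of a millimeter, so it needs its own conversion. A minimal sketch, assuming that unit; the helper names `prcp_to_mm` and `prcp_to_in` and the sample values are made up for illustration and are not part of rnoaa:

```r
# Hypothetical helpers (not part of rnoaa): convert GHCND PRCP values,
# assumed to be stored in tenths of mm, to mm and to inches.
prcp_to_mm <- function(prcp_tenths_mm) {
  prcp_tenths_mm / 10
}

prcp_to_in <- function(prcp_tenths_mm, digits = 2) {
  # 1 inch = 25.4 mm
  round(prcp_to_mm(prcp_tenths_mm) / 25.4, digits = digits)
}

# Made-up raw values standing in for the PRCP rows of x2$data$value
raw_prcp <- c(0, 25, 254)
prcp_to_mm(raw_prcp)
prcp_to_in(raw_prcp)
```

With real data, these would be applied only to the rows where `datatype` is `"PRCP"`, since the temperature rows use a different unit.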
sckott commented 6 years ago

wrt how we can improve the situation:

  1. we need to improve documentation so it's clear to users what units the data are in. the hard part is that NOAA's docs are difficult to find and navigate, but we'll do our best
  2. possibly convert data for users. I think I'd rather not do this, and instead make it very clear what units the data are in, because not everyone will want data in e.g., degrees F or degrees C
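To illustrate the trade-off in point 2, here is a rough sketch of what an opt-in conversion could look like on the user side, where the caller picks the target unit rather than the package choosing one. The function name `convert_temp` and its `to` argument are hypothetical and not part of rnoaa; the input is assumed to be in tenths of a degree C, as discussed for TMAX and TMIN:

```r
# Hypothetical helper (not part of rnoaa): convert temperatures stored in
# tenths of a degree C to a unit chosen by the caller.
convert_temp <- function(tenths_c, to = c("C", "F", "K")) {
  to <- match.arg(to)          # defaults to "C" when `to` is not supplied
  celsius <- tenths_c / 10
  switch(to,
    C = celsius,
    F = celsius * 9 / 5 + 32,
    K = celsius + 273.15
  )
}

convert_temp(350)              # 35 (degrees C)
convert_temp(350, to = "F")    # 95 (degrees F)
```

Keeping the raw values untouched and converting explicitly like this means nobody gets a unit they did not ask for.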
sarahgrogan commented 6 years ago

Thanks @sckott! I didn't realize this was a known issue with the data table, and now my data makes much more sense (also, it's good to know it's not a bug in the code). Looking back at it, and reading the NOAA documentation more closely, I see that it does state that the stored units are tenths of a degree C. One way you might consider improving the functionality would be to have ncdc() also return a summary table of the reported units for each measurement, or to add this to the function's documentation.
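A rough sketch of the summary-table idea, done on the user side rather than inside ncdc(). The lookup table `ghcnd_units` and the helper `add_units_column` are hypothetical names, the units listed are assumptions taken from the GHCN-Daily documentation for just these three datatypes, and the sample rows are made up:

```r
# Hypothetical lookup table (not part of rnoaa) of the units assumed for a
# few GHCND datatypes.
ghcnd_units <- data.frame(
  datatype = c("TMAX", "TMIN", "PRCP"),
  units    = c("tenths of degrees C", "tenths of degrees C", "tenths of mm"),
  stringsAsFactors = FALSE
)

# Attach a `units` column to any data frame that has a `datatype` column.
# Datatypes missing from the lookup get NA instead of a guessed unit.
add_units_column <- function(dat, lookup = ghcnd_units) {
  dat$units <- lookup$units[match(dat$datatype, lookup$datatype)]
  dat
}

# Made-up rows shaped like x2$data
dat <- data.frame(
  datatype = c("TMAX", "TMIN", "PRCP", "SNOW"),
  value    = c(350, 189, 25, 0),
  stringsAsFactors = FALSE
)
add_units_column(dat)
```

Leaving unknown datatypes as `NA` makes it obvious where the lookup is incomplete, which seems safer than silently assuming a unit.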

sckott commented 6 years ago

@sarahgrogan right, we will improve docs. showing units is a good idea, though I'm not sure yet if it's possible. it should make it very clear to users that they need to convert to whatever units they want.