ropensci / rnoaa

R interface to many NOAA data APIs
https://docs.ropensci.org/rnoaa

lcd returning different vector types #396

Closed mps9506 closed 3 years ago

mps9506 commented 3 years ago

Hi, Thanks for the great package... I'm running into a minor issue where lcd() sometimes returns a character column and sometimes returns a numeric column. Reprex below:

library(rnoaa)
x.1 <- lcd(station = "74746003904", year = 2010)
x.2 <- lcd(station = "74746003904", year = 2011)

str(x.1$source)
chr [1:11154] "4" "4" "4" "4" "4" "4" "4" "4" "4" "4" "4" "4" "4" "4" "4" "4" "4" "4" "4" "4" "4" "4" "4" "4" "4" ...
str(x.2$source)
int [1:11278] 7 7 7 7 7 7 7 7 7 7 ...

In the echor package which downloads various EPA environmental permit data, we ended up defining all the column classes as character when reading in the downloaded csv/json files to facilitate bulk data downloading (which is essentially what I'm using rnoaa for). I recognize that might not be a desired solution so completely understand if this issue is closed as is.
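A minimal sketch of that all-character approach in base R. The inline CSV text, column names, and the `read_all_chr()` helper are made up for illustration; this is not echor's or rnoaa's actual code:

```r
# Hypothetical CSV snippets standing in for two yearly downloads.
csv_2010 <- "station,source,temp\n74746003904,4,12.5\n74746003904,4,13.1"
csv_2011 <- "station,source,temp\n74746003904,7,10.2\n74746003904,A,11.0"

# Read every column as character so the types never depend on the values seen.
read_all_chr <- function(txt) {
  read.csv(text = txt, colClasses = "character", stringsAsFactors = FALSE)
}

y2010 <- read_all_chr(csv_2010)
y2011 <- read_all_chr(csv_2011)

# Both years now have identical column classes, so they stack cleanly.
both <- rbind(y2010, y2011)
str(both$source)

# Columns that really are numeric can be converted explicitly afterwards.
both$temp <- as.numeric(both$temp)
```

The trade-off is that every numeric column needs an explicit conversion later, but the result of a bulk download no longer changes shape from year to year.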

Session Info

R version 4.0.5 (2021-03-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19043)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] tidyr_1.1.3        purrr_0.3.4        dplyr_1.0.5        readr_1.4.0        rnoaahelpers_0.1.0 rnoaa_1.3.4       

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.6        pillar_1.6.0      compiler_4.0.5    prettyunits_1.1.1 progress_1.2.2    tools_4.0.5      
 [7] digest_0.6.27     jsonlite_1.7.2    lubridate_1.7.10  lifecycle_1.0.0   tibble_3.1.1      gtable_0.3.0     
[13] pkgconfig_2.0.3   rlang_0.4.11      DBI_1.1.1         cli_2.5.0         rstudioapi_0.13   crul_1.1.0       
[19] curl_4.3.1        gridExtra_2.3     withr_2.4.2       xml2_1.3.2        fauxpas_0.5.0     hms_1.0.0        
[25] rappdirs_0.3.3    generics_0.1.0    vctrs_0.3.8       triebeard_0.3.0   rprojroot_2.0.2   grid_4.0.5       
[31] prompt_1.0.1      tidyselect_1.1.1  data.table_1.14.0 glue_1.4.2        httpcode_0.3.0    here_1.0.1       
[37] R6_2.5.0          fansi_0.4.2       XML_3.99-0.6      sessioninfo_1.1.1 whisker_0.4       hoardr_0.5.2     
[43] ggplot2_3.3.3     magrittr_2.0.1    urltools_1.7.3    scales_1.1.1      ps_1.6.0          ellipsis_0.3.2   
[49] assertthat_0.2.1  colorspace_2.0-0  utf8_1.2.1        munsell_0.5.0     crayon_1.4.1    

sckott commented 3 years ago

Thanks for the report @mps9506 !

Definitely should be fixed, results should be consistent types regardless of the data retrieved.

sckott commented 3 years ago

The source column should be character (see https://www1.ncdc.noaa.gov/pub/data/ish/ish-format-document.pdf). It can be a number or a letter, so we shouldn't coerce it to numeric/integer.

Reinstall to get the fix, and let me know how it goes.
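For anyone who can't reinstall right away, a user-side workaround is to coerce the column to character before combining years. A rough sketch with stand-in data (the values are made up, not real lcd() output, and `type.convert()` is used only to illustrate value-based type guessing, not as a claim about how rnoaa reads files):

```r
# type.convert() guesses a type from the values, much like CSV readers do.
digits_only <- type.convert(c("4", "4", "7"), as.is = TRUE)  # guessed as integer
with_letter <- type.convert(c("4", "A", "7"), as.is = TRUE)  # stays character

# Stand-ins for two yearly results with mismatched `source` columns.
x1 <- data.frame(source = with_letter, stringsAsFactors = FALSE)
x2 <- data.frame(source = digits_only)

# Coerce to character on the user side so both years agree.
x1$source <- as.character(x1$source)
x2$source <- as.character(x2$source)

combined <- rbind(x1, x2)
class(combined$source)
```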

There are a lot of columns in the returned data. I've made a fix just for the source column, but there are likely others that are sometimes coerced to numeric/integer when they should always be character. That's what happened here: the data can include a letter, so the column should always be character. If you or anyone else wants to help sort out what class each column should be, that would be a great contribution.
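One way to start on that is to compare column classes between results for different years and list the ones that disagree. A small sketch in base R; `class_mismatches()` is a hypothetical helper (not part of rnoaa), and the data frames are made-up stand-ins for lcd() output:

```r
# Return the shared columns whose class differs between two data frames.
class_mismatches <- function(a, b) {
  shared <- intersect(names(a), names(b))
  cls_a <- vapply(a[shared], function(col) class(col)[1], character(1))
  cls_b <- vapply(b[shared], function(col) class(col)[1], character(1))
  differs <- cls_a != cls_b
  data.frame(
    column = shared[differs],
    class_a = unname(cls_a[differs]),
    class_b = unname(cls_b[differs]),
    stringsAsFactors = FALSE
  )
}

# Made-up stand-ins for two yearly results.
y1 <- data.frame(source = c("4", "A"), elevation = c(10.5, 10.5),
                 stringsAsFactors = FALSE)
y2 <- data.frame(source = c(7L, 7L), elevation = c(10.5, 10.5))

res <- class_mismatches(y1, y2)
print(res)
```

With real data, the same call on the two objects from the reprex, `class_mismatches(x.1, x.2)`, would list every column affected by the inconsistency, which could then be checked against the format document.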

mps9506 commented 3 years ago

Thanks @sckott, the fix looks fairly straightforward, just slightly tedious. I'll work on a pull request to sort that out over the next few weeks.

sckott commented 3 years ago

Thanks!