ropensci / GSODR

API Client for Global Surface Summary of the Day (GSOD) Weather Data Client in R
https://docs.ropensci.org/GSODR

Unexpected `NA`s in longitude and latitude using `reformat_GSOD` #109

Closed (meixilin closed 1 year ago)

meixilin commented 1 year ago

Hi,

Thanks for making this package available. I was trying to use the `reformat_GSOD` function but noticed that some latitude and longitude values were unexpectedly converted to `NA`.

Session Info

```r
devtools::session_info()
─ Session info ───────────────────────────────────────────────
 setting  value
 version  R version 3.6.2 (2019-12-12)
 os       CentOS Linux 7 (Core)
 system   x86_64, linux-gnu
 ui       RStudio
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       America/Los_Angeles
 date     2022-11-20

─ Packages ───────────────────────────────────────────────────
 package     * version date       lib source
 cachem        1.0.6   2021-08-19 [2] CRAN (R 3.6.2)
 callr         3.7.0   2021-04-20 [2] CRAN (R 3.6.2)
 class         7.3-16  2020-03-25 [2] CRAN (R 3.6.2)
 classInt      0.4-3   2020-04-07 [2] CRAN (R 3.6.2)
 cli           3.1.0   2021-10-27 [2] CRAN (R 3.6.2)
 crayon        1.3.4   2017-09-16 [2] CRAN (R 3.6.2)
 data.table    1.14.2  2021-09-27 [2] CRAN (R 3.6.2)
 DBI           1.1.1   2021-01-15 [2] CRAN (R 3.6.2)
 desc          1.4.0   2021-09-28 [2] CRAN (R 3.6.2)
 devtools      2.2.2   2020-02-17 [2] CRAN (R 3.6.2)
 dplyr       * 1.0.7   2021-06-18 [2] CRAN (R 3.6.2)
 e1071         1.7-3   2019-11-26 [2] CRAN (R 3.6.2)
 ellipsis      0.3.2   2021-04-29 [2] CRAN (R 3.6.2)
 fansi         0.4.1   2020-01-08 [2] CRAN (R 3.6.2)
 fastmap       1.1.0   2021-01-25 [2] CRAN (R 3.6.2)
 fs            1.5.2   2021-12-08 [2] CRAN (R 3.6.2)
 generics      0.1.1   2021-10-25 [2] CRAN (R 3.6.2)
 glue          1.5.1   2021-11-30 [2] CRAN (R 3.6.2)
 GSODR       * 3.1.6   2022-08-13 [1] CRAN (R 3.6.2)
 KernSmooth    2.23-16 2019-10-15 [2] CRAN (R 3.6.2)
 lifecycle     1.0.1   2021-09-24 [2] CRAN (R 3.6.2)
 magrittr      2.0.1   2020-11-17 [2] CRAN (R 3.6.2)
 memoise       2.0.1   2021-11-26 [2] CRAN (R 3.6.2)
 pillar        1.6.4   2021-10-18 [2] CRAN (R 3.6.2)
 pkgbuild      1.0.6   2019-10-09 [2] CRAN (R 3.6.2)
 pkgconfig     2.0.3   2019-09-22 [2] CRAN (R 3.6.2)
 pkgload       1.2.4   2021-11-30 [2] CRAN (R 3.6.2)
 prettyunits   1.1.1   2020-01-24 [2] CRAN (R 3.6.2)
 processx      3.5.2   2021-04-30 [2] CRAN (R 3.6.2)
 ps            1.6.0   2021-02-28 [2] CRAN (R 3.6.2)
 purrr         0.3.4   2020-04-17 [2] CRAN (R 3.6.2)
 R6            2.4.1   2019-11-12 [2] CRAN (R 3.6.2)
 Rcpp          1.0.7   2021-07-07 [2] CRAN (R 3.6.2)
 remotes       2.1.1   2020-02-15 [2] CRAN (R 3.6.2)
 rlang         0.4.12  2021-10-18 [2] CRAN (R 3.6.2)
 rprojroot     2.0.2   2020-11-15 [2] CRAN (R 3.6.2)
 rstudioapi    0.13    2020-11-12 [2] CRAN (R 3.6.2)
 sessioninfo   1.1.1   2018-11-05 [2] CRAN (R 3.6.2)
 sf            1.0-4   2021-11-14 [2] CRAN (R 3.6.2)
 stringi       1.4.6   2020-02-17 [2] CRAN (R 3.6.2)
 stringr     * 1.4.0   2019-02-10 [2] CRAN (R 3.6.2)
 testthat      3.1.1   2021-12-03 [2] CRAN (R 3.6.2)
 tibble        3.1.6   2021-11-07 [2] CRAN (R 3.6.2)
 tidyr         1.1.4   2021-09-27 [2] CRAN (R 3.6.2)
 tidyselect    1.1.1   2021-04-30 [2] CRAN (R 3.6.2)
 units         0.6-6   2020-03-16 [2] CRAN (R 3.6.2)
 usethis       2.1.5   2021-12-09 [2] CRAN (R 3.6.2)
 utf8          1.1.4   2018-05-24 [2] CRAN (R 3.6.2)
 vctrs         0.3.8   2021-04-29 [2] CRAN (R 3.6.2)
 withr         2.4.3   2021-11-30 [2] CRAN (R 3.6.2)
```

To reproduce this problem:

wget https://www.ncei.noaa.gov/data/global-summary-of-the-day/access/2017/72057600174.csv
head -2 72057600174.csv
"STATION","DATE","LATITUDE","LONGITUDE","ELEVATION","NAME","TEMP","TEMP_ATTRIBUTES","DEWP","DEWP_ATTRIBUTES","SLP","SLP_ATTRIBUTES","STP","STP_ATTRIBUTES","VISIB","VISIB_ATTRIBUTES","WDSP","WDSP_ATTRIBUTES","MXSPD","GUST","MAX","MAX_ATTRIBUTES","MIN","MIN_ATTRIBUTES","PRCP","PRCP_ATTRIBUTES","SNDP","FRSHTT"
"72057600174","2017-01-01","38.533","-121.783","21.0","UNIVERSITY AIRPORT, CA US","  42.9","24","  39.2","24","9999.9"," 0","011.7","16","  9.8","24","  5.8","24"," 12.0"," 15.0","  51.8","*","  39.2","*"," 0.00","I","999.9","000000"
dt = GSODR::reformat_GSOD(file_list = '72057600174.csv')
head(dt)
          STNID NAME CTRY COUNTRY_NAME ISO2C ISO3C STATE LATITUDE LONGITUDE ELEVATION BEGIN END   YEARMODA YEAR MONTH DAY YDAY TEMP
1: 720576-00174 <NA> <NA>         <NA>  <NA>  <NA>  <NA>       NA        NA        21    NA  NA 2017-01-01 2017     1   1    1  6.1
2: 720576-00174 <NA> <NA>         <NA>  <NA>  <NA>  <NA>       NA        NA        21    NA  NA 2017-01-02 2017     1   2    2  6.7
3: 720576-00174 <NA> <NA>         <NA>  <NA>  <NA>  <NA>       NA        NA        21    NA  NA 2017-01-03 2017     1   3    3  7.5
4: 720576-00174 <NA> <NA>         <NA>  <NA>  <NA>  <NA>       NA        NA        21    NA  NA 2017-01-04 2017     1   4    4  9.8
5: 720576-00174 <NA> <NA>         <NA>  <NA>  <NA>  <NA>       NA        NA        21    NA  NA 2017-01-05 2017     1   5    5  7.3
6: 720576-00174 <NA> <NA>         <NA>  <NA>  <NA>  <NA>       NA        NA        21    NA  NA 2017-01-06 2017     1   6    6  3.7

I dug around and I think the problem lies in the isd_history file that is loaded during `reformat_GSOD`: it does not contain the station ID 72057600174 (STNID 720576-00174), but it does contain another ID at the same location, 720576-99999.

I am a bit confused about why this might be happening. Any input would be greatly appreciated.

Best
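
The mismatch described in the report above can be illustrated with a small, self-contained sketch. It assumes that station metadata is attached by joining on `STNID`, which the thread suggests but does not confirm in detail. The `isd_history` data frame here is a toy stand-in with a single row, not the table shipped with GSODR, and its coordinates are copied from the CSV on the premise that both IDs share a location.

```r
# Toy stand-in for the station metadata (assumption: the real table has many
# more rows and columns). Only the 720576-99999 entry exists, as reported.
isd_history <- data.frame(
  STNID = "720576-99999",
  LATITUDE = 38.533,
  LONGITUDE = -121.783,
  stringsAsFactors = FALSE
)

# One observation, with STNID built from the 11-digit STATION field of the CSV.
station <- "72057600174"
obs <- data.frame(
  STNID = paste(substr(station, 1, 6), substr(station, 7, 11), sep = "-"),
  YEARMODA = as.Date("2017-01-01"),
  stringsAsFactors = FALSE
)

# Left join: every observation is kept; unmatched metadata columns become NA.
joined <- merge(obs, isd_history, by = "STNID", all.x = TRUE)
print(joined)

# Station IDs present in the data but absent from the metadata table.
missing_ids <- setdiff(obs$STNID, isd_history$STNID)
print(missing_ids)
```

A check like `setdiff()` on the station IDs is a quick way to see which files will come back with empty metadata before running a larger batch.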

adamhsparks commented 1 year ago

Hi, sorry about this issue. It is indeed unexpected behaviour. Unfortunately, I’m on vacation right now without a laptop to investigate. I’ll get back to this as soon as I’m able, sometime next month.

Is the behaviour the same when using `get_GSOD()` for the same data set?

meixilin commented 1 year ago

No worries! I ended up rewriting `reformat_GSOD` a little so that it no longer queries isd_history, and that fixed my problem. I haven't tried `get_GSOD()` yet.

Enjoy your holidays!
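
The rewritten function mentioned above was not posted in the thread. As a rough sketch of the idea only, the coordinates can be taken directly from the CSV columns instead of from a metadata lookup. `reformat_from_csv()` is a hypothetical helper, not part of GSODR, and the input is the first data row from the report trimmed to a few columns. The Fahrenheit to Celsius conversion is included because it reproduces the `TEMP` value of 6.1 shown in the reported output.

```r
# First data row of 72057600174.csv from the report, trimmed to a few columns.
csv_text <- '"STATION","DATE","LATITUDE","LONGITUDE","ELEVATION","NAME","TEMP"
"72057600174","2017-01-01","38.533","-121.783","21.0","UNIVERSITY AIRPORT, CA US","  42.9"'

# Read everything as text so the 11-digit STATION field is not turned into a number.
raw <- read.csv(text = csv_text, colClasses = "character", stringsAsFactors = FALSE)

# Hypothetical helper (not a GSODR function): keep the location fields that
# are already in the file rather than looking them up by station ID.
reformat_from_csv <- function(x) {
  data.frame(
    STNID = paste(substr(x$STATION, 1, 6), substr(x$STATION, 7, 11), sep = "-"),
    NAME = x$NAME,
    LATITUDE = as.numeric(x$LATITUDE),
    LONGITUDE = as.numeric(x$LONGITUDE),
    ELEVATION = as.numeric(x$ELEVATION),
    YEARMODA = as.Date(x$DATE),
    # Convert degrees Fahrenheit to degrees Celsius, rounded to one decimal.
    TEMP = round((as.numeric(x$TEMP) - 32) * 5 / 9, 1),
    stringsAsFactors = FALSE
  )
}

out <- reformat_from_csv(raw)
print(out)
```

This trades the extra metadata columns (country, state, and so on) for coordinates that are always populated when the file itself provides them.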

adamhsparks commented 1 year ago

Thank you for reporting this. It was indeed a bug that went deeper than I expected. I've fixed everything up in the devel branch now and will submit a new release to CRAN in 2023.

adamhsparks commented 1 year ago

Thanks, this has been fixed in the latest version now available from CRAN, v3.1.7.
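
For readers checking whether they are affected, a minimal sketch of the version comparison follows. It uses only the two version numbers quoted in this thread and base R, so it runs without GSODR installed. The commented line shows how the same check might look against a real installation.

```r
# Versions quoted in this thread.
reported <- package_version("3.1.6")  # from the session info in the report
fixed    <- package_version("3.1.7")  # release that contains the fix

# TRUE means the reported installation predates the fix.
needs_update <- reported < fixed
print(needs_update)

# On a real installation, the equivalent check would be:
# utils::packageVersion("GSODR") >= "3.1.7"
```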