ropensci / rnoaa

R interface to many NOAA data APIs
https://docs.ropensci.org/rnoaa

Date format differs by Station #205

Closed: mdwhitby closed this issue 5 years ago.

mdwhitby commented 7 years ago

Hello,

Great package, almost everything I need!

I have found a problem that appears to have been introduced by a recent update I did (I'm not sure whether it was tidyverse or rnoaa).

When I download data from different ISD stations I get two different date formats: one comes in as character, the other as Date. An example is below. I used `lapply` to download stations by year; in `data[[2]]` the `date` column is character, while in `data[[1]]` it is Date.

I think this might have something to do with reading the data or converting it to a tibble, where column types are guessed automatically.

```
> data[[2]][1:10,1:5]
# A tibble: 10 × 5
   total_chars usaf_station wban_station date     time
1  0126        720308       04992        20160101 0015
2  0126        720308       04992        20160101 0035
3  0185        720308       04992        20160101 0055
4  0185        720308       04992        20160101 0115
5  0126        720308       04992        20160101 0135
6  0185        720308       04992        20160101 0155
7  0166        720308       04992        20160101 0215
8  0185        720308       04992        20160101 0235
9  0126        720308       04992        20160101 0255
10 0137        720308       04992        20160101 0315

> data[[1]][1:10,1:5]
# A tibble: 10 × 5
   total_chars usaf_station wban_station date       time
1  126         720308       04992        2015-01-01 0015
2  139         720308       04992        2015-01-01 0035
3  126         720308       04992        2015-01-01 0055
4  126         720308       04992        2015-01-01 0115
5  126         720308       04992        2015-01-01 0135
6  126         720308       04992        2015-01-01 0155
7  137         720308       04992        2015-01-01 0215
8  126         720308       04992        2015-01-01 0235
9  137         720308       04992        2015-01-01 0255
10 126         720308       04992        2015-01-01 0315
```

```
> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] lubridate_1.6.0      rnoaa_0.6.5          dplyr_0.5.0          purrr_0.2.2          readr_1.0.0
 [6] tidyr_0.6.0          tibble_1.2           ggplot2_2.1.0        tidyverse_1.0.0      RevoUtilsMath_10.0.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.7      xml2_1.0.0       magrittr_1.5     rappdirs_0.3.1   munsell_0.4.3    colorspace_1.2-7 R6_2.2.0
 [8] stringr_1.1.0    httr_1.2.1       plyr_1.8.4       tools_3.3.2      grid_3.3.2       gtable_0.2.0     DBI_0.5-1
[15] lazyeval_0.2.0   assertthat_0.1   gridExtra_2.2.1  curl_2.2         mime_0.5         stringi_1.1.2    RevoUtils_10.0.2
[22] scales_0.4.0     XML_3.98-1.4     jsonlite_1.1     foreign_0.8-67
```
sckott commented 7 years ago

thanks @mdwhitby. Please share which function you are referring to, and example code that gives you the two different results.

mdwhitby commented 7 years ago

I download the data with the isd() function. It actually appears to be the same station, but different years (720308-04992 in 2015 and 2016). The relevant output should be in the printed header I included.

sckott commented 7 years ago

First thing: update rnoaa to the latest version on CRAN (v0.6.6, https://cran.rstudio.com/web/packages/rnoaa/), then try again and tell me what you get.

mdwhitby commented 7 years ago

Updating did not seem to help. Looking through the source, I suspect the problem is actually in the `isdparser::isd_parse()` function: on line 59 you use `tibble::as_data_frame(df)`. I think tibble is automatically formatting columns, and for some reason the date column is only converted occasionally. Is there a way to force all columns to be character? Or does it have to be a tibble? Personally, since I am pulling 3 years of data from 30+ stations, I am not opposed to having them as data.tables.

These are the first 2 list elements returned from `rnoaa::isd()` in an `apply()`:

```
usaf_station
[1] "720308"
wban_station
[1] "04992"

> class(data1[[1]]$date)  # 2015 data from station
[1] "Date"
> class(data1[[2]]$date)  # 2016 data from station
[1] "character"
```
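Since the `date` column arrives either as a `"YYYYMMDD"` character string or as a `Date`, one workaround is to coerce every list element to a single type before combining them. This is a base R sketch only: `normalize_isd_date()` and `normalize_all()` are hypothetical helpers, not rnoaa functions, and they assume character dates always look like `20160101`, as in the output above.

```r
# Hypothetical helper (not part of rnoaa): coerce the `date` column of one
# isd() result to class Date, whether it arrived as "20160101" or as a Date.
# Assumes character dates use the compact "YYYYMMDD" layout.
normalize_isd_date <- function(df) {
  if (!inherits(df$date, "Date")) {
    df$date <- as.Date(as.character(df$date), format = "%Y%m%d")
  }
  df
}

# Apply to every element of a list of per-year results (e.g. `data1`),
# so that all elements share one date type before binding them together.
normalize_all <- function(lst) lapply(lst, normalize_isd_date)

# Small stand-ins for two isd() results with mismatched date types
a <- data.frame(date = as.Date("2015-01-01"), time = "0015",
                stringsAsFactors = FALSE)
b <- data.frame(date = "20160101", time = "0015", stringsAsFactors = FALSE)

fixed <- normalize_all(list(a, b))
sapply(fixed, function(d) class(d$date))
```

The last line returns `"Date"` for both elements. Converting to `Date` rather than forcing every column to character avoids a second mismatch: `as.character()` on a `Date` gives `"2015-01-01"`, not `"20150101"`.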

Michael Whitby michael.whitby@gmail.com 609-923-0973


sckott commented 7 years ago

@mdwhitby thanks for the update.

are you sure the above report is using the latest rnoaa? I only ask because I'm using the exact same version of rnoaa from CRAN and I don't get the problem you're reporting. I have tibble v1.2 as well, so that's not different.

I haven't seen all your code; it's possible you're doing something that affects the output, or perhaps you have options that load in your R session that are coming into play here.

mdwhitby commented 7 years ago

I think I finally tracked it down.

When I originally wrote the code I used "regular" R. That installation had rnoaa 0.6.6 and cached columns in character format. I then switched to the MRAN version (for speed when processing a lot of data); it uses rnoaa 0.6.5 and cached any new files with the Date format. After deleting the cache and not switching between versions, the column types remain consistent.

You may consider having a cache folder for each version.


sckott commented 7 years ago

glad you figured it out.

> You may consider having a cache folder for each version.

for each version of what?

mdwhitby commented 7 years ago

For each new release of rnoaa, say `/rnoaa/v066`, and after an update `/rnoaa/v067`. That would force each version to use its own cache and make sure the cleaned data is always in the same format.
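The per-version cache folder suggested here could look something like the following. This is only a sketch of the idea, not something rnoaa provides: `versioned_cache_dir()` is a hypothetical helper, the `base` directory is left to the caller, and how the resulting path would be handed to `isd()` depends on the cache options of the installed rnoaa version. It uses `utils::packageVersion()` to turn `0.6.6` into `v066`.

```r
# Hypothetical helper (not an rnoaa function): build a cache directory path
# that embeds the package version, e.g. "<base>/rnoaa/v066".
# `version` defaults to the installed rnoaa version; the default is only
# evaluated if no version is supplied.
versioned_cache_dir <- function(base, version = utils::packageVersion("rnoaa")) {
  # "0.6.6" -> "v066": drop the dots, following the naming proposed above
  tag <- paste0("v", gsub(".", "", as.character(version), fixed = TRUE))
  file.path(base, "rnoaa", tag)
}

versioned_cache_dir(base = "cache", version = "0.6.6")  # "cache/rnoaa/v066"
```

Dropping the dots matches the `v066` naming, but it is not collision-free (`0.6.10` and `0.61.0` would both become `v0610`); keeping a separator, e.g. `v0.6.10`, would avoid that.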


sckott commented 7 years ago

Hmm, maybe. I might rather just make sure the docs are clear on this, and maybe I could add a message to the user about checking that their cache