Closed rjbehnke closed 7 years ago
Thanks for your message. Please paste in your sessionInfo() when you have rnoaa loaded, and any example usage of isd() when you get that warning.
Hi,
Here you go. Thank you!
R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] stringr_1.1.0 dplyr_0.5.0 plyr_1.8.4 rerddap_0.3.4 rnoaa_0.6.0
loaded via a namespace (and not attached):
[1] Rcpp_0.12.7 xml2_1.0.0 magrittr_1.5 rappdirs_0.3.1 munsell_0.4.3 colorspace_1.2-6 R6_2.1.3 httr_1.2.1 tools_3.3.1 grid_3.3.1
[11] data.table_1.9.6 gtable_0.2.0 DBI_0.5 assertthat_0.1 digest_0.6.10 tibble_1.2 gridExtra_2.2.1 ggplot2_2.1.0 tidyr_0.6.0 curl_1.2
[21] ncdf4_1.15 mime_0.5 stringi_1.1.1 scales_0.4.0 XML_3.98-1.4 jsonlite_1.0 lubridate_1.5.6 chron_2.3-47
Example CODE: (Note that the warning messages here come when the download failed, but I have seen it for successful downloads, as well).
[1] 1965
Error : download failed for
ftp://ftp.ncdc.noaa.gov/pub/data/noaa/1966/690070-93217-1966.gz
In addition: Warning messages:
1: Unknown column 'precipitation'
2: Unknown column 'precipitation'
3: Unknown column 'precipitation'
Ruben
for (yr in 1901:2016) {
  try(
    assign(paste("data", yr, sep = ""),
           isd(isd_history$USAF[stn], isd_history$WBAN[stn], year = yr,
               path = "I:\\ISD", overwrite = TRUE, cleanup = TRUE)$data)
  )
  print(yr)
}
Thanks, that warning comes from tibble: the output data.frame is a special kind of data.frame, of class tbl_df. It's just a warning, but I've just added suppressWarnings to the parsing code so that it shouldn't show up anymore. Reinstall with devtools::install_github("ropensci/rnoaa") and try again.
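If reinstalling is not an option right away, the same idea can be applied on the calling side. This is only a minimal sketch, not the parsing code in rnoaa: `noisy_parse()` is a hypothetical stand-in for a call that emits the warning, and `muffle_matching()` is an illustrative helper that silences only warnings containing a given text, so unrelated warnings still come through.

```r
# Hypothetical stand-in for a parser that warns about a missing column.
noisy_parse <- function() {
  warning("Unknown column 'precipitation'")
  data.frame(temperature = c(10.5, 11.2))
}

# Evaluate `expr`, muffling only warnings whose message contains `pattern`.
# Warnings that do not match are left alone and propagate as usual.
muffle_matching <- function(expr, pattern) {
  withCallingHandlers(
    expr,
    warning = function(w) {
      if (grepl(pattern, conditionMessage(w), fixed = TRUE)) {
        invokeRestart("muffleWarning")
      }
    }
  )
}

res <- muffle_matching(noisy_parse(), "Unknown column")
```

Compared with a blanket suppressWarnings(), this keeps other warnings visible, which may matter in a long download loop.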
Can you share the code I need to reproduce the error above?
Error : download failed for
ftp://ftp.ncdc.noaa.gov/pub/data/noaa/1966/690070-93217-1966.gz
that works for me, not sure why it doesn't for you, perhaps a path problem
Here is the zipped R code I'm using. I get 'download failed' errors a lot. Maybe my code is just bad. I don't know. I'm not a super experienced programmer.
I was also wondering if I can just use rnoaa to parse downloaded ISD .gz files. Is there a way to do this? I really appreciate your help.
Ruben
Thanks I'll take a look at your code and get back to you here
I was also wondering if I can just use rnoaa to parse downloaded ISD .gz files. Is there a way to do this? I really appreciate your help.
Not at the moment, but I can expose a function to do that, see #169
It seems like I can already parse the data just by pointing the path to the directory where the files are located, but a specific function to do this would be great. I am currently downloading all the files.
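When working from a directory of already downloaded files, it can help to build an inventory of what is on disk first. The sketch below assumes the local files keep the USAF-WBAN-YEAR.gz naming seen in the FTP URL earlier in this thread; `isd_file_inventory()` is a hypothetical helper name, not part of rnoaa.

```r
# List local ISD archives and split their names into station and year parts.
# Assumes file names of the form "<USAF>-<WBAN>-<YEAR>.gz".
isd_file_inventory <- function(dir) {
  files <- list.files(
    dir,
    pattern = "^[0-9A-Za-z]+-[0-9]+-[0-9]{4}\\.gz$",
    full.names = TRUE
  )
  parts <- strsplit(sub("\\.gz$", "", basename(files)), "-", fixed = TRUE)
  data.frame(
    usaf = vapply(parts, `[`, character(1), 1),
    wban = vapply(parts, `[`, character(1), 2),
    year = as.integer(vapply(parts, `[`, character(1), 3)),
    path = files,
    stringsAsFactors = FALSE
  )
}
```

Each `path` in the result could then be handed to a reader such as the isd_read() function mentioned in this thread; check its help page for the exact arguments.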
@rjbehnke you use a file isd-history.csv; I don't have access to that.
Here's the isd-history file (contains only North American stations). The isd_read() function works great!
One other thing I can think of is the option to include/not include bad data in the output. There are a lot of different flags in the ISD data, and missing data is represented by different values for each variable, so I don't know how much automation you want to include in a function. But, for people who just want some nice output, perhaps some automation is ok. I have written QC routines for hourly data from ISD and other networks, but I am refining these routines (they need it before I feel comfortable making them available).
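To make the "different missing value per variable" point concrete, here is a rough sketch of mapping per-variable sentinel codes to NA. The column names and the codes in `missing_codes` are assumptions for illustration and should be checked against the ISD format documentation; the sketch also assumes the columns have already been converted to numeric.

```r
# Assumed sentinel codes per variable (verify against the ISD format document).
missing_codes <- list(
  temperature        = 9999,
  dew_point          = 9999,
  wind_direction     = 999,
  sea_level_pressure = 99999
)

# Replace each variable's sentinel value(s) with NA.
# Columns not listed in `codes`, or not present in `df`, are left untouched.
replace_missing <- function(df, codes) {
  for (col in intersect(names(codes), names(df))) {
    df[[col]][df[[col]] %in% codes[[col]]] <- NA
  }
  df
}

obs <- data.frame(
  temperature    = c(215, 9999),
  wind_direction = c(999, 180)
)
cleaned <- replace_missing(obs, missing_codes)
```

Keeping the codes in a plain list means a user who wants the raw values can simply skip this step, which leaves the "include bad data or not" choice with the caller.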
thanks for the file.
There are a lot of different flags in the ISD data, and missing data is represented by different values for each variable, so I don't know how much automation you want to include in a function. But, for people who just want some nice output, perhaps some automation is ok.
Correct. The data is pretty messy. Do you have code already to clean them up?
Scott,
I do have code, but it is not ready for 'production'. I am starting to refine it (which it desperately needs), but since I am also trying to graduate, it's not a fast process. You are welcome to take a look at it, and work with me in making it better (much better) if you want. Just let me know, and I can send you my code (whether or not it is understandable might be a different story:). There are some major things I want to change.
This code was used for QC of all kinds of sources of data, ranging from ISD to RAWS to many local/regional mesonets. So, it is generalized, and meant for hourly, not daily, data. It is also focused on humidity (specifically, dew point), but it does do general checks on RH and temperature. I would like to write an R package that users who collect their own data or download data from sources that do not do their own QC can use to perform QC. This is a BIG, challenging project, though. I will say that right now, I am likely removing more good data than I care to admit. But, for my work, I'm more concerned about the influence of even a couple bad data values.
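As a very small illustration of the kind of general humidity checks described here (and not the actual QC routines, which are not shown in this thread), the sketch below flags two physically implausible cases in hourly data: a dew point above the air temperature, and a relative humidity outside 0 to 100 percent. The column names `temperature`, `dew_point`, and `rh`, and the tolerance argument, are assumptions.

```r
# Add logical flag columns for two simple plausibility checks.
# `tol` allows for rounding in the reported values (same units as temperature).
flag_humidity <- function(df, tol = 0) {
  have_both <- !is.na(df$dew_point) & !is.na(df$temperature)
  df$flag_dew_gt_temp <- have_both & df$dew_point > df$temperature + tol
  df$flag_rh_range <- !is.na(df$rh) & (df$rh < 0 | df$rh > 100)
  df
}

hourly <- data.frame(
  temperature = c(20.0, 15.0, NA),
  dew_point   = c(12.0, 16.5, 10.0),
  rh          = c(60, 104, 55)
)
flagged <- flag_humidity(hourly)
```

Flagging rather than deleting keeps the decision about how aggressive to be, which comes up above, in the hands of whoever uses the data.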
Ruben
One thing to note is that I recently changed the output of isd() to a tibble (data.frame) instead of a data.frame nested in a list (https://github.com/ropensci/rnoaa/commit/201ad62d7a9cae970426b5f54a4873dc196760cc). Try it again after reinstalling with devtools::install_github("ropensci/rnoaa").
Here's a simpler version of your script, just focusing on making sure the file downloading/etc is working correctly. I think you shouldn't hit download fails anymore, though you might.
library(dplyr)
library(rnoaa)
isd_history <- read.csv('~/Downloads/isd-history2.csv')
isd_history$CTRY <- as.character(isd_history$CTRY); isd_history$STATION.NAME <- as.character(isd_history$STATION.NAME)
isd_history <- subset(isd_history, isd_history$CTRY == 'US' | isd_history$CTRY == 'CA' | isd_history$CTRY == 'MX')
isd_history <- subset(isd_history, STATION.NAME != 'MOORED BUOY')
low <- which(isd_history$WBAN < 1000)
med <- which(isd_history$WBAN >= 1000 & isd_history$WBAN <= 9999)
isd_history$WBAN[low] <- paste('00',isd_history$WBAN[low],sep='')
isd_history$WBAN[med] <- paste('0',isd_history$WBAN[med],sep='')
isd_history$ID <- paste(isd_history$USAF,'-',isd_history$WBAN,sep='')
for (stn in 1:10) {
  cat(stn, "\n")
  begin <- as.numeric(substr(isd_history$BEGIN[stn], 1, 4))
  end <- as.numeric(substr(isd_history$END[stn], 1, 4))
  for (yr in begin:end) {
    cat(" working on:", yr, "\n")
    res <- tryCatch(
      isd(isd_history$USAF[stn], isd_history$WBAN[stn], year = yr),
      error = function(e) e
    )
    if (inherits(res, "error")) {
      cat("failed on ", isd_history$USAF[stn], isd_history$WBAN[stn], yr, "\n")
    }
  }
}
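One detail in the script above: the WBAN padding prepends a fixed '00' or '0', which only yields five digits for three-digit and four-digit values. A possible alternative, sketched here with a hypothetical helper `pad_wban()`, is to let sprintf() pad to a fixed width. It assumes WBAN was read as whole numbers, as in the script.

```r
# Left-pad WBAN identifiers with zeros to a fixed width of five characters.
pad_wban <- function(wban) {
  sprintf("%05d", as.integer(wban))
}

pad_wban(c(5, 123, 1234, 93217))
```

The result could replace the `low`/`med` steps, for example `isd_history$WBAN <- pad_wban(isd_history$WBAN)`.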
I went through the first 6 or so rows of that history file, and it turns out there are some files that just don't exist on the NOAA FTP servers. For each of the stations below, some years worked fine but others failed; I looked on the FTP servers, and those that failed just didn't have a file. So do use tryCatch() and just skip if the file is not found in your for loop. I'll add something to the docs about files not existing. Here are the ones that failed:
## 621370-99999
failed on 621370 99999 2006
failed on 621370 99999 2007
failed on 621370 99999 2008
failed on 621370 99999 2009
failed on 621370 99999 2010
failed on 621370 99999 2011
failed on 621370 99999 2012
failed on 621370 99999 2013
## 690020-93218
failed on 690020 93218 1972
failed on 690020 93218 1973
failed on 690020 93218 1974
failed on 690020 93218 1975
failed on 690020 93218 1976
failed on 690020 93218 1977
failed on 690020 93218 1978
failed on 690020 93218 1979
failed on 690020 93218 1980
failed on 690020 93218 1981
failed on 690020 93218 1982
failed on 690020 93218 1983
failed on 690020 93218 1984
failed on 690020 93218 1985
failed on 690020 93218 1986
failed on 690020 93218 1987
failed on 690020 93218 1988
## 690070-93217
failed on 690070 93217 1971
failed on 690070 93217 1972
failed on 690070 93217 1973
failed on 690070 93217 1974
failed on 690070 93217 1975
failed on 690070 93217 1976
failed on 690070 93217 1977
failed on 690070 93217 1978
failed on 690070 93217 1979
failed on 690070 93217 1980
failed on 690070 93217 1981
failed on 690070 93217 1982
failed on 690070 93217 1983
failed on 690070 93217 1984
failed on 690070 93217 1985
failed on 690070 93217 1986
failed on 690070 93217 1987
failed on 690070 93217 1988
failed on 690070 93217 1989
failed on 690070 93217 1990
## 690110-99999
failed on 690110 99999 1947
failed on 690110 99999 1948
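The skip-on-failure advice can be sketched as a small wrapper that records which years failed instead of only printing them. This is an illustration under assumptions: `fetch_all_years()` and its `fetch_fn` argument are hypothetical, and `fake_fetch()` only simulates missing files so the example runs without network access.

```r
# Try every year for one station; keep successful results and note failures.
# `fetch_fn` is any function(usaf, wban, year) that returns data or throws.
fetch_all_years <- function(usaf, wban, years, fetch_fn) {
  results <- list()
  failed <- integer(0)
  for (yr in years) {
    res <- tryCatch(fetch_fn(usaf, wban, yr), error = function(e) e)
    if (inherits(res, "error")) {
      failed <- c(failed, yr)  # file missing or download failed: skip it
      next
    }
    results[[as.character(yr)]] <- res
  }
  list(data = results, failed = failed)
}

# Simulated fetcher: pretends the files for 1971 and 1972 do not exist.
fake_fetch <- function(usaf, wban, year) {
  if (year %in% 1971:1972) stop("download failed")
  data.frame(year = year, value = 1)
}

out <- fetch_all_years("690070", "93217", 1969:1973, fake_fetch)
```

With rnoaa loaded, `fetch_fn` could be something like `function(usaf, wban, year) isd(usaf, wban, year = year)`, mirroring the call in the script above.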
Thanks Scott. It just seemed strange that there were so many years missing from the middle of a time series from a station. I guess it's just the way ISD is.
Right, I guess that's the way it is
Scott,
The isd_read function works very well, but some errors arise when trying to read the csv files written out after using the isd_read function. I assume these are probably associated with errors in the NCDC files. Here is a list of them. I would suggest that functionality be included with the isd_read function to look for these errors and either correct them or remove the rows they occur on (I have not seen any valid data on rows these errors occur on).
1) The columns 'total_chars', 'usaf_station', 'wban_station', 'date', and 'time' occasionally have bad values (or no data whatsoever) that look like "+0230" or "-0700", etc.
2) Error in read.table(file = file, header = header, sep = sep, quote = quote, : more columns than column names (ex. "697774-99999")
3) Error in read.table(file = file, header = header, sep = sep, quote = quote, : duplicate 'row.names' are not allowed (ex. "467425-99999")
4) In addition: Warning message: In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : EOF within quoted string
5) Error in as.POSIXlt.character(x, tz, ...) : character string is not in a standard unambiguous format
Ruben Behnke
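Until something like this exists in the package, rows with the bad 'date' and 'time' values from item 1 can be dropped on the user side, which should also avoid the as.POSIXlt error in item 5 because the format is given explicitly. This is a sketch under assumptions: `drop_bad_timestamps()` is a hypothetical helper, and it assumes valid values look like "YYYYMMDD" for date and "HHMM" for time, stored as text.

```r
# Keep only rows whose date and time look valid, then build a UTC timestamp.
drop_bad_timestamps <- function(df) {
  ok <- grepl("^[0-9]{8}$", df$date) & grepl("^[0-9]{4}$", df$time)
  out <- df[ok, , drop = FALSE]
  out$datetime <- as.POSIXct(
    paste(out$date, out$time),
    format = "%Y%m%d %H%M",
    tz = "UTC"
  )
  out
}

raw <- data.frame(
  date = c("20160101", "+0230", "20160102"),
  time = c("0000", "-0700", "1230"),
  stringsAsFactors = FALSE
)
clean <- drop_bad_timestamps(raw)
```

The read.table errors in items 2 to 4 are a separate matter; they usually point to stray separators or quote characters in the written csv, so they would need to be looked at per file.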
thanks @rjbehnke for this info. really helpful. It would be even more helpful if you could tell me which dataset requests lead to those errors, so I can quickly get examples that I can play with to sort these errors out.
Scott,
Here's a document with info on the errors. I attached the script I'm using. Please let me know if you need something else.
Ruben
@rjbehnke didn't get the attachment. I think you have to use the github web interface maybe, or email it to me.
see file in #169
closing for now, let me know if there's anything we didn't sort out @rjbehnke
Hi,
When I use the rnoaa package to get ISD data, I often get the warning message "unknown column 'precipitation'". Is there a way to fix this? I am using this package to download the ISD data set for North American stations. I downloaded the isd station history, and am going through the stations one at a time.
Thank you, Ruben Behnke