ropensci / rnoaa

R interface to many NOAA data APIs
https://docs.ropensci.org/rnoaa

lcd: how to easily obtain entire station ids? #380

Open sckott opened 3 years ago

sckott commented 3 years ago

The station ids passed to lcd() are the ids of the files we access from NOAA. See, e.g., https://www.ncei.noaa.gov/data/local-climatological-data/access/2020/ for the 2020 files. Each station id is the USAF (Air Force station ID) code followed by the WBAN (Weather-Bureau-Army-Navy number) code. ncdc_stations() only returns the WBAN part of the code.

How do we make it possible to search for, and get, the full station codes that lcd() requires?
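As a minimal sketch of that composition, assuming the file names at the URL above are a 6-character USAF code followed by a 5-digit WBAN code. The helper name `make_lcd_id` is hypothetical and not part of rnoaa, and the codes passed to it are only illustrative:

```r
# Hypothetical helper (not part of rnoaa): join a USAF code and a WBAN code
# into the 11-character id form assumed here for lcd().
make_lcd_id <- function(usaf, wban) {
  # Work with strings so leading zeros are never dropped.
  usaf <- as.character(usaf)
  wban <- as.character(wban)
  # Left-pad with zeros: USAF to 6 characters, WBAN to 5 (assumed widths).
  pad <- function(x, width) paste0(strrep("0", pmax(0, width - nchar(x))), x)
  paste0(pad(usaf, 6), pad(wban, 5))
}

make_lcd_id("010010", "99999")
# [1] "01001099999"
```

Keeping both codes as character strings matters here, because reading them as numbers would silently strip leading zeros.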

ecoflo commented 3 years ago

Have you considered:

stations <- ghcnd_stations(refresh = TRUE)
us.stations <- stations[grep("US", substr(stations$id, start = 1, stop = 2)), ]

sckott commented 3 years ago

@ecoflo Can you show an example of how to get an ID lcd() wants from your example code?

lpiep commented 2 years ago

I've been looking at this issue because I'd like to use this package to download all US LCD data on a schedule. The LCD documentation indicates that its data is pulled from the Integrated Surface Database (ISD). I looked at the historical ISD data set (ftp://ftp.ncdc.noaa.gov/pub/data/noaa/isd-history.csv), and it contains almost all of the stations in the LCD that aren't missing location data (and includes both the USAF and WBAN ids). Only two stations with complete information in the LCD didn't appear in the ISD data set (one in Massachusetts, one in Florida).
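A rough sketch of how the ISD history file could supply both parts of the id. It assumes the CSV has `USAF`, `WBAN`, and `CTRY` columns, and it parses a tiny inline sample with placeholder rows instead of downloading the real file. The names `isd_text` and `isd_ids` are made up for this example:

```r
# Placeholder rows standing in for isd-history.csv (values are illustrative only;
# the column names are an assumption about the real file).
isd_text <- '"USAF","WBAN","CTRY"
"000001","00001","US"
"000002","99999","NO"
"000003","00003","US"'

# Read every column as character so leading zeros in the codes survive.
isd <- read.csv(text = isd_text, colClasses = "character")

# Keep US rows and concatenate USAF + WBAN into one id per station.
us <- isd[isd$CTRY == "US", ]
isd_ids <- paste0(us$USAF, us$WBAN)
isd_ids
# [1] "00000100001" "00000300003"
```

For the real file, the same `read.csv()` call could presumably be pointed at a downloaded copy of isd-history.csv in place of the `text =` argument.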

However, as of today, there are 5800 stations in the ISD data that don't appear in the LCD data. I haven't found a good way to tell which stations listed in ISD won't appear in LCD, so I'm not sure this data set is what we want for listing all available LCD stations (though it may still be helpful for adding the USAF id to the output of ncdc_stations()).

I think we may have to build an inventory by actually reading through all available LCD files. That would take some time to run, but the data set could be built when the package is updated and included as an internal data set rather than being pulled from NOAA by the user.
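One way such an inventory could be sketched, assuming each year's LCD directory is served as an HTML index whose links look like `<id>.csv`. The listing text, the ids, and the variable names below are placeholders for illustration, not real NOAA output:

```r
# Stand-in for the HTML of one year's LCD directory listing (assumed layout;
# the ids are placeholders).
listing_html <- '<a href="00000100001.csv">00000100001.csv</a>
<a href="00000300003.csv">00000300003.csv</a>'

# Pull out every 11-character file stem that is linked as "<id>.csv".
pattern <- 'href="([0-9A-Za-z]{11})\\.csv"'
hrefs <- regmatches(listing_html, gregexpr(pattern, listing_html))[[1]]
lcd_ids <- unique(sub(pattern, "\\1", hrefs))

# Candidate ids from another source, e.g. ISD (placeholders again).
isd_ids <- c("00000100001", "00000200002", "00000300003")

# Ids with an LCD file under these assumptions, and ISD ids without one.
in_both  <- intersect(isd_ids, lcd_ids)
isd_only <- setdiff(isd_ids, lcd_ids)
```

Repeating this over every year's directory and taking the union of `lcd_ids` would give the kind of inventory that could be shipped as an internal data set.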

Let me know if I can help!