ropensci / rnoaa

R interface to many NOAA data APIs
https://docs.ropensci.org/rnoaa

buoy data clarification #165

Closed pssguy closed 7 years ago

pssguy commented 8 years ago

Am I right in understanding that there is no straightforward way of getting lat/lon data on all buoys, what info they provide, and how long they have been in operation? If possible, this would be a useful portal for acquiring data. Also, is it all just historical data, or is there access to the most current results via the package? Thanks

sckott commented 8 years ago

> there is no straightforward way of getting lat/lon data on all buoys

don't think so, can see if that exists

> Also is it all just historical data or is there access to the most current results via the package

don't know, what do you mean by current?

pssguy commented 8 years ago

Actually, I now see there is a station list.

By "current" I meant hourly data for the past 45 days, e.g. I tried extracting one of these text files with readr and tidyr but haven't gotten it into a good data.frame yet.
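Those 45-day files are plain whitespace-delimited text with two comment headers (column names, then units). The sketch below parses an inline sample of that layout with base R; the column names, the `MM` missing-value code, and the `realtime2` URL pattern are my reading of the public NDBC format, not something confirmed in this thread.

```r
# Hypothetical sketch of parsing the NDBC "realtime2" text layout.
# A real file would live at e.g. https://www.ndbc.noaa.gov/data/realtime2/<station>.txt
# Line 1: column names (prefixed "#"); line 2: units (also prefixed "#").
sample_txt <- "#YY  MM DD hh mm WDIR WSPD GST  WVHT
#yr  mo dy hr mn degT m/s  m/s  m
2016 09 11 18 50 200  6.0  8.0  1.5
2016 09 11 17 50 210  5.0  7.0  1.4"

lines <- readLines(textConnection(sample_txt))

# first header line holds the column names; strip the leading "#"
cols <- strsplit(sub("^#", "", lines[1]), "\\s+")[[1]]

# skip both header lines; NDBC uses "MM" for missing values
obs <- read.table(text = lines[-(1:2)], col.names = cols, na.strings = "MM")
head(obs)
```

The same two-line call should work on a downloaded file by swapping `readLines(textConnection(...))` for `readLines(url(...))`.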

sckott commented 8 years ago

Right, just saw that station list as well. Not the greatest thing in the world, though typical for NOAA. Maybe I can scrape that list, and lat/long from each station's page.

we get the data from http://dods.ndbc.noaa.gov/thredds/catalog/data/ - which I'm not sure includes real-time data or not

sckott commented 8 years ago

this seems to work, now to add a function

```r
library(httr)
library(xml2)
library(dplyr)

# helpers (rnoaa internals, reproduced here so the snippet is self-contained)
utcf8 <- function(x) content(x, as = "text", encoding = "UTF-8")
str_extract_ <- function(x, y) regmatches(x, regexpr(y, x))

url_base <- 'http://www.ndbc.noaa.gov'

# get all stations
res <- GET(file.path(url_base, 'to_station.shtml'))
html <- read_html(utcf8(res))
sta_urls <- file.path(
  url_base,
  xml_attr(
    xml_find_all(
      html,
      "//a[contains(@href,'station_page.php?station')]"),
    "href"
  )
)

# get individual station data: each station page carries Dublin Core
# <meta> tags, and DC.description holds the lat/lon
res <- lapply(sta_urls, function(w) {
  out <- GET(w)
  html <- read_html(utcf8(out))
  dc <- sapply(xml_find_all(html, "//meta[@name]"), function(z) {
    as.list(setNames(xml_attr(z, "content"), xml_attr(z, "name")))
  })
  c(
    station = str_extract_(w, "[0-9]+$"),
    lat = {
      val <- str_extract_(dc$DC.description, "[0-9]+\\.[0-9]+[NS]")
      num <- as.numeric(str_extract_(val, "[0-9]+\\.[0-9]+"))
      if (length(num) == 0) {
        NA
      } else {
        if (grepl("S", val)) num * -1 else num
      }
    },
    lon = {
      val <- str_extract_(dc$DC.description, "[0-9]+\\.[0-9]+[EW]")
      num <- as.numeric(str_extract_(val, "[0-9]+\\.[0-9]+"))
      if (length(num) == 0) {
        NA
      } else {
        if (grepl("W", val)) num * -1 else num
      }
    },
    dc
  )
})

# combine into one data.frame; c() coerced lat/lon to character, so coerce back
dat <- bind_rows(lapply(res, as_data_frame))
dat <- mutate(dat, lat = as.numeric(lat), lon = as.numeric(lon))

library(leaflet)
leaflet(data = na.omit(dat)) %>%
  leaflet::addTiles() %>%
  leaflet::addCircles(~lon, ~lat, opacity = 0.5)
```
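The N/S and E/W sign handling in the two anonymous blocks above could be factored into one small helper. This is only a refactoring sketch (`to_decimal` is a name I made up, not part of the package), assuming the `DC.description` text looks like `"34.7N 72.7W ..."`:

```r
# Hypothetical helper: turn strings like "34.7N 72.7W" into signed decimal
# degrees; dirs is "NS" for latitude or "EW" for longitude, and the second
# letter (S or W) marks the negative hemisphere.
to_decimal <- function(desc, dirs) {
  val <- regmatches(desc, regexpr(paste0("[0-9]+\\.[0-9]+[", dirs, "]"), desc))
  num <- as.numeric(regmatches(val, regexpr("[0-9]+\\.[0-9]+", val)))
  if (length(num) == 0) return(NA_real_)
  if (grepl(substr(dirs, 2, 2), val)) -num else num
}

to_decimal("34.7N 72.7W", "NS")  # latitude:  34.7
to_decimal("34.7N 72.7W", "EW")  # longitude: -72.7
```

With that in place, the `lat = {...}` / `lon = {...}` blocks collapse to `lat = to_decimal(dc$DC.description, "NS")` and `lon = to_decimal(dc$DC.description, "EW")`.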
[screenshot, 2016-09-11: leaflet map of buoy station locations]
sckott commented 8 years ago

@pssguy started a scraper function; looks like it still needs some work. E.g., to get buoy data we need not just the station id but also which dataset it's from, and we don't have that yet. Will come back to this.

pssguy commented 8 years ago

@sckott Thanks for your work to date and anything forthcoming. Will take another look when you have completed the scraper function.

sckott commented 8 years ago

forgot to say it's included now, see buoy_stations() - https://github.com/ropensci/rnoaa/blob/master/R/buoy.R#L39-L44

pssguy commented 8 years ago

OK, just tried it out. Is there a unique id? E.g., there are 69 stations with id '3'.

sckott commented 8 years ago

yes, the code needs to be cleaned up, or it may be that we aren't grabbing the right station id; not sure yet. Any feedback is welcome.
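A quick way to surface the duplicate ids pssguy mentions is to tabulate the scraped `station` column. The sketch below uses a toy stand-in for the scraper output (the real table has many more columns):

```r
# Toy stand-in for the scraped station table; illustrates checking whether
# the id column is actually unique, and keeping one row per id meanwhile.
dat <- data.frame(
  station = c("3", "3", "41001", "41002", "3"),
  lat = c(10.1, 20.2, 34.7, 31.8, 30.3),
  stringsAsFactors = FALSE
)

# how many rows share each scraped id?
sort(table(dat$station), decreasing = TRUE)

# one row per id, keeping the first occurrence
dat_unique <- dat[!duplicated(dat$station), ]
```

Seeing many rows collapse under one short id like `"3"` would point at the `[0-9]+$` extraction in the scraper rather than at genuinely duplicated stations.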

sckott commented 7 years ago

this is done