ropensci / rrricanes

Web scraper for Atlantic and east Pacific hurricanes and tropical storms
https://docs.ropensci.org/rrricanes

Modify retrieval of forecast/advisory products #115

Closed timtrice closed 5 years ago

timtrice commented 6 years ago

As it stands now, the flow of get_fstadv, or the equivalent get_storm_data(products = "fstadv"), is:

  1. A character vector (length 1 or more) of links to each storm's archive page is passed to get_storm_data.
  2. The vector is split into groups of four and each group is retrieved. At this point you have all links from each requested storm's archive page.
  3. Forecast/Advisory links are extracted and sent to extract_product_contents, where the text for each product is extracted.
    • In this example, the text is parsed in fstadv(), loaded into a dataframe and returned.
  4. A list of dataframes, one per storm, is returned.
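The steps above can be sketched in R. This is a paraphrase, not the package's actual implementation; the helper name split_into_groups and the example links are hypothetical, and only the group-of-four splitting mirrors the described behavior directly:

```r
# Hypothetical sketch of step 2: split a character vector of
# archive-page links into groups of `size` for batched retrieval.
split_into_groups <- function(links, size = 4) {
  split(links, ceiling(seq_along(links) / size))
}

# Example: seven links become one group of 4 and one group of 3
links <- sprintf("archive_page_%d.html", 1:7)
groups <- split_into_groups(links)
# lengths(groups) gives 4 and 3; each group would then be retrieved,
# its Forecast/Advisory links extracted, and the text parsed by fstadv()
```

In the real flow, each group's pages are fetched, product links are pulled out, and the parsed dataframes are collected into a list, one element per storm.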

With this, any cyclone in the AL or EP basin since 1998 has a fstadv product.

These products do exist on the FTP server. However, the FTP holdings are incomplete for the same time period and for both basins.

Most recent cyclone fstadv products can be found here: ftp://ftp.nhc.noaa.gov/atcf/mar/

Archived fstadv products can be found here: ftp://ftp.nhc.noaa.gov/atcf/archive/MESSAGES/

However, not all years have these products (and it can probably be assumed that not every cyclone's products will exist here, either).

timtrice commented 6 years ago

More fstadv products may be located in another subdirectory of the FTP server. The archives (ftp://ftp.nhc.noaa.gov/atcf/archive/) contain yearly subdirectories. In 1998, for example, there are dat.gz files for each cyclone.

Then, there are two subdirectories; it is messages that may contain what I want. For example, there is an al011998_msg.zip.

So, it may be possible to use the FTP server to get all data without going through the front-end.

Additionally, it seems the archives for these text products may go all the way back to 1991, giving another seven years of data.

Computer forecast model data may exist for even earlier cyclones.
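Under the layout described above, the path to a storm's zipped messages could be built from its key alone. A minimal sketch, assuming keys look like al011998 (basin, storm number, four-digit year) and that the archive layout is archive/<year>/messages/<key>_msg.zip as observed for 1998 — the function name is hypothetical:

```r
# Build the archive messages URL for a storm key such as "al011998".
# Assumes the last four characters of the key are the year.
msg_archive_url <- function(key) {
  year <- substr(key, nchar(key) - 3, nchar(key))
  sprintf("ftp://ftp.nhc.noaa.gov/atcf/archive/%s/messages/%s_msg.zip",
          year, key)
}

msg_archive_url("al011998")
# "ftp://ftp.nhc.noaa.gov/atcf/archive/1998/messages/al011998_msg.zip"
```

Whether every year's subdirectory follows this exact naming would need to be verified against the listings.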

timtrice commented 6 years ago

Allow either a key or a year (extract the year from the key).

Given the year, determine which link to access:

If not the current year: ftp://ftp.nhc.noaa.gov/atcf/archive/MESSAGES/2017/

Else: ftp://ftp.nhc.noaa.gov/atcf/

After this point, the functionality should remain the same.
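That selection could look something like the following sketch. The function name and the current_year parameter are assumptions (passing the year in keeps the logic testable without clock access), and the year is taken from the last four characters of the key:

```r
# Pick the FTP location for fstadv messages based on the storm's year.
# Keys are assumed to end in a four-digit year, e.g. "al152017".
messages_url <- function(key, current_year) {
  year <- as.integer(substr(key, nchar(key) - 3, nchar(key)))
  if (year < current_year) {
    # Archived messages live under archive/MESSAGES/<year>/
    sprintf("ftp://ftp.nhc.noaa.gov/atcf/archive/MESSAGES/%d/", year)
  } else {
    # Current-season messages live at the atcf root
    "ftp://ftp.nhc.noaa.gov/atcf/"
  }
}

messages_url("al152017", 2018)
# "ftp://ftp.nhc.noaa.gov/atcf/archive/MESSAGES/2017/"
```

In practice current_year would default to the system year, e.g. as.integer(format(Sys.Date(), "%Y")).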

Example of retrieving an FTP directory listing:

url <- "ftp://ftp.nhc.noaa.gov/atcf/archive/MESSAGES/2017/mar/"
# Request a bare directory listing (one filename per line)
hdl <- curl::new_handle(dirlistonly = TRUE)
con <- curl::curl(url, "r", handle = hdl)
# Each line of the listing becomes one row/filename
tbl <- read.table(con, stringsAsFactors = FALSE, fill = TRUE)
close(con)
timtrice commented 5 years ago

More or less a duplicate of #113.