Figure out how to test if a website exists or not

andrew-edwards commented 8 months ago

Annoyingly, some of the URLs for climatic indices contain the year and month, which makes it hard to grab the latest calculations automatically because the path or filename keeps changing.

For example: https://climatedataguide.ucar.edu/sites/default/files/2023-04/npindex_monthly.txt

A while ago I tried things like

> sitename <- "https://climatedataguide.ucar.edu/sites/default/files/2023-04/npindex_monthly.txt" 
> RCurl::url.exists(sitename)  
[1] FALSE

So that function returns FALSE even though the file exists. I gave up and wrote a somewhat clunky manual workaround that checks each month, going back from the current one to the last one saved. We plan to update these every two months, so it isn't too onerous, but it would still be nice to automate. See here:

https://github.com/pbs-assess/pacea/blob/f0e696e49fd6295637af4a153a52b5aec064068c/data-raw/coastwide-indices/coastwide-indices.R#L94

So... does anyone know of other functions to detect whether a website exists? Then I could automate looping back in time through the months.
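A minimal sketch of the month-by-month search in R. It assumes the folder name is always the year and month in `YYYY-MM` form (as in the 2023-04 example) and that the filename never changes. The site does not guarantee either of these. `candidate_urls()` and `latest_url()` are hypothetical helpers, not part of pacea. The existence check is passed in as a function so it can be swapped for whichever test turns out to be reliable.

```r
# Hypothetical helper: build candidate URLs of the form
#   <base>/YYYY-MM/<filename>
# starting at the month of `from` and stepping back one month at a time.
# ASSUMPTION: the site names its folders by year and month (YYYY-MM).
candidate_urls <- function(from = Sys.Date(),
                           n_months = 12,
                           base = "https://climatedataguide.ucar.edu/sites/default/files",
                           filename = "npindex_monthly.txt") {
  # Use the first day of the starting month so seq() steps cleanly by month
  start <- as.Date(format(from, "%Y-%m-01"))
  months <- seq(start, by = "-1 month", length.out = n_months)
  paste0(base, "/", format(months, "%Y-%m"), "/", filename)
}

# Hypothetical helper: return the first (most recent) URL for which
# url_ok(url) is TRUE, or NA if none of the candidates exist.
# url_ok is any function that takes a URL and returns TRUE/FALSE.
latest_url <- function(urls, url_ok) {
  for (u in urls) {
    if (isTRUE(url_ok(u))) return(u)
  }
  NA_character_
}
```

For example, you could call `latest_url(candidate_urls(n_months = 6), url_ok = my_check)`, where `my_check` is a hypothetical function that returns TRUE when a URL exists. The loop stops at the first hit, so it only makes as many requests as there are months since the last upload.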

cgrandin commented 8 months ago

RCurl::url.exists() seems to work (on Linux), but a better way might be to use the httr package, like this:

x <- httr::GET("https://climatedataguide.ucar.edu/sites/default/files/2023-04/npindex_monthly.txt")
x$status_code

Look at the status code: 200 means the file exists, while codes such as 403 or 404 mean it doesn't (or isn't accessible). You could even grab the data from this response object, doing it all in one call.
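Here is a minimal sketch of the status-code idea wrapped into reusable functions. It assumes the httr package is installed, and `url_ok_httr()` and `read_if_exists()` are hypothetical names. Network errors, such as a host that cannot be resolved, are caught and treated as "does not exist". The 10-second timeout is an arbitrary choice.

```r
# Hypothetical helper: TRUE only if the server answers with HTTP 200.
# Any other status (403, 404, ...) or a connection error gives FALSE.
url_ok_httr <- function(url, seconds = 10) {
  resp <- tryCatch(httr::GET(url, httr::timeout(seconds)),
                   error = function(e) NULL)
  !is.null(resp) && httr::status_code(resp) == 200
}

# Hypothetical helper: fetch the file and return its text in the same call,
# or NULL if the request failed or did not return 200.
read_if_exists <- function(url, seconds = 10) {
  resp <- tryCatch(httr::GET(url, httr::timeout(seconds)),
                   error = function(e) NULL)
  if (is.null(resp) || httr::status_code(resp) != 200) return(NULL)
  httr::content(resp, as = "text", encoding = "UTF-8")
}
```

`read_if_exists()` does the check and the download with a single request, which is the "all in one call" idea. If you only need the check, `httr::HEAD()` would avoid downloading the file body. However, not every server handles HEAD requests the same way as GET, so this sketch sticks with GET.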

andrew-edwards commented 8 months ago

Thanks Chris - that should work.

pbs-assess / pacea

Figure out how to test if a website exists or not #45