pbs-assess / pacea

An R package to house Pacific Region ecosystem data to help facilitate an ecosystem approach to fisheries.

Figure out how to test if a website exists or not #45

Closed andrew-edwards closed 8 months ago

andrew-edwards commented 8 months ago

Annoyingly, the URLs for some of the climatic indices contain the month and year, which makes it hard to automatically grab the latest calculations because the path or filename keeps changing.

For example: https://climatedataguide.ucar.edu/sites/default/files/2023-04/npindex_monthly.txt

A while ago I tried things like

> sitename <- "https://climatedataguide.ucar.edu/sites/default/files/2023-04/npindex_monthly.txt" 
> RCurl::url.exists(sitename)  
[1] FALSE 

So that function returns FALSE even though the file exists. I then gave up and wrote a somewhat clunky manual way of doing it: checking each month, going back from the current one to the last one saved. In future we plan to update these every two months, so it's not too onerous, but it would still be nice to automate. See here:

https://github.com/pbs-assess/pacea/blob/f0e696e49fd6295637af4a153a52b5aec064068c/data-raw/coastwide-indices/coastwide-indices.R#L94

So, does anyone know of other functions to detect whether a website exists? Then I could automate the looping back in time through months.

cgrandin commented 8 months ago

RCurl::url.exists() seems to work (in Linux). But a better way might be to use the httr package, like this:

x <- httr::GET("https://climatedataguide.ucar.edu/sites/default/files/2023-04/npindex_monthly.txt")
x$status_code

Check the status code: 200 means the file exists, while 403, 404, etc. mean the request failed. You could even grab the data from this output object, doing it all in one call.
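
A minimal sketch of how the status-code check could drive the month-by-month search from the original question. It assumes the httr package is installed and that only the `YYYY-MM` folder in the URL changes between releases. The helper names `url_ok()` and `find_latest_url()` are hypothetical, not part of pacea or httr.

```r
# Hypothetical helper: TRUE if the server answers the request with HTTP 200.
# Connection errors are treated as "does not exist".
url_ok <- function(url) {
  res <- tryCatch(httr::GET(url), error = function(e) NULL)
  !is.null(res) && httr::status_code(res) == 200
}

# Hypothetical helper: step back one month at a time from `start` and return
# the first URL for which `exists_fn` is TRUE, or NA if none is found within
# `max_back` months. The URL pattern is an assumption based on the single
# example URL in this issue.
find_latest_url <- function(start = Sys.Date(),
                            max_back = 24,
                            exists_fn = url_ok) {
  first_of_month <- as.Date(format(start, "%Y-%m-01"))
  months <- seq(first_of_month, by = "-1 month", length.out = max_back)
  # Format first: looping over Date objects directly would drop their class.
  for (m in format(months, "%Y-%m")) {
    url <- paste0("https://climatedataguide.ucar.edu/sites/default/files/",
                  m, "/npindex_monthly.txt")
    if (exists_fn(url)) return(url)
  }
  NA_character_
}
```

Calling `find_latest_url()` with the defaults would query the live site. Passing a different `exists_fn` makes it possible to try the loop without network access. Once a URL is found, the body of the response from `httr::GET()` can be read with `httr::content(res, as = "text")`.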


andrew-edwards commented 8 months ago

Thanks Chris - that should work.