ropensci / tidyhydat

An R package to import Water Survey of Canada hydrometric data and make it tidy
https://docs.ropensci.org/tidyhydat
Apache License 2.0
70 stars 19 forks source link

Consider adding a link to the WSC website to download HYDAT database manually #101

Closed rywhale closed 6 years ago

rywhale commented 6 years ago

For those of us behind corporate proxies, the generic download_hydat() fails unless you're knowledgeable enough to set up your system proxy information. It could be argued that this is outside the scope of tidyhydat but I mentioning/linking the source on the WSC probably doesn't hurt.

Typical proxy error below:

download_hydat()
Downloading HYDAT will take ~10 minutes.
This will remove any older versions of HYDAT
Is that okay?
--------------------------------------------------------------------------------

1: Yes
2: No

Selection: 1
* Downloading HYDAT.sqlite3 to C:\Users\whaleyry\AppData\Local\tidyhydat\tidyhydat
Error in curl::curl_fetch_memory(url, handle = handle) : 
  Timeout was reached: Send failure: Connection was aborted
boshek commented 6 years ago

Interesting. As a quick fix, one can certain catch the curl error and output the link.

@rywhale Since I am unable to replicate this, would you be able to write a tryCatch call that catches this error. Something like this:

tryCatch(download_hydat(),
         error = function(e){
           if(grepl("Timeout was reached:", e$message)) 
              message("info message about downloading HYDAT")
         })

I do wonder if there is a way to detect that the corporate proxy will cause this to fail before it tries it to actually download.

rywhale commented 6 years ago

Yep, that actually work as is:

tryCatch(download_hydat(),
         error = function(e){
           if(grepl("Timeout was reached:", e$message)) 
              message("info message about downloading HYDAT")
         })
Downloading HYDAT will take ~10 minutes.
This will remove any older versions of HYDAT
Is that okay?
--------------------------------------------------------------------------------

1: Yes
2: No

Selection: 1
* Downloading HYDAT.sqlite3 to C:\Users\whaleyry\AppData\Local\tidyhydat\tidyhydat
info message about downloading HYDAT

I would suggest sticking the tryCatch right in the download_hydat function in your first httr::GET call.

e.g.

> download_that_hydat <- function(){
+   base_url <-
+     "http://collaboration.cmc.ec.gc.ca/cmc/hydrometrics/www/"
+   tryCatch(httr::GET(base_url),
+               error = function(e){
+                 if(grepl("Timeout was reached:", e$message)) 
+                   stop("Could not connect to HYDAT source. Check your connection settings.", call. = FALSE)
+               })
+     
+   x <- httr::GET(base_url)
+ 
+ }
> download_that_hydat()
 Show Traceback

 Rerun with Debug
 Error: Could not connect to HYDAT source. Check your connection settings. 

I can submit this as a pull request if you'd like.

boshek commented 6 years ago

I can submit this as a pull request if you'd like.

Yes please! And I agree that is the right place to put it. To facilitate unit testing this thought could you actually wrap the tryCatch call in a function and place that function somewhere here:

https://github.com/ropensci/tidyhydat/blob/8da3f3562f0a6ea9d6ae6ce1ec87ac3917742690/R/utils.R#L77

boshek commented 6 years ago

Contributed by @coatless.

This should temporarily create a proxy where download_hydat() fails with the Timeout error:

httr::set_config(httr::use_proxy(url = "XXX", port = 1234), override = TRUE)
resp <- function_to_catch_curl_error(download_hydat())
httr::reset_config()

This framework could be used to write a unit test.

rywhale commented 6 years ago

Should have time for this tomorrow evening.

rywhale commented 6 years ago

I've written the function (network_check), added it to the bottom of utils.R.

https://github.com/rywhale/tidyhydat/blob/7c557f9aae1c9b22a9d2d48b3f0f27d448695f5b/R/utils.R#L183

and included it in the download_hydat function

https://github.com/rywhale/tidyhydat/blob/7c557f9aae1c9b22a9d2d48b3f0f27d448695f5b/R/download.R#L70

I'm not quite sure how to structure the test, it's not something I've done before.. More than willing to learn but a quick google search led me to some pretty opaque guides. Assume you'd like to get the test in there before I submit a pull request.

boshek commented 6 years ago

Nice work @rywhale

Something like this for the test might work:

test_that("downloading hydat fails behind a proxy server with informative error message",{
  httr::set_config(httr::use_proxy(url = "XXX", port = 1234), override = TRUE)
  expect_failure(network_check(download_hydat()), message = "Could not connect to HYDAT source. Check your connection settings.")
  httr::reset_config()
})

But shouldn't we make the message point to where you can download HYDAT directly? So sort of like this:

network_check <- function(url){
  tryCatch(httr::GET(base_url),
           error = function(e){
             if(grepl("Timeout was reached:", e$message))
               stop(paste0("Could not connect to HYDAT source. 
               Try downloading from this url: [hydat link] and 
               unzipping the saved file to this directory:", hy_dir()), 
                    call. = FALSE
               )}
  )}
rywhale commented 6 years ago

@boshek Very good point, added the link into the error message.

Just FYI, that framework for the proxy test doesn't seem to work, it results in a different error.

httr::set_config(httr::use_proxy(url = "XXX", port = 1234), override = TRUE)
httr::GET('http://google.com')
Error in curl::curl_fetch_memory(url, handle = handle) : Could not resolve proxy: XXX

but if I feed in a random web address as the proxy

httr::set_config(httr::use_proxy(url = "http://google.com", port = 1234), override = TRUE)
httr::GET('http://ropensci.org')
Error in curl::curl_fetch_memory(url, handle = handle) : 
  Timeout was reached: Connection timed out after 10000 milliseconds

Anyhow, I've written the test by slightly altering what you gave me and it seems to be working fine. I'll give it a trial run today behind the proxy at work and submit a pull request tonight if everything's kosher.

rywhale commented 6 years ago

Confirmed working

> tidyhydat::download_hydat()
Downloading HYDAT will take ~10 minutes.
This will remove any older versions of HYDAT
Is that okay?
--------------------------------------------------------------------------------

1: Yes
2: No

Selection: 1
* Downloading HYDAT.sqlite3 to C:\Users\whaleyry\AppData\Local\tidyhydat\tidyhydat
Error: Could not connect to HYDAT source. Check your connection settings.
                           Try downloading HYDAT_sqlite3 from this url: 
                           [http://collaboration.cmc.ec.gc.ca/cmc/hydrometrics/www/]
                           and unzipping the saved file to this directory:C:\Users\whaleyry\AppData\Local\tidyhydat\tidyhydat
rywhale commented 6 years ago

No idea why the travis job failed... Absolutely unfamiliar with travis.

https://travis-ci.org/ropensci/tidyhydat/jobs/395447840

coatless commented 6 years ago

@rywhale I was hoping you folks would modify it to a web address. 😉

RE Travis: Looks like a PPA is down. Not your fault or the codes. Try again later.

boshek commented 6 years ago

@coatless An actual url? So picky :wink: :man_facepalming: