ropensci / weathercan

R package for downloading weather data from Environment and Climate Change Canada
https://docs.ropensci.org/weathercan
GNU General Public License v3.0

HTTP 401 #100

Closed tspeidel-ey closed 4 years ago

tspeidel-ey commented 4 years ago

REPREX

kam <- weather_dl(station_ids = 51423, start = "2016-01-01", end = "2016-02-15")
Error: Problem with `mutate()` input `html`.
x Unauthorized (HTTP 401).
i Input `html` is `purrr::map(...)`.
Run `rlang::last_error()` to see where the error occurred.

R> rlang::last_error()
<error/dplyr_error>
Problem with `mutate()` input `html`.
x Unauthorized (HTTP 401).
i Input `html` is `purrr::map(...)`.
Backtrace:
  1. weathercan::weather_dl(...)
  2. weathercan:::weather_single(date_range, s, interval, encoding)
  3. dplyr::tibble(date_range = date_range)
 10. dplyr::mutate(...)
 12. dplyr:::mutate_cols(.data, ...)
Run `rlang::last_trace()` to see the full context.

Environment

R> sessionInfo()
R version 4.0.0 (2020-04-24)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
[1] plotly_4.9.2.1   kableExtra_1.1.0 weathercan_0.3.4 dplyr_1.0.0      rmdformats_0.3.7
[6] knitr_1.28       hrbrthemes_0.8.0 ggplot2_3.3.1    extrafont_0.17  

loaded via a namespace (and not attached):
 [1] tidyselect_1.1.0  xfun_0.14         purrr_0.3.4       colorspace_1.4-1 
 [5] vctrs_0.3.1       generics_0.0.2    htmltools_0.4.0   viridisLite_0.3.0
 [9] yaml_2.2.1        rlang_0.4.6       pillar_1.4.4      glue_1.4.1       
[13] withr_2.2.0       gdtools_0.2.2     lifecycle_0.2.0   stringr_1.4.0    
[17] munsell_0.5.0     gtable_0.3.0      rvest_0.3.5       htmlwidgets_1.5.1
[21] evaluate_0.14     curl_4.3          fansi_0.4.1       Rttf2pt1_1.3.8   
[25] Rcpp_1.0.4.6      readr_1.3.1       scales_1.1.1      webshot_0.5.2    
[29] jsonlite_1.6.1    systemfonts_0.2.3 hms_0.5.3         packrat_0.5.0    
[33] digest_0.6.25     stringi_1.4.6     bookdown_0.19     grid_4.0.0       
[37] cli_2.0.2         tools_4.0.0       magrittr_1.5      lazyeval_0.2.2   
[41] tibble_3.0.1      crayon_1.3.4      extrafontdb_1.0   tidyr_1.1.0      
[45] pkgconfig_2.0.3   ellipsis_0.3.1    data.table_1.12.8 xml2_1.3.2       
[49] lubridate_1.7.9   assertthat_0.2.1  rmarkdown_2.2     httr_1.4.1       
[53] rstudioapi_0.11   R6_2.4.1          compiler_4.0.0  

boshek commented 4 years ago

👋 @tspeidel-suncor

Thanks for the report. I am unable to reproduce this on my system, which makes me think it is an internal network issue.

library(weathercan)
kam <- weather_dl(station_ids = 51423, start = "2016-01-01", end = "2016-02-15")
#> As of weathercan v0.3.0 time display is either local time or UTC
#> See Details under ?weather_dl for more information.
#> This message is shown once per session
sessionInfo()
#> R version 4.0.0 (2020-04-24)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 18362)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=English_Canada.1252  LC_CTYPE=English_Canada.1252   
#> [3] LC_MONETARY=English_Canada.1252 LC_NUMERIC=C                   
#> [5] LC_TIME=English_Canada.1252    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> loaded via a namespace (and not attached):
#>  [1] compiler_4.0.0  magrittr_1.5    tools_4.0.0     htmltools_0.4.0
#>  [5] yaml_2.2.1      Rcpp_1.0.4.6    stringi_1.4.6   rmarkdown_2.2  
#>  [9] highr_0.8       knitr_1.28      stringr_1.4.0   xfun_0.14      
#> [13] digest_0.6.25   rlang_0.4.6     evaluate_0.14

What happens when you try the following code? httr::GET('https://climate.weather.gc.ca/climate_data/bulk_data_e.html?format=csv&stationID=51423&timeframe=1&submit=Download%2BData&Year=2016&Month=01')
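
For a check like this, the two fields that matter most are the status code and the URL that finally answered after any redirects. A minimal sketch, assuming httr is installed; `probe_url()` is a hypothetical helper for illustration, not part of weathercan:

```r
# Hypothetical helper (not part of weathercan): request a URL and report
# the HTTP status plus the URL that finally answered (after redirects).
probe_url <- function(url) {
  resp <- httr::GET(url)
  data.frame(
    requested = url,
    answered_by = resp$url,
    status = httr::status_code(resp),
    stringsAsFactors = FALSE
  )
}

# Usage (needs network access, so it is left commented out here):
# probe_url("https://climate.weather.gc.ca/climate_data/bulk_data_e.html?format=csv&stationID=51423&timeframe=1&submit=Download%2BData&Year=2016&Month=01")
```

If `answered_by` differs from `requested`, something between the client and the server redirected the call.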

tspeidel-ey commented 4 years ago

> What happens when you try the following code? httr::GET('https://climate.weather.gc.ca/climate_data/bulk_data_e.html?format=csv&stationID=51423&timeframe=1&submit=Download%2BData&Year=2016&Month=01')

Response [https://climate.weather.gc.ca/climate_data/bulk_data_e.html?format=csv&stationID=51423&timeframe=1&submit=Download%2BData&Year=2016&Month=01]
  Date: 2020-06-11 15:32
  Status: 200
  Content-Type: application/force-download
  Size: 127 kB
<BINARY BODY>

Still get the same error:

R> kam <- weather_dl(station_ids = 51423, start = "2016-01-01", end = "2016-02-15")
Error: Problem with `mutate()` input `html`.
x Unauthorized (HTTP 401).
i Input `html` is `purrr::map(...)`.
Run `rlang::last_error()` to see where the error occurred.

steffilazerte commented 4 years ago

Hmm, I also can't reproduce the issue. Your 'GET' call looks like it worked (Status: 200). The only other network call is httr::GET("https://climate.weather.gc.ca/climate_data/bulk_data_e.html?format=txt&stationID=51423&timeframe=1&submit=Download%2BData")

But I can't understand why one would work and the other not.

First, is weathercan up-to-date? (should be v.0.3.4)

Second, perhaps there's a mismatch with the version of the purrr package

Could you:

  1. packageVersion("purrr") - Let us know what version of purrr you're currently using
  2. remotes::update_package("purrr") - Update the purrr package and, optionally, its dependencies
  3. Let us know if that worked. If so, I'll have to update the dependent package version of purrr.

If that still doesn't work, could you type rlang::last_error() after the error, and post the output here? Thanks!
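
The version check in step 1 can also be scripted. A minimal sketch, where `has_min_version()` is a hypothetical helper and the "0.3.4" threshold is only the purrr version shown in the session info above, not a documented requirement of weathercan:

```r
# Hypothetical helper: TRUE if `pkg` is installed and at least `min_version`.
has_min_version <- function(pkg, min_version) {
  requireNamespace(pkg, quietly = TRUE) &&
    utils::packageVersion(pkg) >= min_version
}

# Returns FALSE (rather than erroring) if purrr is not installed.
has_min_version("purrr", "0.3.4")
```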

boshek commented 4 years ago

Just chiming in to say that the first comment has that info, and all my package versions seem to line up with @tspeidel-suncor's. That is, both purrr and weathercan are up to date. Here is the rlang output:

R> rlang::last_error()
<error/dplyr_error>
Problem with `mutate()` input `html`.
x Unauthorized (HTTP 401).
i Input `html` is `purrr::map(...)`.
Backtrace:
  1. weathercan::weather_dl(...)
  2. weathercan:::weather_single(date_range, s, interval, encoding)
  3. dplyr::tibble(date_range = date_range)
 10. dplyr::mutate(...)
 12. dplyr:::mutate_cols(.data, ...)
Run `rlang::last_trace()` to see the full context.

Another troubleshooting step is for @tspeidel-suncor to try this on a different network and see if the same results ensue.

steffilazerte commented 4 years ago

Oops, of course, I didn't go back to the original comment (I think I was actually looking at your session_info, @boshek!)

tspeidel-ey commented 4 years ago

> Another troubleshooting step is for @tspeidel-suncor to try this on a different network and see if the same results ensue.

Yes, it is the company network 🙄

tspeidel-ey commented 4 years ago

It turns out the corporate network filters and blocks the http call. I can confirm that by using http instead of https:

httr::GET('http://climate.weather.gc.ca/climate_data/bulk_data_e.html?format=csv&stationID=51423&timeframe=1&submit=Download%2BData&Year=2016&Month=01')
Response [http://cgysecblcpxy003.network.lan/?cfru=aHR0cDovL2NsaW1hdGUud2VhdGhlci5nYy5jYS9jbGltYXRlX2RhdGEvYnVsa19kYXRhX2UuaHRtbD9mb3JtYXQ9Y3N2JnN0YXRpb25JRD01MTQyMyZ0aW1lZnJhbWU9MSZzdWJtaXQ9RG93bmxvYWQlMkJEYXRhJlllYXI9MjAxNiZNb250aD0wMQ==]
  Date: 2020-06-11 13:47
  Status: 401
  Content-Type: text/html; charset=utf-8
  Size: 5.99 kB
<html>
<head>
<TITLE>Access Denied</TITLE>
<meta name="author" content="Suncor Energy Inc.">
<meta name="description" content="Denied Access Policy">
</head>
<body style="height: 450px;font-family="Helvetica;" id="pageBody">
&nbsp;
<img src="...
<BR><BR>

I looked at the documentation hoping to find an option to force https since it looks like it's supported on climate.weather.gc.ca.

How can this be done or can it be implemented as a feature? I imagine this is a common issue in some corporate networks.
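
The `cfru` query parameter in the proxy's response URL above looks like base64. A minimal sketch for decoding it, assuming the jsonlite package is available (it appears in the session info at the top of this issue); the variable names are illustrative only:

```r
# The cfru value copied from the proxy's response URL above.
cfru <- "aHR0cDovL2NsaW1hdGUud2VhdGhlci5nYy5jYS9jbGltYXRlX2RhdGEvYnVsa19kYXRhX2UuaHRtbD9mb3JtYXQ9Y3N2JnN0YXRpb25JRD01MTQyMyZ0aW1lZnJhbWU9MSZzdWJtaXQ9RG93bmxvYWQlMkJEYXRhJlllYXI9MjAxNiZNb250aD0wMQ=="

# base64_dec() returns a raw vector; convert it to a character string.
decoded <- rawToChar(jsonlite::base64_dec(cfru))
print(decoded)
```

The decoded value starts with `http://climate.weather.gc.ca/`, which suggests the proxy intercepted the plain http request and redirected it to its own "Access Denied" page, so the 401 likely came from the proxy rather than from the weather server.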

tspeidel-ey commented 4 years ago

Perhaps: options(weathercan.urls.stations = "your_new_url") would help?

steffilazerte commented 4 years ago

Oh, I didn't even think about that! You can definitely set your own url:

options(weathercan.urls.weather = "https://climate.weather.gc.ca/climate_data/bulk_data_e.html")

I'm just testing that change as the default. If there are no problems, I'll definitely put it in the next update, as it's better to use https anyway. Thanks for delving into this!
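
A minimal sketch of that workaround that derives the https URL from whatever is currently set and restores the previous value afterwards. `force_https()` is a hypothetical helper, and the fallback http URL is only an assumption about the older default, used when the option is unset:

```r
# Hypothetical helper: rewrite a leading "http://" to "https://".
force_https <- function(url) sub("^http://", "https://", url)

# Current setting; the fallback is an assumed older default, used only
# when the option is unset (e.g. weathercan has not been loaded).
current <- getOption(
  "weathercan.urls.weather",
  "http://climate.weather.gc.ca/climate_data/bulk_data_e.html"
)

# options() returns the previous values, so they can be restored later.
old <- options(weathercan.urls.weather = force_https(current))
getOption("weathercan.urls.weather")

# ... call weather_dl() here ...

options(old)  # restore the previous setting
```

Saving and restoring the old value keeps the override scoped to one script instead of changing the session for good.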

steffilazerte commented 4 years ago

This should now be fixed on a dev branch. If you want to test it, DON'T use the options to set the url; just install the dev version:

remotes::install_github("ropensci/weathercan", ref = "dev_0.3.5")
library(weathercan)
kam <- weather_dl(station_ids = 51423, start = "2016-01-01", end = "2016-02-15")

You can confirm that weathercan uses https by default by looking at the options:

getOption("weathercan.urls.weather")
[1] "https://climate.weather.gc.ca/climate_data/bulk_data_e.html"

I'll also look into more graceful errors. I thought the function was supposed to fail a bit more informatively if something like this happened.
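
One possible shape for a more informative failure, sketched under assumptions: `explain_status()` and `get_checked()` are hypothetical names, not weathercan functions, and the wording of the message is illustrative only.

```r
# Hypothetical: build a readable message for a failed request.
explain_status <- function(status, requested, answered_by) {
  msg <- sprintf("Request to %s failed with HTTP %d.", requested, status)
  if (!identical(requested, answered_by)) {
    # A different answering URL hints at a redirect by a proxy or firewall.
    msg <- paste0(
      msg, " The response came from ", answered_by,
      ", so a proxy or firewall may have intercepted the request."
    )
  }
  msg
}

# Hypothetical wrapper around httr::GET() that stops with that message
# when the response status is 400 or above.
get_checked <- function(url) {
  resp <- httr::GET(url)
  if (httr::http_error(resp)) {
    stop(explain_status(httr::status_code(resp), url, resp$url), call. = FALSE)
  }
  resp
}
```

Keeping the message builder separate from the network call means the wording can be tested without a connection.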

Let me know how it goes!

tspeidel-ey commented 4 years ago

Wow, that was fast! Will test

tspeidel-ey commented 4 years ago

All works in dev_0.3.5! Thanks so much!

tspeidel-ey commented 4 years ago

It seems the problem is back in the latest dev version. Had to downgrade to 0.3.5.

boshek commented 4 years ago

👋 @tspeidel-suncor

Can you post a reprex? I am unable to reproduce. Here is mine:

packageVersion("weathercan")
#> [1] '0.4.0'
library(weathercan)
kam <- weather_dl(station_ids = 51423, start = "2016-01-01", end = "2016-02-15")
#> As of weathercan v0.3.0 time display is either local time or UTC
#> See Details under ?weather_dl for more information.
#> This message is shown once per session

Session info

``` r
devtools::session_info()
#> - Session info ---------------------------------------------------------------
#>  setting  value
#>  version  R version 4.0.2 (2020-06-22)
#>  os       Windows 10 x64
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  English_Canada.1252
#>  ctype    English_Canada.1252
#>  tz       America/Los_Angeles
#>  date     2020-08-31
#>
#> - Packages -------------------------------------------------------------------
#>  package     * version    date       lib source
#>  assertthat    0.2.1      2019-03-21 [1] CRAN (R 4.0.0)
#>  backports     1.1.7      2020-05-13 [1] CRAN (R 4.0.0)
#>  callr         3.4.3      2020-03-28 [1] CRAN (R 4.0.0)
#>  cli           2.0.2      2020-02-28 [1] CRAN (R 4.0.0)
#>  crayon        1.3.4      2017-09-16 [1] CRAN (R 4.0.0)
#>  curl          4.3        2019-12-02 [1] CRAN (R 4.0.0)
#>  desc          1.2.0      2018-05-01 [1] CRAN (R 4.0.0)
#>  devtools      2.3.1      2020-07-21 [1] CRAN (R 4.0.2)
#>  digest        0.6.25     2020-02-23 [1] CRAN (R 4.0.0)
#>  dplyr         1.0.1      2020-07-31 [1] CRAN (R 4.0.2)
#>  ellipsis      0.3.1      2020-05-15 [1] CRAN (R 4.0.0)
#>  evaluate      0.14       2019-05-28 [1] CRAN (R 4.0.0)
#>  fansi         0.4.1      2020-01-08 [1] CRAN (R 4.0.0)
#>  fs            1.5.0      2020-07-31 [1] CRAN (R 4.0.2)
#>  generics      0.0.2      2018-11-29 [1] CRAN (R 4.0.0)
#>  glue          1.4.1      2020-05-13 [1] CRAN (R 4.0.0)
#>  highr         0.8        2019-03-20 [1] CRAN (R 4.0.0)
#>  hms           0.5.3      2020-01-08 [1] CRAN (R 4.0.0)
#>  htmltools     0.5.0      2020-06-16 [1] CRAN (R 4.0.0)
#>  httr          1.4.2      2020-07-20 [1] CRAN (R 4.0.2)
#>  knitr         1.29       2020-06-23 [1] CRAN (R 4.0.2)
#>  lifecycle     0.2.0      2020-03-06 [1] CRAN (R 4.0.0)
#>  lubridate     1.7.9      2020-06-08 [1] CRAN (R 4.0.0)
#>  magrittr      1.5.0.9000 2020-08-28 [1] Github (tidyverse/magrittr@15f6f07)
#>  memoise       1.1.0      2017-04-21 [1] CRAN (R 4.0.0)
#>  pillar        1.4.6      2020-07-10 [1] CRAN (R 4.0.2)
#>  pkgbuild      1.1.0      2020-07-13 [1] CRAN (R 4.0.2)
#>  pkgconfig     2.0.3      2019-09-22 [1] CRAN (R 4.0.0)
#>  pkgload       1.1.0      2020-05-29 [1] CRAN (R 4.0.0)
#>  prettyunits   1.1.1      2020-01-24 [1] CRAN (R 4.0.0)
#>  processx      3.4.3      2020-07-05 [1] CRAN (R 4.0.2)
#>  ps            1.3.3      2020-05-08 [1] CRAN (R 4.0.0)
#>  purrr         0.3.4      2020-04-17 [1] CRAN (R 4.0.0)
#>  R6            2.4.1      2019-11-12 [1] CRAN (R 4.0.0)
#>  Rcpp          1.0.5      2020-07-06 [1] CRAN (R 4.0.2)
#>  readr         1.3.1      2018-12-21 [1] CRAN (R 4.0.0)
#>  remotes       2.2.0      2020-07-21 [1] CRAN (R 4.0.2)
#>  rlang         0.4.7      2020-07-09 [1] CRAN (R 4.0.2)
#>  rmarkdown     2.3        2020-06-18 [1] CRAN (R 4.0.0)
#>  rprojroot     1.3-2      2018-01-03 [1] CRAN (R 4.0.0)
#>  sessioninfo   1.1.1      2018-11-05 [1] CRAN (R 4.0.0)
#>  stringi       1.4.6      2020-02-17 [1] CRAN (R 4.0.0)
#>  stringr       1.4.0      2019-02-10 [1] CRAN (R 4.0.2)
#>  testthat      2.3.2      2020-03-02 [1] CRAN (R 4.0.0)
#>  tibble        3.0.3      2020-07-10 [1] CRAN (R 4.0.2)
#>  tidyr         1.1.1      2020-07-31 [1] CRAN (R 4.0.2)
#>  tidyselect    1.1.0      2020-05-11 [1] CRAN (R 4.0.0)
#>  usethis       1.6.1      2020-04-29 [1] CRAN (R 4.0.2)
#>  vctrs         0.3.2      2020-07-15 [1] CRAN (R 4.0.2)
#>  weathercan  * 0.4.0      2020-08-31 [1] Github (ropensci/weathercan@6f9fccd)
#>  withr         2.2.0      2020-04-20 [1] CRAN (R 4.0.0)
#>  xfun          0.16       2020-07-24 [1] CRAN (R 4.0.2)
#>  yaml          2.2.1      2020-02-01 [1] CRAN (R 4.0.0)
#>
#> [1] C:/Users/salbers/R/win-library/4.0
#> [2] C:/Program Files/R/R-4.0.2/library
```

steffilazerte commented 4 years ago

@tspeidel-suncor Could you try with the current version on CRAN? It was just uploaded today.

tspeidel-ey commented 4 years ago

Works again, thanks!

R> packageVersion("weathercan")
[1] ‘0.4.0’

fmt <- weather_dl(station_ids = 27216, start = "2016-01-01", end = "2016-02-15")
head(fmt)
# A tibble: 6 x 35
  station_name station_id station_operator prov    lat   lon  elev climate_id WMO_id TC_id date       time                year  month day  
  <chr>             <dbl> <lgl>            <chr> <dbl> <dbl> <dbl> <chr>      <chr>  <chr> <date>     <dttm>              <chr> <chr> <chr>
1 FORT MCMURR~      27216 NA               AB     56.6 -111.  369. 3062696    71585  XMM   2016-01-01 2016-01-01 00:00:00 2016  01    01   
2 FORT MCMURR~      27216 NA               AB     56.6 -111.  369. 3062696    71585  XMM   2016-01-01 2016-01-01 01:00:00 2016  01    01   
3 FORT MCMURR~      27216 NA               AB     56.6 -111.  369. 3062696    71585  XMM   2016-01-01 2016-01-01 02:00:00 2016  01    01   
4 FORT MCMURR~      27216 NA               AB     56.6 -111.  369. 3062696    71585  XMM   2016-01-01 2016-01-01 03:00:00 2016  01    01   
5 FORT MCMURR~      27216 NA               AB     56.6 -111.  369. 3062696    71585  XMM   2016-01-01 2016-01-01 04:00:00 2016  01    01   
6 FORT MCMURR~      27216 NA               AB     56.6 -111.  369. 3062696    71585  XMM   2016-01-01 2016-01-01 05:00:00 2016  01    01   
# ... with 20 more variables: hour <chr>, weather <chr>, hmdx <dbl>, hmdx_flag <chr>, pressure <dbl>, pressure_flag <chr>, rel_hum <dbl>,
#   rel_hum_flag <chr>, temp <dbl>, temp_dew <dbl>, temp_dew_flag <chr>, temp_flag <chr>, visib <dbl>, visib_flag <chr>, wind_chill <dbl>,
#   wind_chill_flag <chr>, wind_dir <dbl>, wind_dir_flag <chr>, wind_spd <dbl>, wind_spd_flag <chr>
R>