r-lib / httr

httr: a friendly http package for R
https://httr.r-lib.org
Other
986 stars 1.99k forks source link

Avoid URL-encoding cookie values twice #613

Closed salim-b closed 1 year ago

salim-b commented 5 years ago

When I set a cookie for a httr GET request, the provided cookie value always gets URL-encoded regardless if it already was or not:

library(magrittr)

# define cookie value
cookie_value <- "s:AK+G5ez1"

# URL-encode it
cookie_value_enc <- xml2::url_escape(cookie_value)

# make GET request supplying the URL-encoded cookie value
result <- httr::GET(url = "https://www.republik.ch/",
                    httr::set_cookies(`connect.sid` = cookie_value_enc))

# print cookie value as it was transmitted by httr (double-URL-encoded!)
# and URL-decode and print it until it matches the original value
cookie_value_httr <-
  result$request$options$cookie %>%
  stringr::str_remove(pattern = "connect.sid=")

cookie_value_httr
#> [1] "s%253AAK%252BG5ez1"

is_eq <- FALSE

while ( !is_eq )
{
  cookie_value_httr %<>% xml2::url_unescape()
  if ( cookie_value_httr == cookie_value ) is_eq <- TRUE
  print(cookie_value_httr)
}
#> [1] "s%3AAK%2BG5ez1"
#> [1] "s:AK+G5ez1"

Created on 2019-09-07 by the reprex package (v0.3.0)

The curl CLI (7.47.0) on the other hand doesn't do this (i.e. the cookie value gets transmitted unmodified) when I run either of:

curl --cookie 'connect.sid=s:AK+G5ez1' https://www.republik.ch/
curl --cookie 'connect.sid=s%3AAK%2BG5ez1' https://www.republik.ch/

I'm using httr 1.4.1 (with underlying libcurl 7.47.0) with R 3.6.1 on Ubuntu 16.04.

cderv commented 5 years ago

When we look at the code, we see that indeed set_cookie will encode any strings with curl_escape https://github.com/r-lib/httr/blob/6b84c576f866956e01835e1b6f02c55c8012d4e0/R/cookies.r#L14-L23

One way could be to add an argument to set_config() or so that it could respect class AsIs when string is encapsulated in I( )

Currently, you can do as you in curl by providing directly the cookie option

cookie_value <- "s:AK+G5ez1"
# URL-encode it
cookie_value_enc <- xml2::url_escape(cookie_value)
cookie_value_enc
#> [1] "s%3AAK%2BG5ez1"
# this will get escaped again
httr::GET("https://httpbin.org/cookies", httr::set_cookies(`connect.sid` = cookie_value_enc))
#> Response [https://httpbin.org/cookies]
#>   Date: 2019-09-07 12:20
#>   Status: 200
#>   Content-Type: application/json
#>   Size: 63 B
#> {
#>   "cookies": {
#>     "connect.sid": "s%253AAK%252BG5ez1"
#>   }
#> }
# provide a not already escape cookie
httr::GET("https://httpbin.org/cookies", httr::set_cookies(`connect.sid` = cookie_value))
#> Response [https://httpbin.org/cookies]
#>   Date: 2019-09-07 12:20
#>   Status: 200
#>   Content-Type: application/json
#>   Size: 59 B
#> {
#>   "cookies": {
#>     "connect.sid": "s%3AAK%2BG5ez1"
#>   }
#> }
# or pass directly the encoded cookie using config
httr::GET("https://httpbin.org/cookies", httr::config(cookie = paste0("connect.sid=", cookie_value_enc)))
#> Response [https://httpbin.org/cookies]
#>   Date: 2019-09-07 12:20
#>   Status: 200
#>   Content-Type: application/json
#>   Size: 59 B
#> {
#>   "cookies": {
#>     "connect.sid": "s%3AAK%2BG5ez1"
#>   }
#> }

Created on 2019-09-07 by the reprex package (v0.3.0)

Hope it helps

salim-b commented 5 years ago

Thanks for the httr::config(cookie = paste0("connect.sid=", cookie_value_enc)) hint!

But I'm still wondering

  1. is URL-escaping cookie values really necessary? The curl CLI doesn't seem to do it and the request succeeds in my case (i.e. the server recognizes the cookie value, which of course is actually a different one than I'm posting here):

    curl --verbose --cookie 'connect.sid=s:AK+G5ez1' https://www.republik.ch/ 2>&1 | grep 'Cookie: '
    
    > Cookie: connect.sid=s:AK+G5ez1
  2. if you come to the conclusion that 1) is still necessary, wouldn't it be possible to somehow auto-detect if the value(s) provided to set_cookies() already are URL-encoded, and if so, leave them untouched?

    I suppose this is kind of an "ambuigity hell", but at least there seem to be some "good enough" solutions...

hadley commented 1 year ago

Since httr is superseded in favour of httr2, I've filed an isuse over there: https://github.com/r-lib/httr2/issues/369