r-lib / httr

httr: a friendly http package for R
https://httr.r-lib.org

Support for `--head` option from curl? #612

Closed eliocamp closed 4 years ago

eliocamp commented 5 years ago

I'm trying to get only the status code for a URL on a server that doesn't support the HEAD verb. On the command line, I can get it with this:

curl -s --head -X GET "www..."

I've been searching and haven't found whether, or how, the --head option is supported in httr or the curl package.

Have I missed it or is it not supported?

cderv commented 5 years ago

From the curl book:

Normally however you do not specify the method in the command line, but instead the exact method used depends on the specific options you use. GET is default, using -d or -F makes it a POST, -I generates a HEAD and -T sends a PUT.

According to the man page, -I is the same as the --head option you use, so the result should be the same, although the explicit -X GET may change something.

Here is how you could do it with the curl package, and also with httr::HEAD:

library(curl)
h <- new_handle()
req <- curl_fetch_memory("https://httpbin.org/uuid", handle = h)
rawToChar(req$content)
#> [1] "{\n  \"uuid\": \"2986fe18-e8ed-4e36-8fcb-b5ee028ee771\"\n}\n"
# set the nobody option to request only the headers
handle_setopt(h, nobody = TRUE)
req <- curl_fetch_memory("https://httpbin.org/uuid", handle = h)
# you get the header
writeLines(rawToChar(req$headers))
#> HTTP/1.1 200 OK
#> Access-Control-Allow-Credentials: true
#> Access-Control-Allow-Origin: *
#> Content-Encoding: gzip
#> Content-Type: application/json
#> Date: Fri, 06 Sep 2019 20:44:30 GMT
#> Referrer-Policy: no-referrer-when-downgrade
#> Server: nginx
#> X-Content-Type-Options: nosniff
#> X-Frame-Options: DENY
#> X-XSS-Protection: 1; mode=block
#> Connection: keep-alive
#> 
# and no content
rawToChar(req$content)
#> [1] ""

# using httr HEAD should give you the same - the option nobody is set to TRUE internally
res <- httr::HEAD("https://httpbin.org/uuid")
httr::headers(res)
#> $`access-control-allow-credentials`
#> [1] "true"
#> 
#> $`access-control-allow-origin`
#> [1] "*"
#> 
#> $`content-encoding`
#> [1] "gzip"
#> 
#> $`content-type`
#> [1] "application/json"
#> 
#> $date
#> [1] "Fri, 06 Sep 2019 20:44:31 GMT"
#> 
#> $`referrer-policy`
#> [1] "no-referrer-when-downgrade"
#> 
#> $server
#> [1] "nginx"
#> 
#> $`x-content-type-options`
#> [1] "nosniff"
#> 
#> $`x-frame-options`
#> [1] "DENY"
#> 
#> $`x-xss-protection`
#> [1] "1; mode=block"
#> 
#> $connection
#> [1] "keep-alive"
#> 
#> attr(,"class")
#> [1] "insensitive" "list"
httr::content(res)
#> NULL

Created on 2019-09-06 by the reprex package (v0.3.0)
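
Since the original question asks only for the status code, here is a minimal sketch of reading it from the same headers-only requests shown above. The helper names `head_status` and `head_status_httr` are hypothetical and exist only for illustration; the sketch assumes the server accepts HEAD requests.

```r
library(curl)

# Hypothetical helper: headers-only (HEAD) request with the curl package,
# returning just the numeric status code.
head_status <- function(url) {
  h <- new_handle(nobody = TRUE)              # same option as in the reprex above
  curl_fetch_memory(url, handle = h)$status_code
}

# Hypothetical helper: the same idea with httr.
head_status_httr <- function(url) {
  httr::status_code(httr::HEAD(url))
}
```

For example, `head_status("https://httpbin.org/uuid")` would return the status code of the HEAD request without downloading a body.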

I am not sure you can use httr::GET and configure it to fetch only the headers, because httr::GET sets the httpget option to 1 (TRUE), and according to the docs that forces the nobody option to 0 (FALSE): https://curl.haxx.se/libcurl/c/CURLOPT_HTTPGET.html. With httr, I think you are expected to make a HEAD request.

Hope it helps

Also, for this type of question, do not hesitate to ask first on https://community.rstudio.com. There is a broader community there, ready to help and share experience. The GitHub repo is more for bugs and feature requests.

eliocamp commented 5 years ago

Thanks! Could it be that setting nobody = TRUE makes curl send a HEAD request instead of a GET? I'm getting a 501 from the server, which is also what I got when I tried httr::HEAD(). Unfortunately, the server I'm pinging does not support HEAD requests.

Using curl --head -X GET "www...", on the other hand, does work, so apparently it sends a GET request and only downloads the headers.

In any case, I think I found a workaround in this issue. Unless you think this is something that should be supported by httr, feel free to close the issue.
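
For reference, the command-line combination `--head -X GET` can be approximated with the curl package by setting both `nobody` and `customrequest` on the handle. This is a sketch under assumptions, not necessarily the workaround referred to above: the helper name `get_status_headers_only` is hypothetical, and the code has not been tested against the server discussed in this thread.

```r
library(curl)

# Hypothetical helper (not part of curl or httr): send a request whose
# method string is GET while asking libcurl not to download a body.
get_status_headers_only <- function(url) {
  h <- new_handle()
  # nobody = TRUE corresponds to --head / -I on the command line;
  # customrequest = "GET" corresponds to -X GET and overrides the method string.
  handle_setopt(h, nobody = TRUE, customrequest = "GET")
  res <- curl_fetch_memory(url, handle = h)
  res$status_code
}
```

Calling `get_status_headers_only(url)` with the server's URL should then return the status code of a request sent as GET. How a given server reacts to this combination may vary, so it is worth checking against the command-line result.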