ropensci / crul

R6 based http client for R (for developers)
https://docs.ropensci.org/crul

should ok() set a user agent? #125

Closed maelle closed 4 years ago

maelle commented 4 years ago

url <- "https://doi.org/10.1093/chemse/bjq042"

crul::HttpClient$new(url)$head()$status_code
#> Error in curl::curl_fetch_memory(x$url$url, handle = x$url$handle): GnuTLS recv error (-54): Error in the pull function.

crul::HttpClient$new(url, 
                     opts = list(useragent = "my-header"))$head()$status_code
#> [1] 200

Created on 2019-11-13 by the reprex package (v0.3.0)

sckott commented 4 years ago

good question.

ok does set a user agent string by default, like:

ok("https://google.com", verbose=TRUE)
#> > HEAD / HTTP/1.1
#> Host: google.com
#> User-Agent: libcurl/7.54.0 r-curl/4.2 crul/0.9.0.9100

the user can choose to change the ua string, like:

ok("https://google.com", useragent = "hello world", verbose = TRUE)
#> > HEAD / HTTP/1.1
#> Host: google.com
#> User-Agent: hello world

in your example url above, i guess we can't do anything automatically, but we can document this. for example, tell users that a FALSE may be incorrect depending on their use case: if they want to know whether curl-based scraping will work without fiddling with curl options, then the FALSE is probably correct, but if they are willing to fiddle with curl options, then the first step would be to pass verbose=TRUE so they can see what's going on with any redirects and headers. the docs could then also talk about user agent strings and the fact that some websites block requests based on the user agent string.
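
A rough sketch of how that advice could look on the user side: check the URL with the default user agent first, and only if that fails, try again with a custom one. `ok_retry_ua()`, its `check` argument, and the `"my-header"` default are hypothetical names used for illustration, not part of crul. The only crul behaviour assumed is what is shown above, namely that `ok()` accepts `useragent` as a curl option.

```r
# Hypothetical helper (not part of crul): retry a URL check with a custom
# user agent when the check with the default user agent fails.
# `check` defaults to crul::ok but can be swapped out, e.g. for testing.
ok_retry_ua <- function(url, useragent = "my-header", check = crul::ok) {
  # Run one check; treat any curl-level error (e.g. a TLS error) as FALSE
  try_check <- function(...) {
    isTRUE(tryCatch(check(url, ...), error = function(e) FALSE))
  }
  # First attempt: the default user agent string
  if (try_check()) return(TRUE)
  # Second attempt: same check, with the caller-supplied user agent
  try_check(useragent = useragent)
}

# Usage (needs network access and the crul package):
# ok_retry_ua("https://doi.org/10.1093/chemse/bjq042")
```

A FALSE from this helper still only means that neither user agent worked; adding `verbose = TRUE` to a plain `ok()` call remains the way to see what the server actually returned.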