r-lib / httr2

Make HTTP requests and process their responses. A modern reimagining of httr.
https://httr2.r-lib.org

Cannot capture 4xx response bodies with req_perform_stream #479

Closed jwimberl closed 2 months ago

jwimberl commented 3 months ago

I'm seeing a difference in behavior between req_perform and req_perform_stream, both in how HTTP errors are translated into R errors and in whether 4xx response bodies can be captured. Both work straightforwardly with req_perform, following the documented instructions:

req <- ...
req <- req_error(req, body = \(resp) resp_body_string(resp))
resp <- req_perform(req)

This throws an R error that includes the text body of the error response.

However, error responses are not handled and converted to R errors in the same way using req_perform_stream:

> req <- ...
> req <- req_error(req, body = \(resp) resp_body_string(resp))
> req_perform_stream(req, cb)
<httr2_response>
POST http://obfuscated
Status: 400 Bad Request
Content-Type: text/plain
Body: None

This returns a response object with an error status code but does not throw an R error, and the response has no body. The body is missing even when is_error is set to a function that always returns FALSE. With req_perform, that setting successfully turns off the HTTP-to-R error conversion and yields a 4xx response with a body, but with req_perform_stream the response body is still empty. The following obfuscated example uses a private API and so is not a reprex:

> request <- ...
> request <- httr2::req_error(request, is_error = \(z) FALSE)
> httr2::req_perform(request)
<httr2_response>
POST http://obfuscated
Status: 400 Bad Request
Content-Type: text/plain
Body: In memory (147 bytes)

> httr2::req_perform_stream(request, cb, buffer_kb = buffer_kb)
<httr2_response>
POST http://obfuscated
Status: 400 Bad Request
Content-Type: text/plain
Body: None

My current workaround is to repeat the request using req_perform when req_perform_stream returns an error status, but this can mean repeating large POST requests. Is there a way to get the 400 response's body from the single req_perform_stream request?

hadley commented 2 months ago

Could you provide a reproducible example using example_url()?

jwimberl commented 2 months ago

I cannot find an endpoint in example_url() that returns an error status code with an error message body; the closest is

https://webfakes.r-lib.org/httpbin.html#tag/Status-codes/paths/~1status~1:status/get

which can return any desired status code, but with no body, which doesn't allow a reproduction. Is there an endpoint that would allow that?

hadley commented 2 months ago

You could use the GitHub API:

library(httr2)

request("https://api.github.com/asdfsdadf") |> 
  req_perform()
#> Error in `req_perform()`:
#> ! HTTP 404 Not Found.

last_response() |> resp_body_string()
#> [1] "{\n  \"message\": \"Not Found\",\n  \"documentation_url\": \"https://docs.github.com/rest\",\n  \"status\": \"404\"\n}\n"

Created on 2024-07-10 with reprex v2.1.0

jwimberl commented 2 months ago

Reprex: when is_error is a function that always returns FALSE, req_perform returns a response with the error body but req_perform_stream does not:

library(httr2)
req <- request("https://api.github.com/asdfsdadf")
a <- req |> req_error(is_error = \(z) FALSE) |> req_perform()
b <- req |> req_error(is_error = \(z) FALSE) |> req_perform_stream(callback = \(z) TRUE)
a
#> <httr2_response>
#> GET https://api.github.com/asdfsdadf
#> Status: 404 Not Found
#> Content-Type: application/json
#> Body: In memory (103 bytes)
b
#> <httr2_response>
#> GET https://api.github.com/asdfsdadf
#> Status: 404 Not Found
#> Content-Type: application/json
#> Body: None

Created on 2024-07-10 with reprex v2.1.1

Here is a reproduction of the other behavior I observed, but which I'm inclined to think is the designed behavior -- that automatic conversion of error status codes to R errors happens for req_perform but not req_perform_stream:

library(httr2)
req <- request("https://api.github.com/asdfsdadf")
a <- req |> req_perform()
#> Error in `req_perform()`:
#> ! HTTP 404 Not Found.
b <- req |> req_perform_stream(callback = \(z) TRUE)

Created on 2024-07-10 with reprex v2.1.1

hadley commented 2 months ago

Hmmmm, these are very much related — req_perform_stream() never provides a body in the response because it assumes you're handling it in the callback. But maybe it makes sense for errors to be handled like req_perform()? (i.e. they are never streamed, and are instead handled the usual way). It's hard for me to imagine a case where a stream would be returned from a non-200 status code, but even in that case you could still handle it with req_error(is_error = \(z) FALSE).

jwimberl commented 2 months ago

Indeed, the bytes sent to the callback are the body containing the error message:

library(httr2)
req <- request("https://api.github.com/asdfsdadf")
result <- c()
req |> req_perform_stream(callback = \(z) result <<- c(result,z))
#> <httr2_response>
#> GET https://api.github.com/asdfsdadf
#> Status: 404 Not Found
#> Content-Type: application/json
#> Body: None
msg <- rawToChar(result)
msg
#> [1] "{\n  \"message\": \"Not Found\",\n  \"documentation_url\": \"https://docs.github.com/rest\",\n  \"status\": \"404\"\n}\n"

Created on 2024-07-10 with reprex v2.1.1
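
Building on the reprex above, here is a minimal sketch of a single-request workaround. It assumes the behavior reported in this thread, where req_perform_stream() returns the error response instead of throwing and sends the error body to the callback. The helpers new_collector() and perform_stream_checked() are hypothetical names, not part of httr2. The sketch buffers the whole body in memory before deciding what to do with it, so it only suits payloads that fit in memory.

```r
# Hypothetical helper (not part of httr2): builds a callback that stores
# every chunk it receives, plus an accessor for the accumulated bytes.
new_collector <- function() {
  chunks <- list()
  list(
    callback = function(bytes) {
      chunks[[length(chunks) + 1L]] <<- bytes
      TRUE  # returning TRUE asks httr2 to keep streaming
    },
    bytes = function() {
      if (length(chunks) == 0L) raw(0) else do.call(c, chunks)
    }
  )
}

# Hypothetical wrapper: perform the streaming request once, then raise an
# R error carrying the streamed body if the status is an error status.
# Assumes req_perform_stream() returns (rather than throws) on 4xx/5xx.
perform_stream_checked <- function(req, on_success) {
  collector <- new_collector()
  resp <- httr2::req_perform_stream(req, callback = collector$callback)
  body <- collector$bytes()
  if (httr2::resp_is_error(resp)) {
    stop(
      "HTTP ", httr2::resp_status(resp), " ",
      httr2::resp_status_desc(resp), ": ", rawToChar(body),
      call. = FALSE
    )
  }
  # Only hand the bytes to the real consumer once the status is known.
  on_success(body)
}
```

Possible usage, with the same GitHub URL as above: `perform_stream_checked(httr2::request("https://api.github.com/asdfsdadf"), on_success = rawToChar)`. The trade-off is that successful responses are no longer processed chunk by chunk.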

Of course, when no error is expected, this probably isn't what the callback function would want to do with the incoming bytes. FWIW, the Python requests module handles this by allowing the status code and error message to be inspected before doing anything with the stream, e.g.

        if response.ok:
            stream = io.BytesIO(response.content)
            ...
        else:
            raise RuntimeError(f"Internal error: {response.text}")