HTTP::Client overwrites body if it isn't read before another request is performed

backus commented 8 years ago

If I use HTTP to perform two requests in a row and I don't do response.body.to_s on the first then the body string for the first response will be overwritten with "".

Here is a script that demonstrates the bug:

gem 'http', '2.0.3'
require 'http'

URL = Addressable::URI.parse('http://mockbin.org/bin/1dee24cd-defa-467a-9c5f-6bccaadca3ad')

puts "Running with http.rb version #{HTTP::VERSION}"

client = HTTP.headers('X-Some-Config' => 'Whatever')

puts "I'm going to make 4 requests in total. All of them are POST requests to the url you gave me."

puts "After the first and second request I'm going to print the response body immediately."

puts "I'm then going to make the third and fourth request back to back" \
     "without touching the response body of request #3"

puts "Request #1 and #3 should have the same response body but the fourth request" \
     "seems to clobber the body string"

puts

puts "Performing request #1..."
response1 = client.post(URL)
puts "request #1 response: #{response1.body.to_s.inspect}"
puts

puts "Performing request #2..."
response2 = client.post(URL)
puts "request #2 response: #{response2.body.to_s.inspect}"
puts

puts "Performing request #3..."
response3 = client.post(URL)
puts "Performing request #4..."
response4 = client.post(URL)

puts
puts "Request #3 and #4 done"

puts "Response body for request #1: #{response1.body.to_s.inspect}"
puts "Response body for request #3: #{response3.body.to_s.inspect}"

Output when I run the script:

Running with http.rb version 2.0.3 with ruby 2.3.1
I'm going to make 4 requests in total. All of them are POST requests to the url you gave me.
After the first and second request I'm going to print the response body immediately.
I'm then going to make the third and fourth request back to backwithout touching the response body of request #3
Request #1 and #3 should have the same response body but the fourth requestseems to clobber the body string

Performing request #1...
request #1 response: "Good job you made a POST request!"

Performing request #2...
request #2 response: "Good job you made a POST request!"

Performing request #3...
Performing request #4...

Request #3 and #4 done
Response body for request #1: "Good job you made a POST request!"
Response body for request #3: ""

tarcieri commented 8 years ago

An empty string is definitely unexpected and undesirable in this case, however I don't think it makes sense to eagerly read bodies clients aren't interested in.

I think it would make sense to raise an exception in this case.

backus commented 8 years ago

@tarcieri Ok so the issue in my case was that we have an integration test like this:

# creating the resource
response1 = client.post(...)

# getting the resource
response2 = client.get(response1.headers['Content-Location'])

expect(response1.body.to_s).to eql(response2.body.to_s)

Are you saying that this shouldn't be allowed behavior or should raise an exception?

tarcieri commented 8 years ago

I think if you make a request, do not consume the body, make another request with the same client, and then try to consume the body for the original request, it should raise an exception.

If you would like to consume the body for the original request, it should happen before you make a subsequent request, IMO. Otherwise the client needs to both consume and hang onto any response bodies simply because you might consume them at some point in the future.

Users uninterested in the response bodies would probably file a bug for that, calling it a "memory leak", and I would agree with them.

backus commented 8 years ago

@tarcieri I think that sounds surprising to me because my impression of HTTP.rb was that it aimed to provide an immutable and chainable interface for performing requests. While I agree that raising an error would be better than what I've reported it seems like a bandaid. My impression was that the chaining methods were providing me new instances without shared mutable state.

Basically, just like how these two return values (ret1 and ret2) should not be able to mutate each other

client  = HTTP.accept('application/blah')

ret1 = client.accept("application/json")
ret2 = client.basic_auth(user: 'foo', password: 'bar')

it seems like these two return values should not be allowed to mutate each other:

client  = HTTP.accept('application/blah')

ret1 = client.get("https://google.com")
ret2 = client.get("https://github.com")

The usage of either of these two response objects should not alter the behavior/state of the other.

backus commented 8 years ago

At the very least it would be nice if I could have something like response.finalize that I can call and get a simple immutable response wrapper that isn't tied to a connection object.

tarcieri commented 8 years ago

Calling #to_s on the body prior to making another connection will accomplish that.

Yes, I will admit this behavior is a little bit surprising even to me.

ixti commented 8 years ago

there's helper for that too:

client  = HTTP.accept('application/blah')

ret1 = client.get("https://google.com").flush
ret2 = client.get("https://github.com").flush

ret1.to_s[0..42]
# => "<HTML><HEAD><meta http-equiv=\"content-type\""

ret2.to_s[0..42]
# => " "

ixti commented 8 years ago

flush consumes body and returns response itself

backus commented 8 years ago

Would it introduce performance implications and/or breaking changes if each response object had a separate connection object?

backus commented 8 years ago

Also thank you @ixti that is helpful

ixti commented 8 years ago

It will be (probably) a breaking change. And yeah, I guess it will make it easier to shoot the foot for one. But in general I think it's pretty doable and might become a pretty good improvement.

britishtea commented 7 years ago

I think holding on to the last pending response in HTTP::Connection could solve this problem.

The connection already tracks if a response is pending (i.e. the body is not read yet) and has enough information to release its reference to the response to avoid a memory leak (#finish_response, #close).

If @pending_response (or an additional instance variable) were to contain an actual HTTP::Response instead of a bool the connection could call #flush on it when it's asked to send a new request.

httprb / http

HTTP::Client overwrites body if it isn't read before another request is performed #371