socketry / async-http

MIT License
298 stars 45 forks source link

Async::HTTP::Internet modifying passed headers. #99

Closed kaorihinata closed 1 year ago

kaorihinata commented 2 years ago

So, I was writing something simple where I keep a basic set of headers around and pass them to an Async::HTTP::Client instance. Then to handle pagination (as the Link header returns full URLs), I passed the new URL, and same array of headers to the client's parent Async::HTTP::Internet's get method. When I did though, I noticed that it started appending accept-encoding headers to the array I passed to it. I'm not sure why it's doing that, but it is.

I stripped down the script for reproduction because it actually only requires a call to Async::HTTP::Internet.get:

#! /usr/bin/env ruby
# frozen_string_literal: true

require 'async'
require 'async/http/internet'

Async do
  headers = []
  internet = Async::HTTP::Internet.new
  pp headers
  internet.get('https://www.google.com', headers)
  pp headers
  internet.get('https://www.google.com', headers)
  pp headers
  internet.get('https://www.google.com', headers)
  pp headers
end

...and the output is, perplexingly:

[]
[["accept-encoding", "gzip, identity"]]
[["accept-encoding", "gzip, identity"], ["accept-encoding", "gzip, identity"]]
[["accept-encoding", "gzip, identity"], ["accept-encoding", "gzip, identity"], ["accept-encoding", "gzip, identity"]]

Am I missing something? Due to the way that the methods are attached it's not directly in the documentation, but I thought the second argument of Async::HTTP:Internet.get was headers. Is it not? Is something internally along the way appending to my headers without creating a copy?

ioquatix commented 2 years ago

This looks like incorrect behaviour but let me investigate from my end.

kaorihinata commented 2 years ago

Alright. I did take a closer look, and at a glance it does seem like Protocol::HTTP::Headers will return the input array if I don't freeze it. I tried freezing it, and indeed it does behave as described here and I don't end up with extra headers, but since it doesn't happen with Async::HTTP::Client I figured it may not have been intentional.

ioquatix commented 2 years ago

That seems like the kind of behaviour I'd implement given the different trade-offs.

The problem is, calling dup on everything can be expensive, if it's already locally mutable. In theory Array could be CoW, but I don't think it is in practice.

e.g. the most common case is going to be internet.get(..., [[key, value], [key, value]]) and there is little point and only cost associated with duping it (for every request). And people using async-http tend to be making a lot of requests.

kaorihinata commented 2 years ago

So if I provide a reference to an array, it will be mutated, and eventually become unusable because it will hit the server header count limit. If I provide a reference to a frozen array then it will be dup'd every time. If I instantiate an array every time then I avoid the dup but I'm creating a new array every time. Does that sound correct?

ioquatix commented 2 years ago

Yes, that sounds correct.

Essentially you are transferring ownership of the headers to the request, and after that you should not trust the contents of the headers array. Alternatively, you can freeze it, which will correctly dup it internally. Or alternatively, if you don't want to modify the source headers, call dup yourself.

kaorihinata commented 2 years ago

If it’s working as designed then this can be closed.

ioquatix commented 2 years ago

I will write more documentation to make this behaviour clear and the expected usage.