lostisland / faraday

Simple, but flexible HTTP client library, with support for multiple backends.
https://lostisland.github.io/faraday
MIT License
5.73k stars 976 forks source link

connection headers are ignored when using build_response directly #1553

Closed nvanoorschot closed 8 months ago

nvanoorschot commented 8 months ago

Basic Info

Issue description

When using Faraday::RackBuilder#build_response the headers of the given connection are ignored. Other options like url, ssl and the handlers of the connection are used, but not its headers.

Steps to reproduce

request = Faraday.build_request(:get)
connection = Faraday.new(url: 'http://httpbingo.org', headers: { 'key' => 'value' })
response = connection.builder.build_response(connection, request)

puts response.env.request_headers
=> {}

This issue can easily be worked around by simply building a request based upon the connection since connection.build_request does respect the connection headers. But because build_response is part of the public API of Faraday it feels like a bug, or at least unexpected behavior.

Possible solution

The solution could be as simple as merging the headers when building the env.

def build_env(connection, request)
  exclusive_url = connection.build_exclusive_url(
    request.path, request.params,
    request.options.params_encoder
  )

  Env.new(request.http_method, request.body, exclusive_url,
          request.options, request.headers.with_defaults(connection.headers), connection.ssl,
          connection.parallel_manager)
end
iMacTia commented 8 months ago

Hi @nvanoorschot, and thank you for opening this. Although maybe not properly documented, RackBuilder is not exactly supposed to be used as a public API. It's an internal class tightly coupled with Connection whose methods need to be called in a particular way and order.

I think your request is not unreasonable, but I feel like it opens a wider discussion on how build_response and build_env are currently implemented.

Moreover, I believe the request headers are the "source of truth" in this case, as you could technically do something like this:

conn = Faraday.new(url: 'http://httpbingo.org', headers: { 'key' => 'value' })
response = conn.post('/post') do |req|
  # Here req.headers default to the connection headers, so there's `'key' => 'value'`
  # But nothing prevents me from removing that for this particular request
  req.headers.delete('key')
end

After your change in build_env, the deleted header would be re-introduced in the response, which would make some edge-cases like this break.

I'd be interested to better understand what you're trying to accomplish here. If you're using Faraday in your ruby app this way, then I feel like you shouldn't be messing with build_response or anything like that directly. I'd be interested to hear more about your use-case and potentially suggest a better way of doing things.

Or my guess is that this comes from your tests and you're trying to mock/stub your responses, in which case I can point you to better ways of doing that.

In any case, please share more details about what you're trying to do and I'd be happy to help.

nvanoorschot commented 8 months ago

Hi @iMacTia, thanks for the quick response!

In my use-case I build a queue of Faraday requests and process these in parallel. The main Thread keeps adding new requests to the queue while the requests are executed in separate Threads. The Faraday connections are managed by a connection pool, so these can be reused. Since I only need a hand full of connections per thousands of requests and the headers are static I decided to define them only in the connection.

This is some pseudo code that illustrates the idea.

connection_pool = ConnectionPool.new { Faraday.new(url: 'http://httpbingo.org', headers: { 'key' => 'value' }) }
requests << Faraday.build_request(:post, body: "some payload")
responses = []

requests.each do |request|
  Thread.new do
    connection_pool.with do |connection|
      responses << connection.builder.build_response(connection, request)  
    end
  end
end

I agree that the request should always be leading. It would be bad if the example you gave would break.

If this setup is (currently) not supported then that is absolutely fine. If the build_response should not be called directly this way it might be a good idea to make it private. In that case the intention of the method is more clear.

Anyways, I'm interested to hear your opinion

iMacTia commented 8 months ago

Ah I see, that's an interesting use-case indeed!

First of all, I just want to make sure you've explored Faraday's out-of-the-box parallel request capabilities. If you're willing to give the async_http adapter a go, it might make your code easier to write.

On the multithreaded solution where one thread enqueues and the other one processes, you can avoid using Faraday::Request and Faraday::Response directly and instead use run_request. This method is the one used by get, post and the other connection methods. It takes 4 arguments and you don't need to know the internals, it works like the "public api" of the connection.

So instead of using Faraday::Request, you could use just a hash containing the 4 parameters, or if you prefer a more OOP approach, a simple PORO class.

It would look something like the following:

class QueuedRequest
  # Class with attr_accessors for method, url, params and headers
  # Params will become query params for GET HEAD DELETE TRACE and OPTIONS, and the body for the others
  def initialize(method, url: nil, params: nil, headers: nil)
    # set the attributes
  end

  def run(connection)
    connection.public_send(method, url, params, headers)
  end
end

requests << QueuedRequest.new(:post, body: "some payload")
responses = []

requests.each do |request|
  Thread.new do
    connection_pool.with do |connection|
      responses << request.run(connection)
    end
  end
end

This is just for illustration purposes, but the idea is to basically use the outermost facing API of the library instead of the internals.

If the build_response should not be called directly this way it might be a good idea to make it private. In that case the intention of the method is more clear.

I completely agree with that and I've had this and other improvements (as well as a rework of the whole interface and the rack builder) in my todo list. The problem is that this would be a backwards incompatible change and therefore require a new major version release, which is not an easy feat for a "plumbing" gem like Faraday. We realised how painful it was when we released v2.0 a few years ago πŸ˜…

Anyway, please let me know if the above helps and if you have any other questions!

nvanoorschot commented 8 months ago

Thx for the tips and suggestions!

I did look into in_parallel and Async but settled on using Concurrent::ThreadPoolExecutor. A disadvantage of in_parallel is that you get the results back once the longest running request has finished. I wanted a setup where after each completed request the next one would start with as little time in between as possible. I achieved that by filling the queue of the ThreadPoolExecutor, it will automatically pickup the next piece of work after it finished one. In the mean time if have the main Thread poll the database periodically for new work and add those to the queue. It is a more complex setup but increases the throughput quit significantly.

Based on your suggestion I think I will go for something like this:

connection_pool.with do |connection|
  responses << connection.public_send(request.http_method, request.path, request.body || request.params, request.headers)
end

Thx for your help! Faraday is a fantastic gem that we use for all our projects. Hope to see the rework of the whole interface soon in Faraday v3.0 :smile:

iMacTia commented 8 months ago

That would avoid building a special PORO class, looks good to me πŸ‘ And yeah I can see why you went with the custom implementation, I guess you're working on some very time-sensitive application!

I'm gonna close this, but please feel free to comment with a follow-up if you need any further help