lostisland / faraday

Simple, but flexible HTTP client library, with support for multiple backends.
https://lostisland.github.io/faraday
MIT License
5.72k stars 972 forks source link

`on_data` callback should be passed in the response's status code #1426

Closed catlee closed 2 years ago

catlee commented 2 years ago

If you request a URL with redirects enabled, I don't think there's a way to distinguish between chunks from the initial request or redirected requests.

require 'webmock'
require 'faraday'
require 'faraday/net_http'
require 'faraday/follow_redirects'

include WebMock::API
WebMock.enable!

stub_request(:get, 'https://www.example.com')
  .to_return(body: 'You are being redirected',
             status: [302, 'Content moved'],
             headers: { 'Location' => 'https://example.com' })

stub_request(:get, 'https://example.com')
  .to_return(body: 'You have arrived')

client = Faraday.new do |f|
  f.use Faraday::FollowRedirects::Middleware
end

client.get('https://www.example.com') do |req|
  req.options.on_data = proc do |chunk, _|
    puts "got: #{chunk}"
  end
end

outputs

got: You are being redirected
got: You have arrived

And there is no way within on_data to distinguish between chunks from the initial response, and chunks from the redirected response.

Passing in the response and/or status code, and/or headers, and/or env to the the callback would help handle this case.

iMacTia commented 2 years ago

Hi @catlee, thank you for raising this and sorry for not getting back to you sooner! I wasn't really sure what the best approach would be to address this, so I had to take a deeper look into what's happening and how we could change things in order to let you do what you need.

First things first, let's see what's the challenge here: the on_data block that you provide to Faraday is not managed by Faraday directly. Instead, it's used by adapters and this is why only some of them support it (there are 3 at the moment, according to awesome-faraday). This means that making this work with the follow_redirects middleware would require some level of knowledge from the adapter that the middleware is in the stack, which is something we normally want to avoid (so that we don't end up with dependencies between adapters and middleware).

Another issue is that, when the on_data option was first introduced, the block signature was set to (chunk, overall_received_bytes). This is great for the majority of cases, but as you rightly pointed out, it doesn't really tell you anything about the response itself, like the HTTP status code.

After spending some time thinking about this, I came to the following 3 options to address this:

  1. Use the callback from follow_redirects middleware. This is a bit undocumented, but the middleware actually allows you to set a callback when following a redirect, which you could use to know when that happens.
  2. Change the signature of the on_data block to add the response info (status code, headers, etc...) and update all existing supporting adapters (hard to make this backwards-compatible)
  3. Add a new request option to specify when the on_data callback should be used. This would be a new block that you can pass, with a new signature (we should include the response info) that tells the adapter if the response should be treated as streaming or not. Similar in principle to solution 2, less elegant, but easier to introduce as it wouldn't have the backwards-compatibility issue

cc @olleolleolle @grosser your input on this would be appreciated 😄!

iMacTia commented 2 years ago

@catlee FYI -- I've just opened a PR to add this functionality. In the end, I went for revamping the whole streaming API. Changes are backwards-compatible, but it means adapters will need to support this; I'll shortly open a PR for the net_http adapter to add the new functionality.

It would be great if you could take a look at #1439 and the net_http PR once it's open, and ideally give it a try yourself to let me know if it works as expected 😄! Also, please let me know if you're using a different adapter so I can prioritize updating that as well 👍

iMacTia commented 2 years ago

Faraday 2.5.1 has just been released. It allows to update faraday-net_http up to version 3.0, which includes the new streaming API. It would be great if you could give it a try and let me know if it helps.

After upgrading, you'll be able to do the following:

client.get('https://www.example.com') do |req|
  req.options.on_data = proc do |chunk, _size, env|
    puts "got: #{chunk}" if env.status < 300 # ignores redirects
  end
end
jsonperl commented 6 months ago

Faraday 2.5.1 has just been released. It allows to update faraday-net_http up to version 3.0, which includes the new streaming API. It would be great if you could give it a try and let me know if it helps.

After upgrading, you'll be able to do the following:

client.get('https://www.example.com') do |req|
  req.options.on_data = proc do |chunk, _size, env|
    puts "got: #{chunk}" if env.status < 300 # ignores redirects
  end
end

That works perfectly thanks 👍🏻