igrigorik / em-http-request

Asynchronous HTTP Client (EventMachine + Ruby)
1.22k stars 220 forks source link

Intermittent Content-decoder error on compressed response from www.google.com #284

Closed outoftime closed 8 years ago

outoftime commented 9 years ago

We're observing an intermittent error (~40% of requests) when making a request to https://www.google.com with compressed responses accepted. Here's a reduction:

require 'em-synchrony'
require 'em-synchrony/em-http'

original_headers = {
  "User-Agent" => "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.118 Safari/537.36",
  "Accept-Encoding"=>"gzip,compressed"
}

EventMachine.synchrony do
  error_count = 32.times.count do
    error = EventMachine::HttpRequest.new('https://www.google.com')
      .get(:head => original_headers)
      .error == 'Content-decoder error'
  end

  puts error_count
  EventMachine.stop
end

Note that the User-Agent header is needed to reproduce the problem; without it, the error is not in evidence.

When making requests to Google with curl, using the same headers, I haven't observed the problem, so I think the problem is specific to em-http-request or one of its dependencies.

ayanko commented 9 years ago

Have similar problem while working with https://www.googleapis.com/

Is it possible disable compression support in http-em-client?

igrigorik commented 9 years ago

@outoftime you're advertising "compressed" encoding, which em-http does not support and hence fails to decode when it receives it. Update your header to "Accept-Encoding"=>"gzip" and I believe that should fix it.

@ayanko yes, you can pass in :decoding => false to disable auto-decoding. See: https://github.com/igrigorik/em-http-request/blob/7b8d47c40191758f7df565625703727982fcd635/spec/client_spec.rb#L434

ayanko commented 9 years ago

Thx!

So this 69 faraday adapter line should be corrected at least:

options[:head]['accept-encoding'] = 'gzip'
igrigorik commented 8 years ago

Closing, please feel free to reopen if there is more to do here.. It'd be nice to figure out why we get the error, but we need a reproducible test case.

joanniclaborde commented 8 years ago

This should be reopened. I tried the code snippet in the first comment with a single encoding at a time ("Accept-Encoding"=>"gzip", "Accept-Encoding"=>"compressed" and "Accept-Encoding"=>"deflate"), and there's definitely a problem with gzip.

The error is not always the same: Zlib::DataError: invalid code lengths set, Zlib::DataError: invalid stored block lengths, Zlib::DataError: invalid distance too far back, etc.

Also, em-http-request does support the compressed encoding.

genki commented 8 years ago

I have encountered same problem.

colinmorelli commented 8 years ago

Echoing the concerns above. In my case, no additional headers are needed to see this behavior. Just a standard request to https://www.google.com/ triggers this problem with the error Zlib::DataError: invalid block type

boxofrad commented 8 years ago

Hi! 👋

I also have this issue, and used git bisect to find that the problem was introduced with this commit (https://github.com/igrigorik/em-http-request/commit/6270932b9bbcd88f2ffd8472be26f79117779695).

Here's my reproduction script (based on @outoftime's):

require 'eventmachine'
require_relative 'lib/em-http-request'

100.times do
  EventMachine.run do
    request = EventMachine::HttpRequest.new('https://www.googleapis.com/')
      .get(head: { 'Accept-Encoding' => 'gzip' })

    request.callback { EventMachine.stop }

    request.errback do
      exit! if request.error == 'Content-decoder error'
      EventMachine.stop
    end
  end
end

Here's my git bisect command:

$ git bisect start HEAD v1.0.2
$ git bisect run $SHELL -c "rm -rf vendor .bundle Gemfile.lock && bundle install --without development && bundle exec ruby elgoog.rb"

Here's the output:

6270932b9bbcd88f2ffd8472be26f79117779695 is the first bad commit
commit 6270932b9bbcd88f2ffd8472be26f79117779695
Author: Martin Ottenwaelter <martin.ottenwaelter@gmail.com>
Date:   Fri Nov 16 19:05:36 2012 +0100

    manually extract the stream from gzip files to fix #204

:040000 040000 8e43d196aaac13327a39571a9564bc9925b7d153 ac5ee32f99b6cbe71ff2d3b540bf913eef483e6b M      lib
:040000 040000 331b78ef3247f15ae362596dc40e877c8964a4fe a3c8ec13c1fb77aefa7dfcbab48f2af5019e7994 M      spec

Hope that helps! I'll do some more investigation myself - but thought sharing my findings may help somebody else to diagnose the problem further.

boxofrad commented 8 years ago

I've investigated this a little further and found that it seems to be related to the way that the stream of bytes is delivered to the gzip decoder.

To find this I captured the chunks of data from the gzipped response body of a GET request to https://www.googleapis.com/ ("Not Found") in both the successful and error cases.

Here's a script that demonstrates the issue:

require 'eventmachine'
require_relative 'lib/em-http/decoders'

GOOD = ["\x1F", "\x8B", "\b", "\x00", "\x00", "\x00", "\x00", "\x00", "\x00", "\x00", "\xF3", "\xCB", "/", "Q", "p\xCB/\xCDK\x01\x00M\x8Ct\xB1\t\x00\x00\x00"]
BAD  = ["\x1F", "\x8B", "\b", "\x00", "\x00", "\x00", "\x00", "\x00\x00\x00\xF3\xCB/Qp\xCB/\xCDK\x01\x00M\x8Ct\xB1\t\x00\x00\x00"]

def decode(stream)
  decoder = EventMachine::HttpDecoders::GZip.new {}
  stream.each { |bytes| decoder << bytes }
end

puts 'Good stream - will succeed'
decode GOOD

puts 'Bad stream - will error'
decode BAD