httprb / http

HTTP (The Gem! a.k.a. http.rb) - a fast Ruby HTTP client with a chainable API, streaming support, and timeouts
MIT License

Auto-deflate raises Zlib::BufError on URLs that can otherwise be decoded #621

Open Gargron opened 4 years ago

Gargron commented 4 years ago

Example request:

url = 'https://m.huffingtonpost.es/entry/el-supremo-anula-la-sentencia-contra-otegi-y-los-demas-acusados-en-el-caso-bateragune_es_5f24056ac5b6a34284b99a0a?25f'

HTTP.use(:auto_inflate)
        .follow
        .headers('Accept-Encoding' => 'gzip')
        .get(url)

immediately raises:

Traceback (most recent call last):
        ...
        7: from vendor/bundle/ruby/2.6.0/gems/http-4.4.1/lib/http/chainable.rb:20:in `get'
        6: from vendor/bundle/ruby/2.6.0/gems/http-4.4.1/lib/http/client.rb:34:in `request'
        5: from vendor/bundle/ruby/2.6.0/gems/http-4.4.1/lib/http/redirector.rb:59:in `perform'
        4: from vendor/bundle/ruby/2.6.0/gems/http-4.4.1/lib/http/response.rb:94:in `flush'
        3: from vendor/bundle/ruby/2.6.0/gems/http-4.4.1/lib/http/response/body.rb:51:in `to_s'
        2: from vendor/bundle/ruby/2.6.0/gems/http-4.4.1/lib/http/response/inflater.rb:19:in `readpartial'
        1: from vendor/bundle/ruby/2.6.0/gems/http-4.4.1/lib/http/response/inflater.rb:19:in `finish'
Zlib::BufError (buffer error)

Fetching the URL without use(:auto_inflate), reading the whole body into a string, and then feeding it to Zlib in one go decodes it correctly, without errors:

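require 'http'
require 'zlib'

# Fetch without auto_inflate so the body stays gzip-compressed
res = HTTP.follow.headers('Accept-Encoding' => 'gzip').get(url)
# 32 + MAX_WBITS enables automatic gzip/zlib header detection
zlib = Zlib::Inflate.new(32 + Zlib::MAX_WBITS)
zlib.inflate(res.to_s)
zlib.finish
zlib.close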

It must be related to how the chunks are read, but I don't know enough about Zlib to understand why.
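
One way to test the chunking theory in isolation is to feed a known-good gzip payload to Zlib::Inflate in small pieces. The sketch below is only an illustration: inflate_in_chunks and the 16-byte chunk size are made up for this example and are not part of http.rb, and the payload is generated locally rather than fetched from the URL above.

```ruby
require 'zlib'

# Hypothetical helper (not http.rb code): inflate a gzip/zlib payload by
# feeding it to Zlib::Inflate in fixed-size chunks, as a streaming reader would.
def inflate_in_chunks(payload, chunk_size = 16)
  inflater = Zlib::Inflate.new(32 + Zlib::MAX_WBITS) # auto-detect gzip/zlib header
  decoded = +''
  payload.bytes.each_slice(chunk_size) do |slice|
    decoded << inflater.inflate(slice.pack('C*'))
  end
  decoded << inflater.finish # flush whatever is still buffered
ensure
  inflater&.close
end

original = 'hello world ' * 100
puts inflate_in_chunks(Zlib.gzip(original)) == original
```

If a complete, valid stream decodes this way, chunked reading alone is unlikely to explain the Zlib::BufError, and the bytes being fed in become the more interesting thing to inspect.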

Bonias commented 3 years ago

The error is raised in the Redirector (lib/http/redirector.rb:59) when it tries to flush the body of the first response:

res = HTTP.headers('Accept-Encoding' => 'gzip').get(url)
zlib = Zlib::Inflate.new(32 + Zlib::MAX_WBITS)
zlib.inflate(res.to_s)
zlib.finish # => Zlib::BufError: buffer error

res.to_s # => " "
res.headers['Content-Encoding'] # => "gzip"

IMO the returned response is faulty: it shouldn't contain a Content-Encoding header if the body is not compressed. On the other hand, it should be easy to "fix" this on the http-rb side by skipping decompression when the body is flushed.
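
As a rough caller-side illustration of that idea (this is not http-rb's actual fix, and inflate_or_passthrough is a hypothetical name), a helper could try to inflate a body labelled gzip and fall back to the raw bytes when Zlib rejects it. It assumes that rescuing Zlib::Error, the parent class of Zlib::BufError, is acceptable for the caller.

```ruby
require 'zlib'

# Hypothetical helper, not part of http.rb: try to inflate a body that is
# labelled gzip, and return the raw bytes when it is not a valid stream.
def inflate_or_passthrough(body)
  zlib = Zlib::Inflate.new(32 + Zlib::MAX_WBITS) # auto-detect gzip/zlib header
  inflated = zlib.inflate(body)
  inflated << zlib.finish # raises Zlib::BufError on an incomplete stream
  inflated
rescue Zlib::Error
  body # not actually compressed (or truncated): hand back the body untouched
ensure
  zlib&.close
end

inflate_or_passthrough(Zlib.gzip('hello')) # valid gzip is decoded
inflate_or_passthrough(' ')                # mislabelled body is returned as-is
```

The trade-off is that a truncated but otherwise genuine gzip stream would also be passed through silently, so a stricter variant might only skip decompression for redirect responses whose bodies are being discarded anyway.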