socketry / protocol-http1

MIT License
6 stars 14 forks source link

Response header values folded over multiple lines can't be parsed #38

Open vytautaskubilius opened 1 month ago

vytautaskubilius commented 1 month ago

When receiving a response with a header that is folded over multiple lines, the following error is returned:

Protocol::HTTP1::BadHeader: Could not parse header

I first noticed this issue when trying to migrate to using the async-http-faraday adapter - some requests to external APIs started returning these responses. I was able to reproduce this behaviour by adapting the example from the readme with a simple localhost NGINX server that returns a multi-line header:

repro.rb

require 'async'
require 'io/stream'
require 'async/http/endpoint'
require 'protocol/http1/connection'

Async do
    endpoint = Async::HTTP::Endpoint.parse("http://localhost:8042", alpn_protocols: ["http/1.1"])

    peer = endpoint.connect

    puts "Connected to #{peer} #{peer.remote_address.inspect}"

    # IO Buffering...
    stream = IO::Stream::Buffered.new(peer)
    client = Protocol::HTTP1::Connection.new(stream)

    def client.read_line
        @stream.read_until(Protocol::HTTP1::Connection::CRLF) or raise EOFError
    end

    puts "Writing request..."
    client.write_request("localhost:8042", "GET", "/", "HTTP/1.1", [["Accept", "*/*"]])
    client.write_body("HTTP/1.1", nil)

    puts "Reading response..."
    response = client.read_response("GET")

    puts "Got response: #{response.inspect}"

    puts "Closing client..."
    client.close
end

nginx.conf

user  nginx;
worker_processes  auto;

error_log  /var/log/nginx/error.log notice;
pid        /var/run/nginx.pid;

events {
    worker_connections  1024;
}

http {
    include       /etc/nginx/mime.types;
    add_header "Test-Multiline-Header" "foo;
        bar";
    default_type  application/octet-stream;

    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';

    access_log  /var/log/nginx/access.log  main;

    sendfile        on;
    #tcp_nopush     on;

    keepalive_timeout  65;

    #gzip  on;

    include /etc/nginx/conf.d/*.conf;
}

Start an NGINX container with this configuration and run the repro script to observe the error:

$ docker run --name test-nginx -v ./nginx.conf:/etc/nginx/nginx.conf -d -p 8042:80 nginx
66c3263f56467000d11fec50ffe76e3b29d3f3cc2e1396a559c801ea80313be6
$ ruby repro.rb
Connected to #<Socket:0x00000001206bbab8> #<Addrinfo: [::1]:8042 TCP>
Writing request...
Reading response...
  0.0s     warn: Async::Task [oid=0x230] [ec=0x244] [pid=24080] [2024-09-12 16:40:27 +0300]
               | Task may have ended with unhandled exception.
               |   Protocol::HTTP1::BadHeader: Could not parse header: "Test-Multiline-Header: foo;\n\t\tbar"
               |   → /Users/vytautaskubilius/.rbenv/versions/3.3.3/lib/ruby/gems/3.3.0/gems/protocol-http1-0.22.0/lib/protocol/http1/connection.rb:243 in `read_headers'
               |     /Users/vytautaskubilius/.rbenv/versions/3.3.3/lib/ruby/gems/3.3.0/gems/protocol-http1-0.22.0/lib/protocol/http1/connection.rb:222 in `read_response'
               |     repro.rb:26 in `block in <main>'
               |     /Users/vytautaskubilius/.rbenv/versions/3.3.3/lib/ruby/gems/3.3.0/gems/async-2.16.1/lib/async/task.rb:197 in `block in run'
               |     /Users/vytautaskubilius/.rbenv/versions/3.3.3/lib/ruby/gems/3.3.0/gems/async-2.16.1/lib/async/task.rb:422 in `block in schedule'

The HTTP/1.1 specs indicate that multiline header values are acceptable as long at they start with horizontal tabs or spaces:

HTTP/1.1 header field values can be folded onto multiple lines if the continuation line begins with a space or horizontal tab. All linear white space, including folding, has the same semantics as SP. A recipient MAY replace any linear white space with a single SP before interpreting the field value or forwarding the message downstream.

ioquatix commented 1 month ago

https://datatracker.ietf.org/doc/html/rfc7230#section-3.2.4

According to this, folded headers are obsolete and a bad request is appropriate. Wdyt?

vytautaskubilius commented 1 month ago

Thanks for pointing that out - I hadn't noticed the link to the newer RFC. Reading through that though I still think that in this case parsing the folded value instead of returning an error would be more appropriate. The RFC only provides a single option for user agents receiving the headers in a response:

   A user agent that receives an obs-fold in a response message that is
   not within a message/http container MUST replace each received
   obs-fold with one or more SP octets prior to interpreting the field
   value.

That same option is also valid for servers and proxies, so doing that should account for all use cases.

ioquatix commented 1 month ago

I'm kind of curious about what servers are generating such a response in a production environment.

Are you able to provide more details?

I'd be willing to accept a PR to implement this (conversion of obs-fold with spaces) but I'm slightly against it as it is deprecated behaviour.

jakeonfire commented 1 month ago

we integrate with many 3rd party APIs, and one of them (an IoT hardware company from the UK) is returning a header with an obs-fold value, so we're seeing the following error:

Protocol::HTTP1::BadHeader
Could not parse header: "Strict-Transport-Security: max-age=31536000;\n\t\t\t   includeSubDomains; preload"
ioquatix commented 1 month ago

What about reporting a bug to them too?

vytautaskubilius commented 1 month ago

I’m afraid that’s not a reliable (and potentially not scalable) option for a couple of reasons:

ioquatix commented 1 month ago

I would be happy to accept a PR to parse folded headers.