igrigorik / http-2

Pure Ruby implementation of HTTP/2 protocol
https://httpwg.github.io/specs/rfc7540.html
MIT License
894 stars 63 forks source link

Header Compression Issue (COMPRESSION_ERROR) #97

Closed HoneyryderChuck closed 7 years ago

HoneyryderChuck commented 7 years ago

I'm having a hard-to-reproduce issue which has to do with the state of the headers in the browser, but not in other clients. Namely, the browser isn't being able to decompress the headers after a certain point in an http2 connection/session. Specifically, I'm receiving a goaway frame from the browser with COMPRESSION_ERROR type and "Framer error: 6 (DECOMPRESS_FAILURE)". The browser is Chrome 56.0.2924.87 (64-bit), and this has been tested against the master branch of http-2.

The server log is the following (only logging the goaway as it happens):

 - - - [20/Mar/2017:15:17:27 +0100] "GET / 2.0" 200 - 1.7691
- - - [20/Mar/2017:15:17:27 +0100] "GET /leaflet/dist/leaflet.css?body=1 2.0" 200 10162 0.0298
- - - [20/Mar/2017:15:17:27 +0100] "GET /cart.css?body=1 2.0" 200 4186 0.0167
- - - [20/Mar/2017:15:17:27 +0100] "GET /auth_forms.css?body=1 2.0" 200 1874 0.0327
- - - [20/Mar/2017:15:17:27 +0100] "GET /application.css?body=1 2.0" 200 874 0.0791
- - - [20/Mar/2017:15:17:27 +0100] "GET /home.css?body=1 2.0" 200 365445 0.0978
- - - [20/Mar/2017:15:17:27 +0100] "GET /orders.css?body=1 2.0" 200 119 0.0924
- - - [20/Mar/2017:15:17:27 +0100] "GET /pizzas.css?body=1 2.0" 200 - 0.1002
- - - [20/Mar/2017:15:17:27 +0100] "GET /registrations.css?body=1 2.0" 200 - 0.0689
- - - [20/Mar/2017:15:17:27 +0100] "GET /jquery.js?body=1 2.0" 200 293431 0.0660
- - - [20/Mar/2017:15:17:27 +0100] "GET /jquery_ujs.js?body=1 2.0" 200 21600 0.0658
- - - [20/Mar/2017:15:17:27 +0100] "GET /bootstrap.js?body=1 2.0" 200 120 0.0673
- - - [20/Mar/2017:15:17:27 +0100] "GET /bootstrap/alert.js?body=1 2.0" 200 2261 0.0666
- - - [20/Mar/2017:15:17:27 +0100] "GET /bootstrap/tooltip.js?body=1 2.0" 200 16346 0.0670
- - - [20/Mar/2017:15:17:27 +0100] "GET /bootstrap/popover.js?body=1 2.0" 200 3164 0.0633
received goaway: {:length=>45, :type=>:goaway, :flags=>[], :stream=>0, :last_stream=>0, :error=>:compression_error, :payload=>"Framer error: 6 (DECOMPRESS_FAILURE)."}
- - - [20/Mar/2017:15:17:27 +0100] "GET /bootstrap/dropdown.js?body=1 2.0" 200 4726 0.0713
- - - [20/Mar/2017:15:17:27 +0100] "GET /bootstrap/collapse.js?body=1 2.0" 200 5964 0.0759
- - - [20/Mar/2017:15:17:27 +0100] "GET /simplecart-js/simpleCart.js?body=1 2.0" 200 57255 0.0725
- - - [20/Mar/2017:15:17:27 +0100] "GET /leaflet/dist/leaflet.js?body=1 2.0" 200 125412 0.0747
- - - [20/Mar/2017:15:17:27 +0100] "GET /home.js?body=1 2.0" 200 149 0.0733
- - - [20/Mar/2017:15:17:27 +0100] "GET /cart.js?body=1 2.0" 200 927 0.0646
- - - [20/Mar/2017:15:17:27 +0100] "GET /orders.js?body=1 2.0" 200 1 0.0643
- - - [20/Mar/2017:15:17:27 +0100] "GET /pizzas.js?body=1 2.0" 200 327 0.0508
- - - [20/Mar/2017:15:17:28 +0100] "GET /registrations.js?body=1 2.0" 200 149 0.0434
- - - [20/Mar/2017:15:17:28 +0100] "GET /application.js?body=1 2.0" 200 600 0.0548

Chrome presents the status of subsequent requests/streams after goaway as failed in the Network tab, which probably means that it ignores subsequent streams from that connection.

I tried to use nghttp to debug this, and can't seem to reproduce the issue, as nghttp successfully loads page + assets (I've been testing with the -a option), no decompress failure.

Debugging it further, I suspect that this has to do with the header table size, and how the state is shared between client and server. Analyzing the settings stream from both nghttp and chrome shows different parameters being shared:

# from Chrome
#=> server settings: {:stream=>0, :payload=>[[:settings_max_concurrent_streams, 100]]}
#=> chrome settings: {:flags=>[], :stream=>0, :payload=>[[:settings_max_concurrent_streams, 1000], [:settings_initial_window_size, 6291456], [:settings_header_table_size, 65536]]}

# from nghttp
#=> server settings: {:stream=>0, :payload=>[[:settings_max_concurrent_streams, 100]]}
#=> nghttp settings: {:flags=>[], :stream=>0, :payload=>[[:settings_max_concurrent_streams, 100], [:settings_initial_window_size, 65535]]}

The supplied application send a fair amount of "dynamic" headers (x-runtime, x-request-id, etag) which are always different, but are still stored in the compressor table (which categorizes such headers as :incremental). In fact, removing some before generating and sending the headers frame stops generating the error (the compressor table is fairly reduced in such a case).

I've also tried to send the settings_header_table_size, 4096 explicitly to chrome, but this didn't help fix the issue. Do you have any idea, how could I debug this further?

HoneyryderChuck commented 7 years ago

@igrigorik do you have some input?

igrigorik commented 7 years ago

Sounds like an implementation bug on our end.

Sounds like you're able to reproduce this locally.. Can you try grabbing the chrome://net-internals trace for this session? My workflow is:

HoneyryderChuck commented 7 years ago

thx for the tip. Will make sure to allocate some time tomorrow for this.

HoneyryderChuck commented 7 years ago

@igrigorik here is a pull of the log for the first failed request:

                  --> priority = "MEDIUM"
                  --> url = "https://localhost:9292/assets/bootstrap/popover.js?body=1"
t=57785 [st=  0]    URL_REQUEST_DELEGATE  [dt=0]
t=57785 [st=  0]   +URL_REQUEST_START_JOB  [dt=927]
                    --> load_flags = 256 (VERIFY_EV_CERT)
                    --> method = "GET"
                    --> url = "https://localhost:9292/assets/bootstrap/popover.js?body=1"
t=57785 [st=  0]      URL_REQUEST_DELEGATE  [dt=0]
t=57785 [st=  0]      HTTP_CACHE_GET_BACKEND  [dt=0]
t=57785 [st=  0]      HTTP_CACHE_OPEN_ENTRY  [dt=0]
                      --> net_error = -2 (ERR_FAILED)
t=57785 [st=  0]      HTTP_CACHE_CREATE_ENTRY  [dt=0]
t=57785 [st=  0]      HTTP_CACHE_ADD_TO_ENTRY  [dt=0]
t=57785 [st=  0]     +HTTP_STREAM_REQUEST  [dt=7]
t=57785 [st=  0]        HTTP_STREAM_REQUEST_STARTED_JOB
                        --> source_dependency = 627007 (HTTP_STREAM_JOB)
t=57792 [st=  7]        HTTP_STREAM_REQUEST_BOUND_TO_JOB
                        --> source_dependency = 627007 (HTTP_STREAM_JOB)
t=57792 [st=  7]     -HTTP_STREAM_REQUEST
t=57792 [st=  7]     +HTTP_TRANSACTION_SEND_REQUEST  [dt=4]
t=57792 [st=  7]        HTTP_TRANSACTION_HTTP2_SEND_REQUEST_HEADERS
                        --> :authority: localhost:9292
                            :method: GET
                            :path: /assets/bootstrap/popover.js?body=1
                            :scheme: https
                            accept: */*
                            accept-encoding: gzip, deflate, sdch, br
                            accept-language: en-US,en;q=0.8
                            cookie: [352 bytes were stripped]
                            referer: https://localhost:9292/
                            user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.110 Safari/537.36
t=57796 [st= 11]     -HTTP_TRANSACTION_SEND_REQUEST
t=57796 [st= 11]     +HTTP_TRANSACTION_READ_HEADERS  [dt=916]
t=58712 [st=927]        HTTP2_STREAM_ERROR
                        --> description = "ABANDONED (stream_id=29): https://localhost:9292/assets/bootstrap/popover.js?body=1"
                        --> status = -363
                        --> stream_id = 29
t=58712 [st=927]     -HTTP_TRANSACTION_READ_HEADERS
                      --> net_error = -363 (ERR_SPDY_COMPRESSION_ERROR)
t=58712 [st=927]   -URL_REQUEST_START_JOB
                    --> net_error = -363 (ERR_SPDY_COMPRESSION_ERROR)
t=58712 [st=927]    URL_REQUEST_DELEGATE  [dt=0]
t=58712 [st=927] -REQUEST_ALIVE
                  --> net_error = -363 (ERR_SPDY_COMPRESSION_ERROR)
igrigorik commented 7 years ago

@HoneyryderChuck thanks! Couple followup questions:

May or may not be relevant: https://bugs.eclipse.org/bugs/show_bug.cgi?id=471818

HoneyryderChuck commented 7 years ago

@igrigorik in short:

I think, it's definitely table size setting. I've been debugging with a local version of http-2, and I've seen this consistently failing when the table size reaches a threshold which equals the local table size. This led me to make some experiments, so I hope you'll help me validate them. First, the documentation about the header table size in the settings frame:

SETTINGS_HEADER_TABLE_SIZE (0x1): Allows the sender to inform the remote endpoint of the maximum size of the header compression table used to decode header blocks, in octets. The encoder can select any size equal to or less than this value by using signaling specific to the header compression format inside a header block (see [COMPRESSION]). The initial value is 4,096 octets.

Then I looked here: https://github.com/igrigorik/http-2/blob/master/lib/http/2/connection.rb#L557-L564

I've tried switching those statements (local: set decompressor table; remote: set compressor table), and it's now working on all cases (and specs pass, which makes me think that this specific case isn't specced). Here's a "snapshot" from the settings frames in fail and success case:

# fail case
sent settings: {:type=>:settings, :stream=>0, :payload=>[[:settings_max_concurrent_streams, 100]]}
received settings: {:length=>18, :type=>:settings, :flags=>[], :stream=>0, :payload=>[[:settings_header_table_size, 65536], [:settings_max_concurrent_streams, 1000], [:settings_initial_window_size, 6291456]]}
setting compressor table size: 65536
sent settings: {:type=>:settings, :stream=>0, :payload=>[], :flags=>[:ack]}
received settings: {:length=>0, :type=>:settings, :flags=>[:ack], :stream=>0, :payload=>[]}

# success case
sent settings: {:type=>:settings, :stream=>0, :payload=>[[:settings_max_concurrent_streams, 100]]}
received settings: {:length=>18, :type=>:settings, :flags=>[], :stream=>0, :payload=>[[:settings_header_table_size, 65536], [:settings_max_concurrent_streams, 1000], [:settings_initial_window_size, 6291456]]}
setting decompressor table size: 65536
sent settings: {:type=>:settings, :stream=>0, :payload=>[], :flags=>[:ack]}
received settings: {:length=>0, :type=>:settings, :flags=>[:ack], :stream=>0, :payload=>[]}

the success case is with my switch. I think the problem lies in deciding whether the side is local or remote: if you see the frames log above, the table size is set when receiving a settings ack from the browser, but the code suggests that it will take it as a local setting.

I'll submit a PR and maybe we can continue the discussion there.

igrigorik commented 7 years ago

Thanks for digging into this.

In other words, I think what you're suggesting here is correct.

HoneyryderChuck commented 7 years ago

sweet! for the record this was the last blocker to serve a rails app. glad to see it solved :)