heroku / vegur

Vegur: HTTP Proxy Library

Streaming request / response server is unfairly penalised? #184

Closed. ioquatix closed this issue 3 years ago.

ioquatix commented 3 years ago

Since I can't see all the internal details of the router, I'm unsure of my assessment. However, I'm seeing odd behaviour when deploying Falcon as a server from within Heroku and I wonder if it's related to https://github.com/heroku/vegur/blob/26cf07b6d7f12841e529cd2a9fc354a70927a6be/src/vegur_proxy.erl#L642-L647

In my case, I have an app deployed on Heroku, and I can log into the web.1 dyno:

~ $ curl -i http://localhost:22816
HTTP/1.1 200 OK
x-frame-options: SAMEORIGIN
x-xss-protection: 1; mode=block
x-content-type-options: nosniff
x-download-options: noopen
x-permitted-cross-domain-policies: none
referrer-policy: strict-origin-when-cross-origin
content-type: text/html; charset=utf-8
vary: Accept, Origin
etag: W/"86f012d9ce08d33ca58526d7f502de8d"
cache-control: max-age=0, private, must-revalidate
x-request-id: ecd781d9-896a-4550-9a5c-e1922f2aa54e
x-runtime: 0.012189
vary: accept-encoding
transfer-encoding: chunked

... data follows ...

However, from the outside world:

> curl -i "http://helloweather-staging.herokuapp.com"
HTTP/1.1 200 OK
Server: Cowboy
Date: Mon, 23 Aug 2021 00:54:01 GMT
Connection: close
X-Frame-Options: SAMEORIGIN
X-Xss-Protection: 1; mode=block
X-Content-Type-Options: nosniff
X-Download-Options: noopen
X-Permitted-Cross-Domain-Policies: none
Referrer-Policy: strict-origin-when-cross-origin
Content-Type: text/html; charset=utf-8
Vary: Accept, Origin
Etag: W/"7151c07748c01426badcbf6f1be61558"
Cache-Control: max-age=0, private, must-revalidate
X-Request-Id: 46995793-019d-4ddd-bc93-03a5065db78c
X-Runtime: 0.011992
Vary: accept-encoding
Via: 1.1 vegur

Note that Connection: close has been added, seemingly by the router, and transfer-encoding: chunked is missing.

If Falcon can unambiguously determine what to do before reading the entire input body, it will do so. However, this normally requires at least reading all of the headers.

ioquatix commented 3 years ago

I wanted to narrow this down, so I've created a very simple rails app to see if I could reproduce the issue:

> curl -i https://rails-falcon-heroku.herokuapp.com
HTTP/1.1 200 OK
Server: Cowboy
Date: Mon, 23 Aug 2021 09:04:00 GMT
Connection: close
X-Frame-Options: SAMEORIGIN
X-Xss-Protection: 1; mode=block
X-Content-Type-Options: nosniff
X-Download-Options: noopen
X-Permitted-Cross-Domain-Policies: none
Referrer-Policy: strict-origin-when-cross-origin
Content-Type: text/plain; charset=utf-8
Vary: Accept
Etag: W/"185f8db32271fe25f561a6fc938b2e26"
Cache-Control: max-age=0, private, must-revalidate
X-Request-Id: d4089e57-bf0c-4158-970d-dddfc2e04c6e
X-Runtime: 0.001643
Vary: accept-encoding
Via: 1.1 vegur

Hello⏎

Even with this most basic app, the problem still occurs. That being said, I have non-Rails apps that seem to be okay, so it might be something to do with Rails?

ioquatix commented 3 years ago

I've enabled verbose logging on my test server above:

2021-08-23T09:12:13.202925+00:00 app[web.1]: {
    "time": "2021-08-23T09:12:13+00:00",
    "severity": "info",
    "class": "Async::HTTP::Protocol::HTTP1::Request",
    "oid": 6980,
    "pid": 11,
    "subject": "http://rails-falcon-heroku.herokuapp.com: GET / HTTP/1.1",
    "message": "Responding with: 200{"x-frame-options"=>["SAMEORIGIN"],
        "x-xss-protection"=>["1; mode=block"],
        "x-content-type-options"=>["nosniff"],
        "x-download-options"=>["noopen"],
        "x-permitted-cross-domain-policies"=>["none"],
        "referrer-policy"=>["strict-origin-when-cross-origin"],
        "content-type"=>"text/plain; charset=utf-8",
        "vary"=>["accept", "accept-encoding"],
        "etag"=>"W/\"185f8db32271fe25f561a6fc938b2e26\"",
        "cache-control"=>["max-age=0", "private", "must-revalidate"],
        "x-request-id"=>["5b27910e-daff-46d3-876c-137847bcf24c"],
        "x-runtime"=>["0.001773"]};
        #<Falcon::Adapters::Output length=nil body=Rack::BodyProxy> | #<Async::HTTP::Body::Statistics sent 5 bytes; took 2.39ms in total; took 2.37ms until first chunk>"
}

I cannot see anything strange about this response - the response body should be chunked since the length is unknown.

ioquatix commented 3 years ago

Okay, I see something odd:

Locally, using curl:

> curl -v -i http://rails-falcon-heroku.herokuapp.com
*   Trying 52.5.82.174:80...
* Connected to rails-falcon-heroku.herokuapp.com (52.5.82.174) port 80 (#0)
> GET / HTTP/1.1
> Host: rails-falcon-heroku.herokuapp.com
> User-Agent: curl/7.77.0
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
HTTP/1.1 200 OK
< Server: Cowboy
Server: Cowboy
< Date: Mon, 23 Aug 2021 10:00:30 GMT
Date: Mon, 23 Aug 2021 10:00:30 GMT
< Connection: close
Connection: close
< X-Frame-Options: SAMEORIGIN
X-Frame-Options: SAMEORIGIN
< X-Xss-Protection: 1; mode=block
X-Xss-Protection: 1; mode=block
< X-Content-Type-Options: nosniff
X-Content-Type-Options: nosniff
< X-Download-Options: noopen
X-Download-Options: noopen
< X-Permitted-Cross-Domain-Policies: none
X-Permitted-Cross-Domain-Policies: none
< Referrer-Policy: strict-origin-when-cross-origin
Referrer-Policy: strict-origin-when-cross-origin
< Content-Type: text/plain; charset=utf-8
Content-Type: text/plain; charset=utf-8
< Vary: Accept
Vary: Accept
< Etag: W/"185f8db32271fe25f561a6fc938b2e26"
Etag: W/"185f8db32271fe25f561a6fc938b2e26"
< Cache-Control: max-age=0, private, must-revalidate
Cache-Control: max-age=0, private, must-revalidate
< X-Request-Id: 0d510089-e556-4486-9e99-844b6b3a1a93
X-Request-Id: 0d510089-e556-4486-9e99-844b6b3a1a93
< X-Runtime: 0.001960
X-Runtime: 0.001960
< Vary: accept-encoding
Vary: accept-encoding
< Via: 1.1 vegur
Via: 1.1 vegur

< 
* Closing connection 0
Hello

But the debug log on my server (what Falcon receives):

2021-08-23T09:57:14.709018+00:00 app[web.1]: {"time":"2021-08-23T09:57:14+00:00","severity":"info","class":"Async::HTTP::Protocol::HTTP1::Request","oid":6820,"pid":10,"subject":"http://rails-falcon-heroku.herokuapp.com: GET / HTTP/1.1","message":"Headers: {\"connection\"=>[\"close\"], \"user-agent\"=>\"curl/7.77.0\", \"accept\"=>[\"*/*\"], \"x-request-id\"=>[\"875e873d-8bfc-443d-b945-f6782eab0f48\"], \"x-forwarded-for\"=>[\"151.210.172.1\"], \"x-forwarded-proto\"=>[\"https\"], \"x-forwarded-port\"=>[\"443\"], \"via\"=>[\"1.1 vegur\"], \"connect-time\"=>[\"0\"], \"x-request-start\"=>[\"1629712634706\"], \"total-route-time\"=>[\"0\"]} from #<Addrinfo: 10.1.24.52:15634 TCP>"}

Specifically, it seems like something between curl and falcon is adding connection: close.

ioquatix commented 3 years ago

Even when I explicitly set Connection: keep-alive, it's ignored:

> curl -v -H "Connection: keep-alive" -H "Foo: bar" -i http://rails-falcon-heroku.herokuapp.com
...
> GET / HTTP/1.1
> Host: rails-falcon-heroku.herokuapp.com
> User-Agent: curl/7.77.0
> Accept: */*
> Connection: keep-alive
> Foo: bar

On the server:

"Headers: {\"connection\"=>[\"close\"], \"user-agent\"=>\"curl/7.77.0\", \"accept\"=>[\"*/*\"], \"foo\"=>[\"bar\"], \"x-request-id\"=>[\"92e0d87c-eb5c-4837-8dde-f1bdd39da387\"], \"x-forwarded-for\"=>[\"151.210.172.1\"], \"x-forwarded-proto\"=>[\"http\"], \"x-forwarded-port\"=>[\"8

ioquatix commented 3 years ago

Okay, I have narrowed this down:

Here is my rack application (running in Falcon):

run lambda{|env|
    case env['PATH_INFO']
    when '/array'
        [200, [], ["Hello World"]]
    when '/enumeration'
    [200, [], ["Hello World"].to_enum]
    else
        [404, [], []]
    end
}

The /array response will have a fixed content-length, while the /enumeration response will use transfer-encoding: chunked.
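
To illustrate why the two paths get different framing, here is a minimal sketch (not Falcon's actual code; body_length and framing_headers are made-up helpers) of how a server can derive the framing from the Rack body:

# An Array of strings has a computable total size, so the response gets a
# content-length; a streaming body (e.g. an Enumerator) has no known length,
# so an HTTP/1.1 server falls back to transfer-encoding: chunked.
def body_length(body)
    body.sum(&:bytesize) if body.is_a?(Array)
end

def framing_headers(body)
    if (length = body_length(body))
        {"content-length" => length.to_s}
    else
        {"transfer-encoding" => "chunked"}
    end
end

framing_headers(["Hello World"])         # => {"content-length"=>"11"}
framing_headers(["Hello World"].to_enum) # => {"transfer-encoding"=>"chunked"}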

/array

Client side:

> curl -i https://rails-falcon-heroku.herokuapp.com/array
HTTP/1.1 200 OK
Server: Cowboy
Date: Mon, 23 Aug 2021 20:58:11 GMT
Connection: keep-alive
Vary: accept-encoding
Content-Length: 11
Via: 1.1 vegur

Hello World⏎

Server side:

2021-08-23T20:58:12.067556+00:00 heroku[router]: at=info method=GET path="/array" host=rails-falcon-heroku.herokuapp.com request_id=324f1a6d-2d98-4281-9468-0245cbc9b233 fwd="151.210.172.1" dyno=web.1 connect=0ms service=1ms status=200 bytes=92 protocol=https
2021-08-23T20:58:12.070521+00:00 app[web.1]: {"time":"2021-08-23T20:58:12+00:00","severity":"warn","class":"Async::HTTP::Protocol::HTTP1::Server","oid":2340,"pid":12,"subject":"#<Async::HTTP::Protocol::HTTP1::Server:0x000056318fb3b700>","arguments":[[["Host","rails-falcon-heroku.herokuapp.com"],["Connection","close"],["User-Agent","curl/7.77.0"],["Accept","*/*"],["X-Request-Id","324f1a6d-2d98-4281-9468-0245cbc9b233"],["X-Forwarded-For","151.210.172.1"],["X-Forwarded-Proto","https"],["X-Forwarded-Port","443"],["Via","1.1 vegur"],["Connect-Time","0"],["X-Request-Start","1629752292064"],["Total-Route-Time","0"]]]}
2021-08-23T20:58:12.070694+00:00 app[web.1]: {"time":"2021-08-23T20:58:12+00:00","severity":"info","class":"Async::HTTP::Protocol::HTTP1::Request","oid":2360,"pid":12,"subject":"http://rails-falcon-heroku.herokuapp.com: GET /array HTTP/1.1","message":"Headers: {\"connection\"=>[\"close\"], \"user-agent\"=>\"curl/7.77.0\", \"accept\"=>[\"*/*\"], \"x-request-id\"=>[\"324f1a6d-2d98-4281-9468-0245cbc9b233\"], \"x-forwarded-for\"=>[\"151.210.172.1\"], \"x-forwarded-proto\"=>[\"https\"], \"x-forwarded-port\"=>[\"443\"], \"via\"=>[\"1.1 vegur\"], \"connect-time\"=>[\"0\"], \"x-request-start\"=>[\"1629752292064\"], \"total-route-time\"=>[\"0\"]} from #<Addrinfo: 10.1.29.148:21276 TCP>"}
2021-08-23T20:58:12.071012+00:00 app[web.1]: {"time":"2021-08-23T20:58:12+00:00","severity":"info","class":"Async::HTTP::Protocol::HTTP1::Request","oid":2360,"pid":12,"subject":"http://rails-falcon-heroku.herokuapp.com: GET /array HTTP/1.1","message":"Responding with: 200 {\"vary\"=>[\"accept-encoding\"]}; #<Falcon::Adapters::Output length=11 body=Array> | #<Async::HTTP::Body::Statistics sent 11 bytes; took 0.19ms in total; took 0.18ms until first chunk>"}

Summary: Client -> Router (keep-alive) -> Falcon (close) -> content-length: 11 -> Router -> Client (keep-alive).

/enumeration

Client side:

> curl -i https://rails-falcon-heroku.herokuapp.com/enumeration
HTTP/1.1 200 OK
Server: Cowboy
Date: Mon, 23 Aug 2021 21:00:52 GMT
Connection: close
Vary: accept-encoding
Via: 1.1 vegur

Hello World⏎

Server side:

2021-08-23T21:00:53.044351+00:00 app[web.1]: {"time":"2021-08-23T21:00:53+00:00","severity":"warn","class":"Async::HTTP::Protocol::HTTP1::Server","oid":2340,"pid":13,"subject":"#<Async::HTTP::Protocol::HTTP1::Server:0x000056318fb4c618>","arguments":[[["Host","rails-falcon-heroku.herokuapp.com"],["Connection","close"],["User-Agent","curl/7.77.0"],["Accept","*/*"],["X-Request-Id","ca34386f-9153-4692-a148-bebd2f3ddc83"],["X-Forwarded-For","151.210.172.1"],["X-Forwarded-Proto","https"],["X-Forwarded-Port","443"],["Via","1.1 vegur"],["Connect-Time","0"],["X-Request-Start","1629752453042"],["Total-Route-Time","0"]]]}
2021-08-23T21:00:53.044528+00:00 app[web.1]: {"time":"2021-08-23T21:00:53+00:00","severity":"info","class":"Async::HTTP::Protocol::HTTP1::Request","oid":2360,"pid":13,"subject":"http://rails-falcon-heroku.herokuapp.com: GET /enumeration HTTP/1.1","message":"Headers: {\"connection\"=>[\"close\"], \"user-agent\"=>\"curl/7.77.0\", \"accept\"=>[\"*/*\"], \"x-request-id\"=>[\"ca34386f-9153-4692-a148-bebd2f3ddc83\"], \"x-forwarded-for\"=>[\"151.210.172.1\"], \"x-forwarded-proto\"=>[\"https\"], \"x-forwarded-port\"=>[\"443\"], \"via\"=>[\"1.1 vegur\"], \"connect-time\"=>[\"0\"], \"x-request-start\"=>[\"1629752453042\"], \"total-route-time\"=>[\"0\"]} from #<Addrinfo: 10.1.1.183:24405 TCP>"}
2021-08-23T21:00:53.044906+00:00 app[web.1]: {"time":"2021-08-23T21:00:53+00:00","severity":"info","class":"Async::HTTP::Protocol::HTTP1::Request","oid":2360,"pid":13,"subject":"http://rails-falcon-heroku.herokuapp.com: GET /enumeration HTTP/1.1","message":"Responding with: 200 {\"vary\"=>[\"accept-encoding\"]}; #<Falcon::Adapters::Output length=nil body=Enumerator> | #<Async::HTTP::Body::Statistics sent 11 bytes; took 0.23ms in total; took 0.2ms until first chunk>"}
2021-08-23T21:00:53.045036+00:00 heroku[router]: at=info method=GET path="/enumeration" host=rails-falcon-heroku.herokuapp.com request_id=ca34386f-9153-4692-a148-bebd2f3ddc83 fwd="151.210.172.1" dyno=web.1 connect=0ms service=1ms status=200 bytes=72 protocol=https

Summary: Client -> Router (close) -> Falcon (close) -> no content-length (close-delimited body) -> Router -> Client (close).

Conclusion

It seems like the Heroku router has two distinct problems:

It always sends connection: close to the application server.

My expectation is that the router should establish (assuming only HTTP/1.1) several persistent connections to the application server.

The cost here is the per-connection overhead. I'm not sure how large this is, but it can contribute to overall latency. It's also unexpected.

It is unable to handle transfer-encoding: chunked from the application server.

My expectation is that chunked transfer encoding should not result in a connection: close response. The reason this happens is that transfer-encoding: chunked does not carry the content length, and thus somewhere it must be opting to convert this into an unknown-length response followed by a close, rather than using a persistent connection.

This can have a significant impact on client-side performance, since the overall latency can be dominated by connection overhead.

Not only that, but it appears there are some major performance regressions when taking this path.

> ab -k -n 10 https://rails-falcon-heroku.herokuapp.com/array
This is ApacheBench, Version 2.3 <$Revision: 1879490 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking rails-falcon-heroku.herokuapp.com (be patient).....done

Server Software:        Cowboy
Server Hostname:        rails-falcon-heroku.herokuapp.com
Server Port:            443
SSL/TLS Protocol:       TLSv1.2,ECDHE-RSA-AES128-GCM-SHA256,2048,128
Server Temp Key:        ECDH P-256 256 bits
TLS Server Name:        rails-falcon-heroku.herokuapp.com

Document Path:          /array
Document Length:        11 bytes

Concurrency Level:      1
Time taken for tests:   2.842 seconds
Complete requests:      10
Failed requests:        0
Keep-Alive requests:    10
Total transferred:      1660 bytes
HTML transferred:       110 bytes
Requests per second:    3.52 [#/sec] (mean)
Time per request:       284.240 [ms] (mean)
Time per request:       284.240 [ms] (mean, across all concurrent requests)
Transfer rate:          0.57 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0   64 203.8      0     645
Processing:   214  220  15.0    215     262
Waiting:      214  220  15.0    215     262
Total:        214  284 203.1    215     861

Percentage of the requests served within a certain time (ms)
  50%    215
  66%    215
  75%    218
  80%    262
  90%    861
  95%    861
  98%    861
  99%    861
 100%    861 (longest request)

vs

> ab -k -n 10 https://rails-falcon-heroku.herokuapp.com/enumeration
This is ApacheBench, Version 2.3 <$Revision: 1879490 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking rails-falcon-heroku.herokuapp.com (be patient).....done

Server Software:        Cowboy
Server Hostname:        rails-falcon-heroku.herokuapp.com
Server Port:            443
SSL/TLS Protocol:       TLSv1.2,ECDHE-RSA-AES128-GCM-SHA256,2048,128
Server Temp Key:        ECDH P-256 256 bits
TLS Server Name:        rails-falcon-heroku.herokuapp.com

Document Path:          /enumeration
Document Length:        11 bytes

Concurrency Level:      1
Time taken for tests:   12.336 seconds
Complete requests:      10
Failed requests:        0
Keep-Alive requests:    0
Total transferred:      1410 bytes
HTML transferred:       110 bytes
Requests per second:    0.81 [#/sec] (mean)
Time per request:       1233.579 [ms] (mean)
Time per request:       1233.579 [ms] (mean, across all concurrent requests)
Transfer rate:          0.11 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:      642  951 343.8    921    1874
Processing:   212  282  63.0    307     400
Waiting:      212  282  63.1    307     400
Total:        927 1233 340.1   1228    2089

Percentage of the requests served within a certain time (ms)
  50%   1228
  66%   1229
  75%   1261
  80%   1398
  90%   2089
  95%   2089
  98%   2089
  99%   2089
 100%   2089 (longest request)

transfer-encoding: chunked seems to incur roughly 6x the latency (median total time of 1228 ms vs 215 ms).

ioquatix commented 3 years ago

I investigated the connection: close latency in more detail. Unsurprisingly, HTTPS adds a large overhead; using plain HTTP gives a slightly better picture, with only about 2x the overhead (i.e. reconnecting on every request?).

> ab -k -n 10 http://rails-falcon-heroku.herokuapp.com/enumeration
This is ApacheBench, Version 2.3 <$Revision: 1879490 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking rails-falcon-heroku.herokuapp.com (be patient).....done

Server Software:        Cowboy
Server Hostname:        rails-falcon-heroku.herokuapp.com
Server Port:            80

Document Path:          /enumeration
Document Length:        11 bytes

Concurrency Level:      1
Time taken for tests:   4.248 seconds
Complete requests:      10
Failed requests:        0
Keep-Alive requests:    0
Total transferred:      1410 bytes
HTML transferred:       110 bytes
Requests per second:    2.35 [#/sec] (mean)
Time per request:       424.788 [ms] (mean)
Time per request:       424.788 [ms] (mean, across all concurrent requests)
Transfer rate:          0.32 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:      209  211   1.6    211     214
Processing:   211  214   3.5    213     223
Waiting:      211  213   3.5    213     223
Total:        420  425   4.1    424     434

Percentage of the requests served within a certain time (ms)
  50%    424
  66%    424
  75%    426
  80%    430
  90%    434
  95%    434
  98%    434
  99%    434
 100%    434 (longest request)

ioquatix commented 3 years ago

I was interested to look at another example where I use streaming with transfer-encoding: chunked:

> curl --no-buffer -i https://utopia-falcon-heroku.herokuapp.com/beer/index
HTTP/1.1 200 OK
Server: Cowboy
Date: Mon, 23 Aug 2021 22:55:30 GMT
Connection: close
Content-Type: text/html; charset=utf-8
Vary: accept-encoding
Via: 1.1 vegur

<!DOCTYPE html><html><head><title>99 Bottles of Beer</title></head><body>... snip ...

It seems like Heroku's router is systematically transforming transfer-encoding: chunked into connection: close without a content-length. This seems like an unfortunate implementation choice. Streaming does work, but only at a very basic (HTTP/1.0) level.
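
For concreteness, here is roughly what the two framings look like on the wire for the same 11-byte body (a hand-written sketch, not captured traffic):

# What the dyno can send over HTTP/1.1: chunked framing, where the terminal
# "0\r\n\r\n" marks a complete message and the connection can stay open.
chunked_response =
    "HTTP/1.1 200 OK\r\n" \
    "transfer-encoding: chunked\r\n" \
    "\r\n" \
    "b\r\nHello World\r\n" \
    "0\r\n\r\n"

# What the router forwards to the client: no length information at all, so
# the end of the body is only signalled by closing the connection.
close_delimited_response =
    "HTTP/1.1 200 OK\r\n" \
    "connection: close\r\n" \
    "\r\n" \
    "Hello World"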

ferd commented 3 years ago

I haven't worked at Heroku in half a decade or so, but if I recall:

I believe the router always closed connections to the backend because that's what it used to do on older legacy stacks.

There is explicit code handling the chunked -> 1.0 conversion, specifically for HTTP/1.0 servers or clients, but not more.

There might be a weird racy case though. IIRC there could be weird things happening if an endpoint sent a full response and closed the connection before we even were done streaming the request, but I don't know if this is what is happening here.

ioquatix commented 3 years ago

@ferd thanks for chiming in on this project, given your circumstances.

I believe the router always closed connections to the backend because that's what it used to do on older legacy stacks

Yes, I agree that this is the behaviour I'm observing, which is unfortunate since there is definitely a performance cost here (although probably only a few ms on average).

There is explicit code handling the chunked -> 1.0 conversion, specifically for HTTP/1.0 servers or clients, but not more

It basically looks like the router only supports HTTP/1.0 semantics, even if the client supports HTTP/1.1. Or maybe, because the router sets connection: close on the request to the application server, the application server responds with a non-transfer-encoded body; I need to check this. Certainly, if you are going to close the connection anyway, there is no point in using transfer-encoding: chunked. Maybe in this case the router is just forwarding the response verbatim, including connection: close. I can check this.

ferd commented 3 years ago

Yeah your latter supposition might be it.

I recall some servers doing this explicitly with the idea that they might be saving a few bytes (even if they wouldn't amount to savings on a grown packet), and I recall the golang server also doing the opposite and always using a chunked connection with an instant \r\n to save a few bytes from passing connection headers.

ioquatix commented 3 years ago

Yes, it seems to be the case:

                elsif @persistent and version == HTTP11
                    write_connection_header(version)
                    # We specifically ensure that non-persistent connections do not use chunked response, so that hijacking works as expected.
                    write_chunked_body(body, head, trailer)
                else
                    @persistent = false
                    write_connection_header(version)
                    write_body_and_close(body, head)
                end

I can hack this to always return transfer-encoding: chunked to see if that impacts the front end response.

ioquatix commented 3 years ago

Okay, I hacked this to always return transfer-encoding: chunked:

module AlwaysUseTransferEncodingChunked
    def write_body(version, body, head = false, trailer = nil)
        if body.nil?
            write_connection_header(version)
            write_empty_body(body)
        elsif length = body.length and trailer.nil?
            write_connection_header(version)
            write_fixed_length_body(body, length, head)
        elsif body.empty?
        # Even though this code is the same as the first clause `body.nil?`, HEAD responses have an empty body but still carry a content length. `write_fixed_length_body` takes care of this appropriately.
            write_connection_header(version)
            write_empty_body(body)
        elsif version == ::Protocol::HTTP1::Connection::HTTP11
            write_connection_header(version)
            # We specifically ensure that non-persistent connections do not use chunked response, so that hijacking works as expected.
            write_chunked_body(body, head, trailer)

            if !@persistent
                @stream.close_write
            end
        else
            @persistent = false
            write_connection_header(version)
            write_body_and_close(body, head)
        end
    end
end

::Protocol::HTTP1::Connection.prepend(AlwaysUseTransferEncodingChunked)

...and it worked:

> curl -i https://rails-falcon-heroku.herokuapp.com/enumeration
HTTP/1.1 200 OK
Server: Cowboy
Date: Tue, 24 Aug 2021 00:16:21 GMT
Connection: keep-alive
Vary: accept-encoding
Transfer-Encoding: chunked
Via: 1.1 vegur

Hello World⏎                                                                                                                    

I need to spend some more time thinking about whether this is the right approach or not. However, this is still unexpected behaviour from the router. The router always specifies connection: close, and it's reasonable to expect that the server will not use transfer-encoding: chunked in that case. The performance regression comes from the router not transforming this response back into transfer-encoding: chunked for the client. The router should do everything it possibly can to avoid connection: close and/or ensure connection: keep-alive, because it has such a huge impact on performance.

Also, there are some cases where this hack might not be the right behaviour... I don't believe the server is doing the wrong thing here, and I think the router is (significantly) sub-optimal because of its design constraints/choices. Regarding performance on the server side, such an implementation would prevent the use of splice/sendfile, because transfer-encoding: chunked is much more involved in the server-side formatting of data.
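
To illustrate the splice/sendfile point: with a known length, the body bytes can be copied to the socket untouched, whereas chunked encoding forces per-chunk framing through userspace. A rough sketch (illustrative only, not Falcon's actual write path):

# Fixed-length: write the headers, then copy the body straight through.
# IO.copy_stream can use sendfile on platforms that support it.
def write_fixed_length(socket, file, length)
    socket.write("content-length: #{length}\r\n\r\n")
    IO.copy_stream(file, socket)
end

# Chunked: every chunk needs its own hex size prefix and CRLF framing,
# plus the terminating zero-length chunk, so the data passes through Ruby.
def write_chunked(socket, chunks)
    socket.write("transfer-encoding: chunked\r\n\r\n")
    chunks.each do |chunk|
        socket.write("#{chunk.bytesize.to_s(16)}\r\n#{chunk}\r\n")
    end
    socket.write("0\r\n\r\n")
end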

ferd commented 3 years ago

My expectation back in the day was that the server should still use chunked transfer encoding. Close-delimited connections are unsafe because they contain no mechanism to know whether the data transfer ended because of an early drop in the connection or because it was truly done (some stacks make it unclear whether you get a FIN or RST packet, and so there is no in-band semantic way to know).

If you're using a content-length, the end of the message is known ahead of time and you know when it is done, and with chunked encoding you get a final delimiter that lets you know for sure the message was properly terminated as well. Both mean that you should ideally avoid using close-delimited transfers as any sort of optimization, because the error-adjacent behaviour is vague and therefore riskier.
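
A small sketch of that difference from the reader's point of view (hypothetical helpers, trailers ignored):

# Chunked: reaching the zero-length chunk proves the message completed;
# hitting EOF before that is a detectable truncation.
def read_chunked(io)
    body = +""
    until (size = io.readline.to_i(16)).zero?
        body << io.read(size)
        io.read(2) # consume the CRLF after each chunk
    end
    body
rescue EOFError
    raise "truncated chunked response"
end

# Close-delimited: a complete body and one cut off mid-transfer both simply
# end at EOF; there is no in-band way to tell them apart.
def read_until_close(io)
    io.read
end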

The connection:close vs. connection:keepalive discussion is a good one to have. Maybe things would change today for people running Heroku, but the way things were 5-7 years ago, there were simply a lot of applications that wouldn't or couldn't handle pipelined requests correctly, and which otherwise expected the router to load-balance a few clients sending lots of requests across various backends.

For example, if they closed the connection to the proxy, they wanted the proxy to keep forwarding the other enqueued requests to other backends, and dropping that meant breaking backwards-compatibility.

The behaviour of the proxy was mostly defined by a desire not to break even older behavior that pre-dated this generation; doing things to be more performant implies going through deprecation cycles with thousands of customers who may or may not be technical enough to handle this.

If I recall properly though, there might have been feature flags allowing keepalive connections to the dynos. I implemented code to support it, so I figure the flag may or may not still be around in the private router product.

Contacting support could help in possibly turning that switch on for your app.

ioquatix commented 3 years ago

You are totally right: even if we don't need it, using transfer-encoding: chunked is better for reliability (but slightly worse for performance). I think we can opt for this implementation in Falcon, which will mitigate the issue we are experiencing here. It only applies to a small subset of cases (when no content-length is present).

That being said, I think there are still actionable items here w.r.t. Heroku's router.

ioquatix commented 3 years ago

Okay, I implemented this: https://github.com/socketry/protocol-http1/commit/cb8da422734d04313da8c55be113aa51a843915e

I will test it later to see if this works as expected.

ioquatix commented 3 years ago

Okay, I've confirmed the issue is fixed:

> curl -i http://rails-falcon-heroku.herokuapp.com/array
HTTP/1.1 200 OK
Server: Cowboy
Date: Tue, 24 Aug 2021 04:23:18 GMT
Connection: keep-alive
Vary: accept-encoding
Content-Length: 11
Via: 1.1 vegur

Hello World⏎

> curl -i http://rails-falcon-heroku.herokuapp.com/enumeration
HTTP/1.1 200 OK
Server: Cowboy
Date: Tue, 24 Aug 2021 04:23:24 GMT
Connection: keep-alive
Vary: accept-encoding
Transfer-Encoding: chunked
Via: 1.1 vegur

Hello World⏎                                                                                                                              

Because I'm satisfied with this fix, I will close the issue.