ninenines / gun

HTTP/1.1, HTTP/2, Websocket client (and more) for Erlang/OTP.
ISC License

Don't delay errors when GOAWAY has been received #251

Closed · zuiderkwast closed this issue 3 years ago

zuiderkwast commented 3 years ago

Scenario: On a connection with constant high load, the server sends a GOAWAY frame. Then, before all the streams have completed, more requests are sent to the Gun process by the application.

Expected behaviour: The gun_error message should be sent immediately, so that the application can retry the request on a different connection (typically to a different machine in a cloud setup of some sort).

Actual behaviour: The errors are not sent until all streams have completed. Additionally, two unexpected extra messages, gun_down and gun_up, are sent when the streams have completed.

The test case below illustrates the current actual behavior:

new_requests_fail_after_receiving_goaway(_) ->
    doc("Check that Gun sends gun_error immediately for requests queued "
        "after receiving a GOAWAY frame. (RFC7540 x.x.x)"),
    MainPid = self(),
    {ok, OriginPid, Port} = init_origin(tcp, http2, fun(_, Socket, Transport) ->
        %% Step 2. Receive a HEADERS frame.
        {ok, <<Len1:24, 1:8, _:8, 1:32>>} = Transport:recv(Socket, 9, 1000),
        {ok, _SkippedHeader} = Transport:recv(Socket, Len1, 1000),

        %% Step 3. Send GOAWAY with last stream ID = 1.
        ok = Transport:send(Socket, cow_http2:goaway(1, no_error, <<>>)),

        %% Step 5. Receive GOAWAY response from Gun.
        {ok, <<Len2:24, 7:8, _:8, 0:32>>} = Transport:recv(Socket, 9, 1000),
        {ok, _SkippedPayload} = Transport:recv(Socket, Len2, 1000),

        %% Wait for step 6...
        timer:sleep(600),

        %% Step 7. Send a response for stream 1.
        {HeadersBlock, _} = cow_hpack:encode([
            {<<":status">>, <<"200">>}
        ]),
        ok = Transport:send(Socket, [
            cow_http2:headers(1, nofin, HeadersBlock)
        ]),
        ok = Transport:send(Socket, [
            cow_http2:data(1, fin, <<"hello">>)
        ]),

        %% Check that Gun didn't send the second request.
        case Transport:recv(Socket, 9, 100) of
            {error, timeout} -> ok;
            {error, closed} -> ok
        end,
        timer:sleep(100),
        Transport:close(Socket)
    end),
    {ok, ConnPid} = gun:open("localhost", Port, #{
        protocols => [http2]
    }),
    {ok, http2} = gun:await_up(ConnPid),
    handshake_completed = receive_from(OriginPid),

    %% Step 1.
    StreamRef1 = gun:get(ConnPid, "/slow-req"),

    %% Step 4. Gun receives GOAWAY and responds with GOAWAY.
    timer:sleep(300),

    %% Step 6. The 2nd request is not sent to the server.
    StreamRef2 = gun:get(ConnPid, "/after-goaway"),

    %% Step 8.

    %% ---------------------------------------------------------------------
    %% Expected behaviour: Error for the 2nd request immediately, i.e.
    %% before receiving the response for the 1st request.
    %% ---------------------------------------------------------------------

    %% ---------------------------------------------------------------------
    %% Actual behaviour: We get the response for the 1st request first.
    {gun_response, ConnPid, StreamRef1, nofin, 200, _} = receive M1 -> M1 end,
    {gun_data, ConnPid, StreamRef1, fin, <<"hello">>} = receive M2 -> M2 end,

    %% Then, the following two come in any order.
    receive {gun_error, ConnPid, StreamRef2, _closed} -> ok end,
    receive {gun_down, ConnPid, http2, closed, [StreamRef2]} -> ok end,

    %% The following two unexpected messages are also received.
    receive {gun_down, ConnPid, http2, normal, []} -> ok end,
    receive {gun_up, ConnPid, http2} -> ok end,
    %% ---------------------------------------------------------------------

    no_more_message = receive M3 -> M3 after 0 -> no_more_message end,
    gun:close(ConnPid).

This scenario is expected to be common whenever a remote server is being taken out of service.

Apart from getting the gun_error as soon as possible, before the other streams have completed, it would also be useful to be informed that the connection is going down, so that the application can avoid sending more requests on it. Alternatively, it might be enough to indicate this in the Reason element of the gun_error tuple.
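
For illustration, the retry-on-error handling we have in mind looks roughly like this. This is a minimal sketch only; `pick_other_connection/1` and the `Pool` argument are hypothetical application-side constructs, not part of Gun's API:

    %% Sketch: retry a request on another connection when Gun reports
    %% that the stream failed (e.g. because the connection is going away).
    request_with_retry(ConnPid, Path, Pool) ->
        StreamRef = gun:get(ConnPid, Path),
        receive
            {gun_response, ConnPid, StreamRef, _IsFin, Status, Headers} ->
                {ok, Status, Headers};
            {gun_error, ConnPid, StreamRef, _Reason} ->
                %% The stream failed; retry on a different connection,
                %% picked by the (hypothetical) application pool.
                OtherConn = pick_other_connection(Pool),
                request_with_retry(OtherConn, Path, Pool)
        after 5000 ->
            {error, timeout}
        end.

The earlier the gun_error arrives, the earlier this retry can happen, which is the point of this issue.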

essen commented 3 years ago

The gun_up isn't unexpected, because you let Gun reconnect automatically (it does that by default). The second gun_down is not normal, though.

In general it's OK to send more requests; they should be postponed until Gun has reconnected. But there are probably lots of corner-case issues.
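
With the default retry behaviour, that postponing looks roughly like this from the application's point of view (a sketch of the assumed behaviour, shell-style):

    %% Default options: Gun reconnects automatically after gun_down.
    {ok, ConnPid} = gun:open("localhost", Port, #{protocols => [http2]}),
    {ok, http2} = gun:await_up(ConnPid),
    %% ... the server goes away; we receive {gun_down, ConnPid, http2, _, _} ...
    StreamRef = gun:get(ConnPid, "/retry-me"),   %% queued, not an error
    {ok, http2} = gun:await_up(ConnPid),         %% after automatic reconnect
    %% The postponed request is then sent on the new connection.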

zuiderkwast commented 3 years ago

Ah... So, disabling automatic reconnect will cause the errors to be returned immediately? I have to check...

essen commented 3 years ago

That's not what I meant; this probably is still an issue. I was just responding to the other points at the end of your report.

zuiderkwast commented 3 years ago

Right. Anyway, the important point for us is being able to retry the request as soon as possible on a different connection, so I guess we must use retry => 0. Then, with the same scenario, I only get the response headers and body for the first request, followed by a {gun_down, ConnPid, http2, normal, []}. No error is sent for the 2nd request.

    %% Actual behaviour: We get response for the 1st request first.
    {gun_response, ConnPid, StreamRef1, nofin, 200, _} = receive M1 -> M1 end,
    {gun_data, ConnPid, StreamRef1, fin, <<"hello">>} = receive M2 -> M2 end,

    %% Then, a single gun_down message.
    {gun_down, ConnPid, http2, normal, []} = receive M3 -> M3 end,

    %% No error is sent for the 2nd request.
    no_more_message = receive M4 -> M4 after 0 -> no_more_message end,
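
For reference, the connection in this variant is opened with automatic reconnect disabled, i.e. the same options as in the test above plus retry => 0:

    {ok, ConnPid} = gun:open("localhost", Port, #{
        protocols => [http2],
        retry => 0   %% disable automatic reconnect
    }),
    {ok, http2} = gun:await_up(ConnPid),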

essen commented 3 years ago

So I suppose in that case the problem is that the request is postponed despite the fact that it won't be handled. When retry is disabled, we could produce errors instead of postponing.

zuiderkwast commented 3 years ago

Exactly. Here is a PR: #252.

essen commented 3 years ago

Merged, thanks!