More fine-grained connection states

rmarx commented 4 years ago

Per @huitema:

I would split the active state between "start" and "confirmed", and the handshake start on the server side between "received a request" and "address validated".

The server will go through the "anti dos mitigation" phase until the client's address is validated. That's important, because the server behavior in that state is restricted. Once it has sent all the handshake packets, the server goes to a "false start" phase in which it can send 1-RTT packets but (should) not receive. And then once it receives the client finished and sends the "handshake done", it moves to a confirmed state, at which point it can deal with migration and key update.

Client is kind of the same. It goes from initiating to handshake, then to an "almost ready" phase after sending the "client finished" and getting the 1rtt keys. But it will only become really active once it receives the "handshake done" or an equivalent.

huitema commented 4 years ago

Defining connection states is hard, but here is what I would suggest:

1) Client side, up to the "ready" state:

attempted (Initial sent, no handshake keys received yet)
handshake (handshake keys received)
almost ready (1RTT keys received, but handshake done not received yet)
ready (or active) (handshake done received from server, or equivalent) 2) Server side:
received (initial received)
validated (client address has been verified)
handshake (handshake packets received from client, handshake in progress)
false start (1RTT keys write received, but handshake not complete yet)
ready (handshake complete, handshake done sent) 3) Both sides:
draining (close connection packet sent or received, waiting for some time)
disconnected (done with this connection)

marten-seemann commented 4 years ago

I like the addition of validated. That can be really useful when debugging the early handshake stages.

almost ready (1RTT keys received, but handshake done not received yet)

ready (or active) (handshake done received from server, or equivalent)

This could be handshake_completed and handshake_confimred. Both of them would work for client as well as the server.

false start (1RTT keys write received, but handshake not complete yet)

Not sure if we need this. We already have a key_updated event.

marten-seemann commented 4 years ago

Here's my proposal:

enum ConnectionState {
    validated, // only for the server, when the client's IP has been validated
    handshake_completed, // TLS handshake successful
    handshake_confirmed, // handshake confirmed, see sec. 4.1.2 of the QUIC-TLS draft
    draining, // CONNECTION_CLOSE sent
    closed // connection actually fully closed, memory freed
}

@rmarx, @huitema What do you think?

huitema commented 4 years ago

@marten-seemann I think that's too coarse. You have to consider scenarios in which Initial packets are exchanged for some time before handshake keys are available, e.g., for post quantum, and scenarios in which handshake packets are exchanged for some time before 1RTT keys are available, e.g., client auth. Also, per 4.1.2, handshake confirmed on the server happens at exactly the same time as handshake confirmed. That's why I use a "false start" state for the server.

We are discussing logging options here. Having detailed logging options does not hurt, there is no point in being too parsimonious. If you only want to log a subset of them, that's a fine implementation choice, but that should not prevent precise logging for those who want it.

marten-seemann commented 4 years ago

Also, per 4.1.2, handshake confirmed on the server happens at exactly the same time as handshake confirmed.

True. This event would be kind of redundant for the server. It's also kind of redundant redundant for the client, since this is the time when the client drops Handshake keys.

@marten-seemann I think that's too coarse. You have to consider scenarios in which Initial packets are exchanged for some time before handshake keys are available, e.g., for post quantum, and scenarios in which handshake packets are exchanged for some time before 1RTT keys are available, e.g., client auth. [...] That's why I use a "false start" state for the server.

First of all, I don't understand the name "false start". My point in https://github.com/quiclog/internet-drafts/issues/49#issuecomment-612604685 was that we already have an event for this: it's the key updated event.

Now it seems like both "false start", handshake_comfirmed and handshake_completed are all redundant, since they all accompanied by key generation / key discarding events. Not sure what to make of that...

rmarx commented 4 years ago

So, I've been trying to build a QUIC handshake/connection state machine to see which states we actually need. This initial design is garnered towards expressing the -entire- state machine with all state transitions. While I agree with @marten-seemann above that some of these can be deduced from e.g., key_update events, for now I'd like to keep this event as complete as possible, so it can be used without those other events as well.

I believe that we need 8 states in total (mainly to allow for some phases to last longer than 1 RTT/1 flight):

attempted (initial sent/received)
peer_validated (peer address validated by: client sent Handshake packet OR client used CONNID chosen by the server)
handshake_started
false_start (1 RTT can be sent, but handshake isn't done yet)
handshake_complete (TLS handshake complete: Finished received and sent)
handshake_confirmed (HANDSHAKE_DONE sent/received) (connection is now "active", 1RTT can be sent)
draining (connection_close sent/received)
closed (draining period done)

Following the handshake examples in the transport draft, this would lead to the following events and transitions:

Client:

send initial
- state = attempted
get initial
- state = validated (not really "needed" at the client, but somewhat useful to indicate progress nonetheless)
get first Handshake packet
- state = handshake_started
get Handshake packet containing ServerFinished
- state = handshake_complete
send ClientFinished
- state = false_start (1RTT can now be sent)
get HANDSHAKE_DONE
- state = handshake_confirmed

Server:

get initial
- state = attempted
send initial (don't think this needs a separate state, since some handshake will always be sent in the same flight as this?)
send handshake EE, CERT, CV, ...
- state = handshake_started
send ServerFinished
- state = false_start (1RTT can now be sent)
get first handshake packet / something using a server-issued CID of min length
- state = validated
get handshake packet containing ClientFinished
- state = handshake_complete
send HANDSHAKE_DONE
- state = handshake_confirmed

This has a few ugly things, of course:

The server validated and false-start events won't always be in the same order, depending on how long the handshake takes (e.g., with a large CERT, handshake might need several flights to complete, causing validated to come before false-start)
At the server side, handshake_complete and handshake_confirmed will always fire directly after one another (this is also true in the TLS draft's definitions however)
The false_start and validated aren't really needed at the client side (but do provide a nice parallel in logic to the server's progression imo)

Finally, for users not wishing to log all states this fine-grainedly, this condenses nicely to:

attempted
handshake_started
handshake_confirmed
closed

What do people think? Did I miss something?

huitema commented 4 years ago

From an implementation point of view, you probably want to separate "false_start" (on the server side) and "almost_ready" on the client side. The server side is more constrained, because it cannot receive 1-RTT packets. Also, when the client enters the "almost_ready" start, the TLS stack on the client has seen the server certificate and validated the "server finished" message. In contrast, the server will only know that there is no MITM or 0-RTT replay attack when its TLS stack validates and receives the "client finished" message, so in theory the server has to apply some caution before sending data.

martinthomson commented 4 years ago

Our implementation has "waiting for initial", which roughly corresponds to your "attempted", but also encapsulates the server side prior to receiving the initial. We don't keep that state very long server side; your description implies that it persists, but I don't think that is helpful.

Then we have "handshaking", which we keep orthogonal to the states where various keys are available. Keeping key availability separate is better than trying to enumerate the various availability options. false_start is a problem, I think in that it implies something else; an "early write" state is fine. In this state you can sometimes send, based on what keys are available, but you can never receive. Also, trying to conflate address validation state with handshake state will likely complicate things more than necessary.

Then we have "complete" where TLS is done. And "confirmed" where you have thrown out handshaking state.

We use the states from the spec for closing, which is not a single state, but three.

So aside from your closed state, the summary form is probably best. Logging both address validation and key availability independently would be good.

rmarx commented 4 years ago

Thank you both @huitema and @martinthomson for your input!

Changes I've made for -02:

renamed "false_start" to "early_write". I appreciate that this is potentially not entirely the correct term, but I was thinking mainly of non-protocol experts trying to following along and "early write" seemed easiest to grok. Open to other suggestions for -03.
changed to having 3 closing states: closing, draining, closed
added explicit guidance to support the condensed/simpler set

@martinthomson: we keep support for logging the key updates separately as well. WRT address validation, this will be re-evaluated once I add proper support for connection migration/path management. For now, I'm keeping it in there because @huitema suggested it above and it's the most logical place to put it at this point.

@martinthomson: I do not have anything before "attempted" because this event is connection scoped and IIUC servers don't have a connection concept before the first initial from the client arrives (similar for the client). In qlog, this is more generally reflected by other events like server_listening atm.

quicwg / qlog

More fine-grained connection states #49