tokio-rs / tls

A collection of Tokio based TLS libraries.
https://tokio.rs
MIT License
253 stars 86 forks source link

Fix early-data test #132

Closed divergentdave closed 1 year ago

divergentdave commented 1 year ago

The domain name this test uses was not updated when the test certificate was replaced previously. FWIW, this test isn't covered by CI, since it's behind an off-by-default feature flag.

quininer commented 1 year ago

Would you like to open it on the ci by the way?

divergentdave commented 1 year ago

Sure, I added another command to run that test.

quininer commented 1 year ago

The test seems to fail on windows, maybe there is a problem with openssl on windows ci machine. we can disable it on windows ci.

divergentdave commented 1 year ago

I ran a modified version of the test on both Linux and Windows, and it seems that this might be due to cross-platform differences in OpenSSL, but I can't rule out an issue in rustls yet. In both cases, I do see "hello" sent in application data. On Linux, the server sends two session tickets (nonce zero before the client's days and nonce one after the client's data), but I don't see any session tickets on Windows. In both cases, I see a client-to-server close notify alert, and I see a FIN from the client. However, I do not see a FIN from the server on Windows, only Linux. Even more suspicious, OpenSSL's key log is missing CLIENT_TRAFFIC_SECRET_0, compared to the key log file from rustls. The connection works enough to get data through in both directions, but there seems to be something wrong with higher-level connection state machines.

With a base set of changes, I see that openssl s_client stdout does not get as far on Windows as on Linux. I see only two lines printed, "Using default temp DH parameters", and "ACCEPT". Interestingly, if I additionally write a line to openssl s_server's stdin, I see that come out rustls's read end of the connection, and then I see a third line from openssl s_server's stdout, "helloNo early data received". (which is expected) The test still deadlocks waiting for the server to close its end of the TLS connection.

This suggests one side or the other may be forgetting to read from a pipe or socket when it should, or forgetting to wake a future appropriately. (I added a missing wake to a Pending control flow in Read1, and that was not sufficient, for what it's worth)

divergentdave commented 1 year ago

I have some changes in a branch at https://github.com/divergentdave/tokio-tls/tree/early-data-debugging, and I think the results strongly suggest an I/O handling bug in s_server on Windows. The most recent change I applied was writing to the s_server process's stdin repeatedly, with some delay. These writes (which ultimately cause the server to send data to the client) do get the connection unstuck multiple times, and the test eventually passes with them. OpenSSL's debug output, plus my modified test's output, shows the following.

$ cargo test -p tokio-rustls --features early-data --test early-data -- --nocapture
...
OpenSSL s_server stdout: "write to 0x22949068920 [0x229490c8500] (2176 bytes => 2176 (0x880))"
OpenSSL s_server stdout: "0000 - 16 03 03 00 7a 02 00 00-76 03 03 6c c3 98 ab f2   ....z...v..l...."
OpenSSL s_server stdout: "0010 - bd 30 bc 17 ec 9e 9a 3e-58 71 47 5a a6 6a 5b a0   .0.....>XqGZ.j[."
...
OpenSSL s_server stdout: "05a0 - b8 a8 31 b6 8e 24 9f ca-36 b4 9b 63 6a a3 22 8e   ..1..$..6..cj.\"."
OpenSSL s_server stdout: "05b0 - 90 e3 19 32 1f 0a c3 d2-ae a3 6f ff 3c ad 72 6b   ...2......o.<.rk"
Writing to OpenSSL s_server stdin
OpenSSL s_server stdout: "05c0 - 2a 38 23 02 b4 5f 28 36-91 63 20 14 be a6 f4 65   *8#.._(6.c ....e"
OpenSSL s_server stdout: "05d0 - 5f 77 10 4c f3 1e fb df-2b c3 ae a2 a9 5c 27 ab   _w.L....+....\\'."
...
OpenSSL s_server stdout: "0860 - f6 4c e2 9d 8d 52 37 17-58 21 6f 97 a2 5a 19 ee   .L...R7.X!o..Z.."
OpenSSL s_server stdout: "0870 - 15 b4 8c b8 33 43 23 cc-4c ed 1a aa 06 a5 f6 a1   ....3C#.L......."
OpenSSL s_server stdout: "No early data received"
...

Here, we see that OpenSSL intended to write 0x880 bytes, but only wrote about half of it at first. Immediately after the test wrote a newline to OpenSSL's stdin, it finished sending out this write of handshake data. Looking at the packet capture, the first packet sent after the first delay, corresponding to the first network event after the stdin write, is a new session ticket message sent from the server to the client. Thus, I think we can rule out misbehavior on the part of tokio-rustls (for example, failing to drain a TCP connection buffer) because the connection gets unstuck without any interaction with the Rust side (ignoring the stdin write itself). On the other hand, running openssl s_server -early_data -tls1_3 -cert ./tests/end.cert -key ./tests/end.rsa -port 12354 and openssl s_client -connect 127.0.0.1:12354 -sess_out session.pem together doesn't run into the same issue, so there must be some way Tokio or rustls affects this, even if it's just preconditions for triggering a bug.

quininer commented 1 year ago

Thank you for your meticulous research!