openssl / project


SSL_write_ex on stream object sometimes retransmits parts of stream data #924

Status: closed by nhorman 22 hours ago

nhorman commented 3 days ago

While building out QUIC server testing with the quic-interop-runner, I wrote a demo server here: https://github.com/nhorman/openssl/blob/quic-server-interop/demos/guide/quic-hq-interop-server.c

When running several tests with OpenSSL as the server, a number of them failed with:

File size of /tmp/download_4d63aram/tknvfrrbre doesn't match. Original: 3145729 bytes, downloaded: 3147207 bytes.

Some debugging in the server code seems to confirm that we did in fact write 3145729 bytes through SSL_write_ex, yet the client received 3147207 bytes, 1478 bytes more than were sent.
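A minimal sketch of the byte accounting described above: keep calling the write function until the whole buffer has been accepted, and sum the `written` count it reports each time. So that it runs without a QUIC connection, `write_all` takes a callback with the same shape as `SSL_write_ex(SSL *s, const void *buf, size_t num, size_t *written)` (returns 1 on success and stores the accepted byte count in `*written`). The `write_fn` type, `write_all`, `mock_write`, and its 1000-byte per-call cap are hypothetical stand-ins, not OpenSSL API, and this is not the code in the demo server.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback type mirroring the shape of SSL_write_ex():
 * returns 1 on success and stores the number of bytes accepted in *written. */
typedef int (*write_fn)(void *ctx, const void *buf, size_t num, size_t *written);

/* Write all `len` bytes, accumulating the per-call `written` counts into
 * *total. Returns 1 if everything was accepted, 0 on the first failure
 * (or if a call claims success but accepts nothing, to avoid spinning). */
static int write_all(write_fn fn, void *ctx, const unsigned char *buf,
                     size_t len, uint64_t *total)
{
    size_t off = 0;

    assert(fn != NULL && total != NULL);
    *total = 0;
    while (off < len) {
        size_t written = 0;

        if (!fn(ctx, buf + off, len - off, &written) || written == 0)
            return 0;
        off += written;
        *total += written;
    }
    return 1;
}

/* Hypothetical mock sink: accepts at most 1000 bytes per call (a partial
 * write) and counts every byte it is handed in *ctx. */
static int mock_write(void *ctx, const void *buf, size_t num, size_t *written)
{
    uint64_t *sink_bytes = ctx;
    size_t n = num < 1000 ? num : 1000;

    (void)buf;
    *sink_bytes += n;
    *written = n;
    return 1;
}
```

For example, pushing a 3145729-byte buffer through `mock_write` takes 3146 calls and leaves both `total` and the mock's counter at 3145729. A non-blocking server would additionally need to check `SSL_get_error()` and retry on `SSL_ERROR_WANT_WRITE` rather than giving up on the first 0 return.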

To dig further, I modified the interop runner harness to fill the file to be downloaded with sequential values, each followed by a newline. Diffing the sent file against the received file then showed exactly where they differ:

```diff
--- ./tknvfrrbre        2024-11-19 00:50:20.000000000 +0000
+++ ../../client/client-download/tknvfrrbre     2024-11-19 00:50:24.000000000 +0000
@@ -110078,6 +110078,12 @@
 110078
 110079
 110080
+15
+110076
+110077
+110078
+110079
+110080
 110081
 110082
 110083
@@ -128802,6 +128808,18 @@
 128802
 128803
 128804
+128808793
+128794
+128795
+128796
+128797
+128798
+128799
+128800
+128801
+128802
+128803
+128804
 128805
 128806
 128807
@@ -147527,6 +147545,17 @@
...
```
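The harness change described above can be sketched as follows, assuming line N holds the decimal value N (which matches the diff, where line 110078 contains `110078`). The real change was made in the interop runner harness; `fill_sequential` is a hypothetical C helper, and truncating the last entry to hit an exact size such as 3145729 bytes is an assumption about how the file is sized.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: fill buf[0..size) with "1\n2\n3\n...", so line N
 * holds the value N. The final entry is truncated if it does not fit.
 * Returns the number of complete lines written. */
static unsigned long fill_sequential(char *buf, size_t size)
{
    size_t off = 0;
    unsigned long value = 1;
    char line[32];

    assert(buf != NULL);
    while (off < size) {
        int n = snprintf(line, sizeof(line), "%lu\n", value);
        size_t take = (size_t)n;

        if (take > size - off)
            take = size - off;   /* last entry only partially fits */
        memcpy(buf + off, line, take);
        off += take;
        if (take < (size_t)n)
            break;               /* stop after the truncated entry */
        value++;
    }
    return value - 1;
}
```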

From the diff it appears that, periodically, chunks of data within the stream get sent a second time over the wire.
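Because line N of the original file holds the value N, the first point where the received copy diverges can also be located mechanically rather than by diffing. The hypothetical `first_out_of_sequence` below returns the 1-based line number of the first line whose content is not its own line number. It only reports the first divergence, since every later line is shifted by the duplicated chunk; on the received file from the diff above it would stop at line 110081, where the received copy has `15`.

```c
#include <assert.h>
#include <stddef.h>

/* Scan newline-separated decimal values and return the 1-based line number
 * of the first line whose value is not exactly its line number. Returns 0
 * if every complete line matches. A trailing line without '\n' is ignored;
 * an empty or non-numeric line counts as a mismatch. */
static size_t first_out_of_sequence(const char *buf, size_t len)
{
    size_t line = 1, start = 0;

    assert(buf != NULL);
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != '\n')
            continue;
        unsigned long v = 0;
        int ok = i > start;
        for (size_t j = start; j < i; j++) {
            if (buf[j] < '0' || buf[j] > '9') {
                ok = 0;
                break;
            }
            v = v * 10 + (unsigned long)(buf[j] - '0');
        }
        if (!ok || v != line)
            return line;
        line++;
        start = i + 1;
    }
    return 0;
}
```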

The root cause still needs to be investigated and fixed. The error might be in the server code itself, but it doesn't appear so to me. I suspect a race condition between the reactor consuming bytes from the stream map and the application updating the stream map, but I've not looked closely enough yet.
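A minimal, hypothetical sketch of the invariant behind that suspicion, using POSIX threads rather than anything from OpenSSL: when an application thread appends to a shared send buffer while a reactor thread consumes from it, copying bytes out and advancing the consumed offset have to happen inside the same lock the writer takes. Otherwise the reactor could act on a stale offset and hand the same bytes out twice. `struct send_buf`, `sb_write`, `sb_consume`, `sb_close`, and `reactor_thread` are illustrative names only and are not modelled on OpenSSL's actual stream map or reactor code.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SB_CAP 4096 /* hypothetical capacity; a power of two */

/* Hypothetical shared send buffer. The application appends at `tail`, a
 * reactor thread consumes from `head`. Both are running byte counts, and
 * every read or update of head, tail, data, or closed happens under `lock`. */
struct send_buf {
    pthread_mutex_t lock;
    pthread_cond_t changed;
    unsigned char data[SB_CAP];
    uint64_t head;   /* total bytes consumed by the reactor */
    uint64_t tail;   /* total bytes appended by the application */
    int closed;      /* application has finished writing */
};

static void sb_init(struct send_buf *sb)
{
    assert(sb != NULL);
    memset(sb, 0, sizeof(*sb));
    pthread_mutex_init(&sb->lock, NULL);
    pthread_cond_init(&sb->changed, NULL);
}

/* Application side: append len bytes, waiting while the buffer is full. */
static void sb_write(struct send_buf *sb, const unsigned char *p, size_t len)
{
    pthread_mutex_lock(&sb->lock);
    for (size_t i = 0; i < len; i++) {
        while (sb->tail - sb->head == SB_CAP) {
            pthread_cond_broadcast(&sb->changed); /* wake the reactor */
            pthread_cond_wait(&sb->changed, &sb->lock);
        }
        sb->data[sb->tail % SB_CAP] = p[i];
        sb->tail++;
    }
    pthread_cond_broadcast(&sb->changed);
    pthread_mutex_unlock(&sb->lock);
}

static void sb_close(struct send_buf *sb)
{
    pthread_mutex_lock(&sb->lock);
    sb->closed = 1;
    pthread_cond_broadcast(&sb->changed);
    pthread_mutex_unlock(&sb->lock);
}

/* Reactor side: copy bytes out and advance head in the same critical
 * section, so no byte can be handed out twice. Returns 0 once the buffer
 * is closed and drained. */
static size_t sb_consume(struct send_buf *sb, unsigned char *out, size_t max)
{
    size_t n = 0;

    pthread_mutex_lock(&sb->lock);
    while (sb->head == sb->tail && !sb->closed)
        pthread_cond_wait(&sb->changed, &sb->lock);
    while (n < max && sb->head != sb->tail) {
        out[n++] = sb->data[sb->head % SB_CAP];
        sb->head++;
    }
    pthread_cond_broadcast(&sb->changed); /* wake a writer waiting for space */
    pthread_mutex_unlock(&sb->lock);
    return n;
}

/* Hypothetical reactor loop: drain into `sink` (which must be large enough
 * for everything written) until the application closes the buffer. */
struct reactor_ctx {
    struct send_buf *sb;
    unsigned char *sink;
    size_t sink_cap;
    size_t sink_len;
};

static void *reactor_thread(void *arg)
{
    struct reactor_ctx *rc = arg;
    size_t n;

    while ((n = sb_consume(rc->sb, rc->sink + rc->sink_len,
                           rc->sink_cap - rc->sink_len)) > 0)
        rc->sink_len += n;
    return NULL;
}
```

As a quick check, one thread can call `sb_write` in 1000-byte chunks and then `sb_close` while `reactor_thread` drains into a sink; after `pthread_join`, the sink should hold exactly the bytes written, in order, with nothing repeated.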