quictls / openssl

TLS/SSL and crypto library with QUIC APIs
https://quictls.github.io/openssl
Apache License 2.0
366 stars 50 forks source link

Excessive memory usage for QUIC connections #157

Open bwelling opened 4 months ago

bwelling commented 4 months ago

The problem that I’m seeing is that every QUIC connection is using far more memory than seemingly necessary. I narrowed a bunch of this down to this code in SSL_provide_quic_data:

    if (sc->quic_buf == NULL) {
        BUF_MEM *buf;
        if ((buf = BUF_MEM_new()) == NULL) {
            ERR_raise(ERR_LIB_SSL, ERR_R_INTERNAL_ERROR);
            return 0;
        }
        if (!BUF_MEM_grow(buf, SSL3_RT_MAX_PLAIN_LENGTH)) {
            ERR_raise(ERR_LIB_SSL, ERR_R_INTERNAL_ERROR);
            BUF_MEM_free(buf);
            return 0;
        }
        sc->quic_buf = buf;
        /* We preallocated storage, but there's still no *data*. */
        sc->quic_buf->length = 0;
        buf = NULL;
    }

Specifically, this code is preallocating a buffer to accumulate potentially fragmented TLS packets from QUIC packets, in order to parse the TLS packets. It’s passing a size of SSL3_RT_MAX_PLAIN_LENGTH, which is 16k, and for reasons that I don’t understand, the BUF code scales this by 4/3, causing an allocation of over 21k. In the cases I’m testing (DoQ and DoH), even if I send a large number of DNS queries over a QUIC connection, TLS data is only written into the buffer during the initial handshake, and it’s only a few hundred bytes. That means that most of the 21k allocated ends up unused for the lifetime of the connection.

To reduce this, I made a local change to comment out the preallocation. It seemed to work, although I suspect that it potentially isn’t safe if the buffer is reallocated. It looks like the code previously attached the TLS data directly onto the QUIC_DATA structures (changed in https://github.com/quictls/openssl/commit/075ede1dd8229425e494194188f0629857aeb52d), and that was removed because it was involved incomplete QUIC_DATA objects.

The older code, which would use far less memory, didn’t look all that complicated, but could potentially be simplified by storing the partial buffer on the connection, instead of on an incomplete QUIC_DATA.

I don’t think there’s anything specific to my use case that’s causing this, but am also not sure if the memory usage is an issue for anyone else. If it’s not, I’ll stick with a local workaround, but it might be good if it were improved for everyone.