libressl / portable

LibreSSL Portable itself. This includes the build scaffold and compatibility layer that builds portable LibreSSL from the OpenBSD source code. Pull requests or patches sent to tech@openbsd.org are welcome.
https://www.libressl.org
1.37k stars 267 forks source link

QUIC regression on haproxy #862

Closed chipitsine closed 1 year ago

chipitsine commented 1 year ago

LibreSSL - master branch haproxy - https://github.com/chipitsine/haproxy caught by running QUIC Interop suite

AddressSanitizer:DEADLYSIGNAL
=================================================================
==28==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000010 (pc 0x7f11f4a979e8 bp 0x7fffad729ce0 sp 0x7fffad729ca0 T0)
==28==The signal is caused by a WRITE memory access.
==28==Hint: address points to the zero page.
    #0 0x7f11f4a979e7 in EVP_CipherInit_ex evp/evp_enc.c:95
    #1 0x7f11f4a9884d in EVP_DecryptInit_ex evp/evp_enc.c:292
    #2 0x5619e981753b in quic_tls_decrypt src/quic_tls.c:564
    #3 0x5619e978218c in qc_pkt_decrypt src/quic_conn.c:1625
    #4 0x5619e979dc04 in qc_treat_rx_pkts src/quic_conn.c:4573
    #5 0x5619e97a2ef2 in quic_conn_app_io_cb src/quic_conn.c:5029
    #6 0x5619e9e0048b in run_tasks_from_lists src/task.c:596
    #7 0x5619e9e0227c in process_runnable_tasks src/task.c:876
    #8 0x5619e9cf5ed6 in run_poll_loop src/haproxy.c:2968
    #9 0x5619e9cf6d70 in run_thread_poll_loop src/haproxy.c:3167
    #10 0x5619e9cf9bcd in main src/haproxy.c:3827
    #11 0x7f11f46320b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x240b2)
    #12 0x5619e96cadbd in _start (/usr/local/sbin/haproxy+0x1b0dbd)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV evp/evp_enc.c:95 in EVP_CipherInit_ex
==28==ABORTING

repro steps:

modprobe ip6table_filter
git clone https://github.com/chipitsine/gh860
cd gh860

docker build --no-cache -t haproxy-qns-libressl:local haproxy-qnc-libressl
cd quic-interop-runner
pip install -r requirements.txt
python run.py
chipitsine commented 1 year ago

sorry, it seem to be docker caching ((

botovq commented 1 year ago

Heh. I prefer false alarms over unreported issues :)

In your repo I saw that you enabled HAVE_SSL_KEYLOG. I don't think that's worth doing. We only have a dummy keylog API to avoid patching some OpenBSD ports. This doesn't do anything and we don't really plan on supporting this.

chipitsine commented 1 year ago

I've double checked, cache is not involved. It's not a false alarm.

in logs I see haproxy version HAProxy version cf950d9b69e307279d61999c05c29fb89073245d 2023/05/17 - https://haproxy.org/

it is https://github.com/chipitsine/haproxy/commit/cf950d9b69e307279d61999c05c29fb89073245d

that means, we already have chacha20_poly1305: https://github.com/chipitsine/haproxy/commit/97c344dae0f75220ffe48949977192f43b9c3d75

meanwhile, there's still

AddressSanitizer:DEADLYSIGNAL
=================================================================
==31==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000010 (pc 0x7f25ad8979e8 bp 0x7fffed029cc0 sp 0x7fffed029c80 T0)
==31==The signal is caused by a WRITE memory access.
==31==Hint: address points to the zero page.
    #0 0x7f25ad8979e7 in EVP_CipherInit_ex evp/evp_enc.c:95
    #1 0x7f25ad89884d in EVP_DecryptInit_ex evp/evp_enc.c:292 
    #2 0x556e1983053b in quic_tls_decrypt src/quic_tls.c:564
    #3 0x556e1979b18c in qc_pkt_decrypt src/quic_conn.c:1625
    #4 0x556e197b6c04 in qc_treat_rx_pkts src/quic_conn.c:4573
    #5 0x556e197bbef2 in quic_conn_app_io_cb src/quic_conn.c:5029
    #6 0x556e19e1948b in run_tasks_from_lists src/task.c:596
    #7 0x556e19e1b27c in process_runnable_tasks src/task.c:876
    #8 0x556e19d0eed6 in run_poll_loop src/haproxy.c:2968
    #9 0x556e19d0fd70 in run_thread_poll_loop src/haproxy.c:3167
    #10 0x556e19d12bcd in main src/haproxy.c:3827
    #11 0x7f25ad4320b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x240b2)
    #12 0x556e196e3dbd in _start (/usr/local/sbin/haproxy+0x1b0dbd)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV evp/evp_enc.c:95 in EVP_CipherInit_ex
==31==ABORTING
chipitsine commented 1 year ago

(I'll drop latest commit, but currently it helps me to run QUIC Interop, because it's configuration requires key logging)

chipitsine commented 1 year ago
server          | HAProxy version cf950d9b69e307279d61999c05c29fb89073245d 2023/05/17 - https://haproxy.org/
server          | Status: development branch - not safe for use in production.
server          | Known bugs: https://github.com/haproxy/haproxy/issues?q=is:issue+is:open
server          | Running on: Linux 6.2.15-300.fc38.x86_64 #1 SMP PREEMPT_DYNAMIC Thu May 11 17:37:39 UTC 2023 x86_64
server          | Build options :
server          |   TARGET  = linux-glibc
server          |   CPU     = generic
server          |   CC      = gcc
server          |   CFLAGS  = -pg -O0 -g -Wno-deprecated-declarations -fsanitize=address -Wall -Wextra -Wundef -Wdeclaration-after-statement -Wfatal-errors -Wtype-limits -Wshift-negative-value -Wshift-overflow=2 -Wduplicated-cond -Wnull-dereference -fwrapv -Wno-address-of-packed-member -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-clobbered -Wno-missing-field-initializers -Wno-cast-function-type -Wno-string-plus-int -Wno-atomic-alignment -Werror
server          |   OPTIONS = USE_OPENSSL=1 USE_LUA=1 USE_OBSOLETE_LINKER=1 USE_QUIC=1
server          |   DEBUG   = -DDEBUG_DONT_SHARE_POOLS -DDEBUG_MEMORY_POOLS -DDEBUG_STRICT=2 -DDEBUG_TASK -DDEBUG_FAIL_ALLOC
server          | 
server          | Feature list : -51DEGREES +ACCEPT4 +BACKTRACE -CLOSEFROM +CPU_AFFINITY +CRYPT_H -DEVICEATLAS +DL -ENGINE +EPOLL -EVPORTS +GETADDRINFO -KQUEUE -LIBATOMIC +LIBCRYPT +LINUX_SPLICE +LINUX_TPROXY +LUA +MATH -MEMORY_PROFILING +NETFILTER +NS +OBSOLETE_LINKER +OPENSSL -OPENSSL_WOLFSSL -OT -PCRE -PCRE2 -PCRE2_JIT -PCRE_JIT +POLL +PRCTL -PROCCTL -PROMEX -PTHREAD_EMULATION +QUIC +RT +SHM_OPEN +SLZ +SSL -STATIC_PCRE -STATIC_PCRE2 -SYSTEMD +TFO +THREAD +THREAD_DUMP +TPROXY -WURFL -ZLIB
server          | 
server          | Default settings :
server          |   bufsize = 16384, maxrewrite = 1024, maxpollevents = 200
server          | 
server          | Built with multi-threading support (MAX_TGROUPS=16, MAX_THREADS=256, default=2).
server          | Built with gcc compiler version 9.4.0 with address sanitizer
server          | Encrypted password support via crypt(3): yes
server          | Built without PCRE or PCRE2 support (using libc's regex instead)
server          | Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
server          | Built with libslz for stateless compression.
server          | Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
server          | Running with a replaced memory allocator (e.g. via LD_PRELOAD).
server          | Built with network namespace support.
server          | Built with Lua version : Lua 5.3.3
server          | Built with OpenSSL version : LibreSSL 3.8.0
server          | Running on OpenSSL version : LibreSSL 3.8.0
server          | OpenSSL library supports TLS extensions : yes
server          | OpenSSL library supports SNI : yes
server          | OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
botovq commented 1 year ago

Thanks. I can only state the obvious so far: for some reason haproxy calls EVP_DecryptInit_ex() with an EVP_CIPHER_CTX of NULL since rx_ctx is NULL in qc_pkt_decrypt(). It means an assumption is violated somewhere.

Since you told us this is LibreSSL specific this can be misbehavior or a bug in LibreSSL that messes up haproxy's internal state tracking, some race because we don't have some locking that haproxy expects us to have, or a bug in haproxy, or something else entirely. I can't currently tell.

We will need to have a good reproducer to dig deeper into this. Unfortunately, after doing the reproduction steps, python run.py stops immediately with:

Saving logs to logs_2023-05-25T00:22:18.
haproxy client not compliant.
Not compliant, skipping
Run took 0:00:00.415228
+---------+---------+
|         | haproxy |
+---------+---------+
| quic-go |         |
|         |         |
|         |         |
+---------+---------+
+---------+---------+
|         | haproxy |
+---------+---------+
| quic-go |         |
+---------+---------+

Does that ring a bell?

chipitsine commented 1 year ago

quic interop is niche tool. it set up several docker containers using docker-compose. I was able to run it on Fedora.

when I try it on Ubuntu 20.04 and 22.04 (I ran it as python run.py --debug)

Creating network "quic-interop-runner_leftnet" with driver "bridge"
Creating network "quic-interop-runner_rightnet" with driver "bridge"
Removing sim
Recreating 476b3b4333b1_sim ... 
Recreating 476b3b4333b1_sim ... done
Recreating client           ... 
Recreating client           ... done
Attaching to sim, client
client          | Setting up routes...
client          | Actual changes:
client          | tx-checksumming: off
client          | tx-checksum-ip-generic: off
client          | tx-checksum-sctp: off
client          | tcp-segmentation-offload: off
client          | tx-tcp-segmentation: off [requested on]
client          | tx-tcp-ecn-segmentation: off [requested on]
client          | tx-tcp-mangleid-segmentation: off [requested on]
client          | tx-tcp6-segmentation: off [requested on]
client          | Endpoint's IPv4 address is 193.167.0.100
client          | Endpoint's IPv6 address is fd00:cafe:cafe::100
sim             | ip6tables v1.8.4 (legacy): can't initialize ip6tables table `filter': Table does not exist (do you need to insmod?)
sim             | Perhaps ip6tables or your kernel needs to be upgraded.
sim exited with code 3
3
Aborting on container exit...

Not compliant, skipping

I couldn't figure out how to deal with that ((

chipitsine commented 1 year ago

I'll try to get it running on Ubuntu

(this looks like a candidate https://stackoverflow.com/questions/58726984/docker-errno-2-ip6tables-v1-6-1-cant-initialize-ip6tables-table-filter-ta )

chipitsine commented 1 year ago

@botovq ,

I confirm that modprobe ip6table_filter did a trick on Ubuntu.

chipitsine commented 1 year ago

QUIC Interop perform about a dozen of tests, however only couple of them fail

logs_2023-05-26T20:10:28/haproxy_aioquic/handshakecorruption/server/log.txt:SUMMARY: AddressSanitizer: SEGV evp/evp_enc.c:95 in EVP_CipherInit_ex
logs_2023-05-26T20:10:28/haproxy_aioquic/transfercorruption/server/log.txt:SUMMARY: AddressSanitizer: SEGV evp/evp_enc.c:95 in EVP_CipherInit_ex

those are "corruption" test with high rate of renegotiations, looks interesting

class TestCaseHandshakeCorruption(TestCaseHandshakeLoss):
    @staticmethod
    def name():
        return "handshakecorruption"

    @staticmethod
    def abbreviation():
        return "C1"

    @staticmethod
    def desc():
        return "Handshake completes under extreme packet corruption."

    @staticmethod
    def scenario() -> str:
        """ Scenario for the ns3 simulator """
        return "corrupt-rate --delay=15ms --bandwidth=10Mbps --queue=25 --rate_to_server=30 --rate_to_client=30"
chipitsine commented 1 year ago

I've run tests again, seems there's another regression.

AddressSanitizer:DEADLYSIGNAL
=================================================================
==28==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7f0b906c9d78 bp 0x7f0b8d1f24e0 sp 0x7f0b8d1f1c48 T2)
==28==The signal is caused by a READ memory access.
==28==Hint: address points to the zero page.
    #0 0x7f0b906c9d77  (/lib/x86_64-linux-gnu/libc.so.6+0xbbd77)
    #1 0x7f0b9109b37e in __interceptor_memcpy ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:790
    #2 0x7f0b90ace889 in pkey_hkdf_ctrl kdf/hkdf_evp.c:138
    #3 0x7f0b90aadb90 in EVP_PKEY_CTX_ctrl evp/pmeth_lib.c:365
    #4 0x55e525543b43 in quic_hkdf_expand src/quic_tls.c:129
    #5 0x55e525544262 in quic_hkdf_expand_label src/quic_tls.c:224
    #6 0x55e525544981 in quic_tls_sec_update src/quic_tls.c:320
    #7 0x55e5254a7959 in quic_tls_key_update src/quic_conn.c:942
    #8 0x55e5254b81ae in qc_provide_cdata src/quic_conn.c:2509
    #9 0x55e5254bd974 in qc_handle_crypto_frm src/quic_conn.c:2976
    #10 0x55e5254bfa33 in qc_parse_pkt_frms src/quic_conn.c:3203
    #11 0x55e5254ca8d3 in qc_treat_rx_pkts src/quic_conn.c:4597
    #12 0x55e5254d0a4b in quic_conn_io_cb src/quic_conn.c:5135
    #13 0x55e525b3363e in run_tasks_from_lists src/task.c:596 
    #14 0x55e525b3542f in process_runnable_tasks src/task.c:876
    #15 0x55e525a28e66 in run_poll_loop src/haproxy.c:2968
    #16 0x55e525a29d00 in run_thread_poll_loop src/haproxy.c:3167
    #17 0x7f0b90d6e608 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x8608)
    #18 0x7f0b9072d162 in __clone (/lib/x86_64-linux-gnu/libc.so.6+0x11f162)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (/lib/x86_64-linux-gnu/libc.so.6+0xbbd77) 
Thread T2 created by T0 here:
    #0 0x7f0b9103a815 in __interceptor_pthread_create ../../../../src/libsanitizer/asan/asan_interceptors.cc:208
    #1 0x55e525c73e0e in setup_extra_threads src/thread.c:252 
    #2 0x55e525a2cd26 in main src/haproxy.c:3832
    #3 0x7f0b906320b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x240b2)
chipitsine commented 1 year ago

@botovq , I've compared OpenSSL source, it allows null key

https://github.com/quictls/openssl/blob/OpenSSL_1_1_1s%2Bquic/crypto/kdf/hkdf.c#L113

should LibreSSL contain "p2" validation against NULL as well ?

currently LibreSSL is using memcpy instead of OPENSSL_memdup

botovq commented 1 year ago

Thanks, good catch. We do not have OPENSSL_memdup(). It was missed in the conversion that it was NULL-safe (and the other cases in the switch have explicit checks...).

The below should align us with OpenSSL behavior, except that we don't leave the kctx->key_len inconsistent with kctx->key. I will see about landing something like this.

Index: kdf/hkdf_evp.c
===================================================================
RCS file: /cvs/src/lib/libcrypto/kdf/hkdf_evp.c,v
retrieving revision 1.19
diff -u -p -r1.19 hkdf_evp.c
--- kdf/hkdf_evp.c  26 Nov 2022 16:08:53 -0000  1.19
+++ kdf/hkdf_evp.c  25 Jun 2023 21:13:17 -0000
@@ -129,10 +129,17 @@ pkey_hkdf_ctrl(EVP_PKEY_CTX *ctx, int ty
        return 1;

    case EVP_PKEY_CTRL_HKDF_KEY:
-       if (p1 <= 0)
+       if (p1 < 0)
            return 0;

        freezero(kctx->key, kctx->key_len);
+       kctx->key = NULL;
+       kctx->key_len = 0;
+
+       /* Match OpenSSL's behavior. */
+       if (p1 == 0 || p2 == NULL)
+           return 0;
+
        if ((kctx->key = malloc(p1)) == NULL)
            return 0;
        memcpy(kctx->key, p2, p1);
botovq commented 1 year ago

Committed and pushed to github: https://github.com/libressl/openbsd/commit/ac48eb74f5d0f870c63d096e150a1c05a42b8bf4

chipitsine commented 1 year ago

great, I ran tests, now I see

==28==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000010 (pc 0x7fddcff310f1 bp 0x7fff3c12bd60 sp 0x7fff3c12bd20 T0)
==28==The signal is caused by a WRITE memory access.
==28==Hint: address points to the zero page.
    #0 0x7fddcff310f0 in EVP_CipherInit_ex evp/evp_enc.c:95
    #1 0x7fddcff31f56 in EVP_DecryptInit_ex evp/evp_enc.c:292
    #2 0x55c30d333645 in quic_tls_decrypt src/quic_tls.c:564
    #3 0x55c30d29cee0 in qc_pkt_decrypt src/quic_conn.c:1650
    #4 0x55c30d2b87cf in qc_treat_rx_pkts src/quic_conn.c:4591
    #5 0x55c30d2bdaa7 in quic_conn_app_io_cb src/quic_conn.c:5047
    #6 0x55c30d92163e in run_tasks_from_lists src/task.c:596
    #7 0x55c30d92342f in process_runnable_tasks src/task.c:876
    #8 0x55c30d816e66 in run_poll_loop src/haproxy.c:2968
    #9 0x55c30d817d00 in run_thread_poll_loop src/haproxy.c:3167
    #10 0x55c30d81ad37 in main src/haproxy.c:3838
    #11 0x7fddcf90f0b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x240b2)
    #12 0x55c30d1e524d in _start (/usr/local/sbin/haproxy+0x1b224d)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV evp/evp_enc.c:95 in EVP_CipherInit_ex
==28==ABORTING
botovq commented 1 year ago

This looks like our old friend again. I asked @4a6f656c to take a look.

chipitsine commented 1 year ago

I do not make friendship with bugs, however I agree we seen it already

chipitsine commented 1 year ago

I re-ran test suite, only this caught. seems original issue is gone

chipitsine commented 1 year ago

not reproduced anymore.