nabla-c0d3 / nassl

Experimental OpenSSL wrapper for Python 3.8+ and SSLyze.
GNU Affero General Public License v3.0

Segmentation Fault on RHEL 8.5 #95

Closed nabla-c0d3 closed 7 months ago

nabla-c0d3 commented 2 years ago

https://github.com/nabla-c0d3/sslyze/issues/556

nabla-c0d3 commented 2 years ago

I made a script to reproduce the seg fault: https://github.com/nabla-c0d3/nassl/blob/%2395-debug-seg-fault/repro.py

It seems to happen within this function: https://github.com/nabla-c0d3/nassl/blob/release/nassl/_nassl/nassl_SSL.c#L840

The docker commands are available in the sslyze issue; this command needs to be run to build nassl locally:

yum install python38-devel
nabla-c0d3 commented 1 year ago

Similar issue at https://github.com/nabla-c0d3/sslyze/issues/621 but the crash happens on import?

nabla-c0d3 commented 1 year ago

The following file can reproduce the "scsv" issue:

import socket

from nassl._nassl import WantReadError
from nassl.legacy_ssl_client import LegacySslClient
from nassl.ssl_client import OpenSslVerifyEnum, OpenSslVersionEnum

def do_handshakelol(self):
    while True:
        try:
            self._ssl.do_handshake()
            self._is_handshake_completed = True
            # Handshake was successful
            return

        except WantReadError:
            # OpenSSL is expecting more data from the peer
            # Send available handshake data to the peer
            self._flush_ssl_engine()

            # Recover the peer's encrypted response
            handshake_data_in = self._sock.recv(self._DEFAULT_BUFFER_SIZE)
            if len(handshake_data_in) == 0:
                raise IOError("Nassl SSL handshake failed: peer did not send data back.")
            # Pass the data to the SSL engine
            self._network_bio.write(handshake_data_in)

# Start the server first
#  /sslyze/tests/openssl_server/openssl-1-0-0e-linux64 s_server -cert /sslyze/tests/openssl_server/server-rsa-cert.pem -key /sslyze/tests/openssl_server/server-rsa-key.pem -accept 12345 -cipher "DEFAULT" &
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.settimeout(5)
hostname = "localhost"
sock.connect((hostname, 12345))

ssl_client = LegacySslClient(
    ssl_version=OpenSslVersionEnum.SSLV3,  # Only happens with this value
    underlying_socket=sock,
    ssl_verify=OpenSslVerifyEnum.NONE,
)
do_handshakelol(ssl_client)

It crashes with the following stack trace:

#0  0x00007fad79724f04 in RSA_verify () from /lib64/libcrypto.so.1.1
#1  0x00007fad7680b85d in ssl3_get_key_exchange ()
   from /sslyze/lol/nassl/nassl/_nassl_legacy.cpython-38-x86_64-linux-gnu.so
#2  0x00007fad7680ce63 in ssl3_connect () from /sslyze/lol/nassl/nassl/_nassl_legacy.cpython-38-x86_64-linux-gnu.so
#3  0x00007fad767fb098 in nassl_SSL_do_handshake (self=0x7fad77e12690, args=<optimized out>)
    at nassl/_nassl/nassl_SSL.c:151
#4  0x00007fad7913a487 in method_vectorcall_NOARGS () from /lib64/libpython3.8.so.1.0
#5  0x00007fad791a83ca in _PyEval_EvalFrameDefault () from /lib64/libpython3.8.so.1.0
#6  0x00007fad79165e5f in _PyFunction_Vectorcall () from /lib64/libpython3.8.so.1.0
#7  0x00007fad791a809a in _PyEval_EvalFrameDefault () from /lib64/libpython3.8.so.1.0
#8  0x00007fad79164b67 in _PyEval_EvalCodeWithName () from /lib64/libpython3.8.so.1.0
#9  0x00007fad79165c53 in PyEval_EvalCode () from /lib64/libpython3.8.so.1.0
#10 0x00007fad791f24da in run_eval_code_obj () from /lib64/libpython3.8.so.1.0
#11 0x00007fad79218572 in run_mod () from /lib64/libpython3.8.so.1.0
#12 0x00007fad790ca55c in pyrun_file () from /lib64/libpython3.8.so.1.0
#13 0x00007fad790d1420 in PyRun_SimpleFileExFlags () from /lib64/libpython3.8.so.1.0
#14 0x00007fad7921a13f in Py_RunMain () from /lib64/libpython3.8.so.1.0
#15 0x00007fad7921a2c9 in Py_BytesMain () from /lib64/libpython3.8.so.1.0
#16 0x00007fad78069d85 in __libc_start_main () from /lib64/libc.so.6
#17 0x000055d594c0078e in _start ()

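What's surprising is that there is a call to RSA_verify () from /lib64/libcrypto.so.1.1. However, the system's libcrypto should never be used (as nassl statically links specific versions of OpenSSL).

My current guess is as follows:

* Python loads /lib64/libcrypto.so.1.1 which contains `RSA_verify()`.

* nassl loads _nassl_legacy.so, which contains (another) `RSA_verify()`.

Which one gets used at runtime? This could be the problem.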
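One way to start probing that question is to confirm that both copies really are mapped into the same process. The following is a minimal sketch, assuming a Linux system where `/proc/self/maps` is available; the helper names `mapped_libraries` and `mapped_libraries_of_current_process` are hypothetical and not part of nassl.

```python
import re
from typing import Iterable, List


def mapped_libraries(maps_lines: Iterable[str], pattern: str) -> List[str]:
    """Return the distinct file paths found in /proc/<pid>/maps-style lines
    whose path matches `pattern`, in first-seen order."""
    regex = re.compile(pattern)
    seen: List[str] = []
    for line in maps_lines:
        # Each line looks like: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping without a backing file
        path = fields[5].strip()
        if regex.search(path) and path not in seen:
            seen.append(path)
    return seen


def mapped_libraries_of_current_process(pattern: str) -> List[str]:
    # Linux only: /proc/self/maps does not exist on other platforms.
    with open("/proc/self/maps") as maps_file:
        return mapped_libraries(maps_file, pattern)
```

Calling `mapped_libraries_of_current_process(r"libcrypto|_nassl")` after importing both `ssl` and the nassl legacy client should list each mapped file once. This only shows which files are loaded; it does not tell which definition of `RSA_verify()` a given call ends up bound to.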

nabla-c0d3 commented 1 year ago

Similar stack trace for the crash when running gdb -args python -m pytest tests/ -k test_ssl_2:

#0  0x00007f51b3f75e51 in RSA_public_encrypt () from /lib64/libcrypto.so.1.1
#1  0x00007f51abc5cae1 in ssl2_connect () from /sslyze/lol/nassl/nassl/_nassl_legacy.cpython-38-x86_64-linux-gnu.so
#2  0x00007f51abc58098 in nassl_SSL_do_handshake (self=0x7f51ab533cc0, args=<optimized out>)
    at nassl/_nassl/nassl_SSL.c:151
#3  0x00007f51b3994487 in method_vectorcall_NOARGS () from /lib64/libpython3.8.so.1.0
#4  0x00007f51b3a023ca in _PyEval_EvalFrameDefault () from /lib64/libpython3.8.so.1.0
#5  0x00007f51b39c067a in method_vectorcall () from /lib64/libpython3.8.so.1.0
#6  0x00007f51b3a06f69 in _PyEval_EvalFrameDefault () from /lib64/libpython3.8.so.1.0
#7  0x00007f51b39bfe5f in _PyFunction_Vectorcall () from /lib64/libpython3.8.so.1.0
#8  0x00007f51b39c0954 in method_vectorcall () from /lib64/libpython3.8.so.1.0
#9  0x00007f51b39b88bf in PyObject_Call () from /lib64/libpython3.8.so.1.0
[...]
nabla-c0d3 commented 1 year ago

This stack trace for gdb -args python -m pytest tests/ -k test_get_dh_info_dh is different; looks like a coding error:


#0  nassl_SSL_get_dh_info (self=<optimized out>) at nassl/_nassl/nassl_SSL.c:863
#1  0x00007f422a8c5487 in method_vectorcall_NOARGS () from /lib64/libpython3.8.so.1.0
#2  0x00007f422a9333ca in _PyEval_EvalFrameDefault () from /lib64/libpython3.8.so.1.0
#3  0x00007f422a8f0e5f in _PyFunction_Vectorcall () from /lib64/libpython3.8.so.1.0
[...]
nabla-c0d3 commented 1 year ago

What's surprising is that there is a call to RSA_verify () from /lib64/libcrypto.so.1.1. However, the system's libcrypto should never be used (as nassl statically links specific versions of OpenSSL).

My current guess is as follows:

* Python loads /lib64/libcrypto.so.1.1 which contains `RSA_verify()`.

* nassl loads _nassl_legacy.so, which contains (another) `RSA_verify()`.

Which one gets used at runtime? This could be the problem.

Turns out this was the problem. OpenSSL code embedded within nassl would end up calling symbols (such as RSA_verify()) in the system's libcrypto.so, instead of calling the same symbol in the libcrypto code embedded (i.e. statically linked) within nassl.

This would cause incompatible versions of OpenSSL to call each other, resulting in a segmentation fault on some Linux distros (such as Red Hat). It's unclear to me why it would only happen on specific distros.

The fix is to add the following extra_link_args when building nassl: -Wl,--exclude-libs=ALL. This causes symbols that are internal to nassl (i.e. statically linked OpenSSL code such as RSA_verify()) to be removed from nassl.so's exported symbol table. This seems to then force the linker to point code to the right internal symbols, so that there is no "symbol confusion" at runtime when both nassl.so and the system's libcrypto.so expose the same OpenSSL symbols (such as RSA_verify()).
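As a rough illustration of where such a flag could live in a build script, here is a sketch that assembles keyword arguments for a setuptools `Extension`. The helper names and the archive paths under `deps/` are hypothetical, and restricting the flag to Linux is an assumption (on the basis that `--exclude-libs` is a GNU ld style option); nassl's actual build code may be organized differently.

```python
import sys
from typing import Dict, List


def exclude_libs_link_args(platform: str = sys.platform) -> List[str]:
    """Linker arguments that keep statically linked archive symbols out of
    the extension's exported symbol table."""
    # Assumption: only pass the flag on Linux, where the linker is expected
    # to understand --exclude-libs.
    if platform.startswith("linux"):
        return ["-Wl,--exclude-libs=ALL"]
    return []


def legacy_extension_kwargs(platform: str = sys.platform) -> Dict[str, object]:
    """Keyword arguments for a setuptools Extension (illustrative only)."""
    return {
        "name": "nassl._nassl_legacy",
        "sources": ["nassl/_nassl/nassl_SSL.c"],
        # Hypothetical locations of the statically linked OpenSSL archives.
        "extra_objects": [
            "deps/openssl-legacy/libssl.a",
            "deps/openssl-legacy/libcrypto.a",
        ],
        "extra_link_args": exclude_libs_link_args(platform),
    }
```

In a `setup.py`, the result could be passed along as `Extension(**legacy_extension_kwargs())`. Keeping the flag in a small function makes it easy to leave other platforms untouched.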

Before the fix:

nm -gD nassl/_nassl_legacy.cpython-38-x86_64-linux-gnu.so | grep -i RSA_verify
00000000000de770 T RSA_verify
0000000000184340 T RSA_verify_ASN1_OCTET_STRING
0000000000184ac0 T RSA_verify_PKCS1_PSS
0000000000184490 T RSA_verify_PKCS1_PSS_mgf1
00000000000de270 T int_rsa_verify

After the fix:

nm -gD nassl/_nassl_legacy.cpython-38-x86_64-linux-gnu.so | grep -i RSA_verify
<nothing>
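
The same before/after check could be automated, for example as a build-time or CI sanity test. The sketch below parses `nm -gD` output and returns the defined symbols whose name contains a given string; the function names are hypothetical and it assumes binutils' `nm` is on the PATH.

```python
import subprocess
from typing import List


def exported_symbols(nm_output: str, needle: str) -> List[str]:
    """Return names of defined symbols in `nm -gD` output that contain
    `needle`, compared case-insensitively (like `grep -i`)."""
    names: List[str] = []
    for line in nm_output.splitlines():
        fields = line.split()
        # Defined symbols have three fields: address, type, name.
        # Undefined ones ("U name") have only two and are skipped.
        if len(fields) != 3:
            continue
        _address, _symbol_type, name = fields
        if needle.lower() in name.lower():
            names.append(name)
    return names


def exported_symbols_of(shared_object: str, needle: str) -> List[str]:
    # Runs nm on the given shared object; raises if nm fails.
    result = subprocess.run(
        ["nm", "-gD", shared_object],
        check=True,
        capture_output=True,
        text=True,
    )
    return exported_symbols(result.stdout, needle)
```

With the fix in place, `exported_symbols_of(path_to_so, "RSA_verify")` would be expected to return an empty list for the nassl shared objects.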