metacall / core

MetaCall: The ultimate polyglot programming experience.
https://metacall.io
Apache License 2.0

OpenSSL incompatibility #31

Open viferga opened 5 years ago

viferga commented 5 years ago

When a NodeJS script runs on a NodeJS build that links OpenSSL statically, and Python then loads any library that uses Py__Hash (a C extension for Python that links OpenSSL dynamically), the process crashes with a segmentation fault. The problem can be reproduced by running this example with NodeJS: https://github.com/metacall/pdf-generator-email-sender-landing-page-example

In this example Py__Hash is using OpenSSL 1.1 and NodeJS is using 1.0.2p:

{ http_parser: '2.8.0',
  node: '8.12.0',
  v8: '6.2.414.66',
  uv: '1.19.2',
  zlib: '1.2.11',
  ares: '1.10.1-DEV',
  modules: '57',
  nghttp2: '1.32.0',
  napi: '3',
  openssl: '1.0.2p',
  icu: '60.1',
  unicode: '10.0',
  cldr: '32.0',
  tz: '2017c' }

This is a fragment of the valgrind output for the segmentation fault. It occurs during the OpenSSL initialization of Py__Hash (PyInit__hashlib in the trace):

==3013== Invalid read of size 1
==3013==    at 0x5D8D520: __strcmp_sse2_unaligned (strcmp-sse2-unaligned.S:24)
==3013==    by 0x123A409: lh_insert (in /usr/bin/node)
==3013==    by 0x12458F0: OBJ_NAME_add (in /usr/bin/node)
==3013==    by 0x1230FF5: EVP_add_digest (in /usr/bin/node)
==3013==    by 0xC1034FA: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1)
==3013==    by 0xC11AA78: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1)
==3013==    by 0x5AEF758: pthread_once_slow (pthread_once.c:116)
==3013==    by 0xC171558: CRYPTO_THREAD_run_once (in /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1)
==3013==    by 0xC11AFB2: OPENSSL_init_crypto (in /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1)
==3013==    by 0xBB40B10: PyInit__hashlib (in /usr/lib/python3.5/lib-dynload/_hashlib.cpython-35m-x86_64-linux-gnu.so)
==3013==    by 0xA30F70F: _PyImport_LoadDynamicModuleWithSpec (in /usr/lib/x86_64-linux-gnu/libpython3.5m.so.1.0)
==3013==    by 0xA312FE6: ??? (in /usr/lib/x86_64-linux-gnu/libpython3.5m.so.1.0)
==3013==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==3013==
==3013==
==3013== Process terminating with default action of signal 11 (SIGSEGV)
==3013==  Access not within mapped region at address 0x0
==3013==    at 0x5D8D520: __strcmp_sse2_unaligned (strcmp-sse2-unaligned.S:24)
==3013==    by 0x123A409: lh_insert (in /usr/bin/node)
==3013==    by 0x12458F0: OBJ_NAME_add (in /usr/bin/node)
==3013==    by 0x1230FF5: EVP_add_digest (in /usr/bin/node)
==3013==    by 0xC1034FA: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1)
==3013==    by 0xC11AA78: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1)
==3013==    by 0x5AEF758: pthread_once_slow (pthread_once.c:116)
==3013==    by 0xC171558: CRYPTO_THREAD_run_once (in /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1)
==3013==    by 0xC11AFB2: OPENSSL_init_crypto (in /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1)
==3013==    by 0xBB40B10: PyInit__hashlib (in /usr/lib/python3.5/lib-dynload/_hashlib.cpython-35m-x86_64-linux-gnu.so)
==3013==    by 0xA30F70F: _PyImport_LoadDynamicModuleWithSpec (in /usr/lib/x86_64-linux-gnu/libpython3.5m.so.1.0)
==3013==    by 0xA312FE6: ??? (in /usr/lib/x86_64-linux-gnu/libpython3.5m.so.1.0)
==3013==  If you believe this happened as a result of a stack
==3013==  overflow in your program's main thread (unlikely but
==3013==  possible), you can try to increase the size of the
==3013==  main thread stack using the --main-stacksize= flag.
==3013==  The main thread stack size used in this run was 8388608.
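The interesting part of the trace is the point where a frame inside libcrypto.so.1.1 calls EVP_add_digest, which is resolved inside /usr/bin/node instead of inside libcrypto.so.1.1 itself. A minimal Python sketch for spotting such transitions in a valgrind trace could look like this. The helper names (frames, object_switches) are hypothetical and not part of metacall or valgrind, and the regular expression only assumes the "at/by ADDRESS: SYMBOL (in OBJECT)" frame layout shown above.

```python
import re

# Matches frames such as "by 0x1230FF5: EVP_add_digest (in /usr/bin/node)".
# Frames with source info, e.g. "(pthread_once.c:116)", are matched as well;
# in that case the captured "object" is the file:line text.
FRAME_RE = re.compile(r"(?:at|by) 0x[0-9A-Fa-f]+: (\S+)\s+\((?:in )?([^)]+)\)")


def frames(trace):
    """Return (symbol, object) pairs in the order they appear in the trace."""
    return FRAME_RE.findall(trace)


def object_switches(trace):
    """Return (caller, callee) frame pairs that live in different objects.

    Valgrind prints the innermost frame first, so the frame printed after
    a given frame is its caller.
    """
    fs = frames(trace)
    return [
        (fs[i + 1], fs[i])
        for i in range(len(fs) - 1)
        if fs[i][1] != fs[i + 1][1]
    ]


if __name__ == "__main__":
    import sys

    # Usage: python3 switches.py < valgrind.log
    for caller, callee in object_switches(sys.stdin.read()):
        print(f"{caller[0]} ({caller[1]}) -> {callee[0]} ({callee[1]})")
```

Fed the trace above, the output should include the transition from the unnamed libcrypto.so.1.1 frame to EVP_add_digest in /usr/bin/node, which is where the two OpenSSL copies get mixed.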

viferga commented 4 years ago

This seems to be solved with the current distributable implementation, but it needs revision.

viferga commented 1 year ago

This is still happening on ubuntu:jammy, with the following versions:

root@b764c64be8c7:/usr/local/metacall/build# node -e 'console.log(process.versions.openssl)' 
1.1.1m
root@b764c64be8c7:/usr/local/metacall/build# python3 -c 'import ssl; print(ssl.OPENSSL_VERSION.split()[1])'
3.0.2
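To make the comparison of the two version strings explicit, here is a small Python sketch. It is a heuristic, not an official OpenSSL rule: it assumes that releases before 3.0 share a library series by major.minor (as in the libcrypto.so.1.1 name seen in the earlier trace) and that 3.x releases share it by major version alone. The function names are hypothetical.

```python
import re


def soname_series(version):
    """Map an OpenSSL version string such as '1.1.1m' to a series label.

    Assumption: pre-3.0 releases are grouped by major.minor,
    3.0 and later by major only.
    """
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized OpenSSL version: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return str(major) if major >= 3 else f"{major}.{minor}"


def same_series(a, b):
    """Return True when both versions fall in the same series."""
    return soname_series(a) == soname_series(b)


if __name__ == "__main__":
    node_openssl = "1.1.1m"   # from process.versions.openssl above
    python_openssl = "3.0.2"  # from ssl.OPENSSL_VERSION above
    print(same_series(node_openssl, python_openssl))
```

With the versions reported above, the two runtimes fall in different series, just as with 1.0.2p and 1.1 in the original report.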

I am reopening this for now. I tried using dlmopen to isolate the libraries, but it does not seem to work: it fails under valgrind. This is probably related to the current design of metacall, which relies on having a global scope with the metacall library.
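When experimenting with isolation, it can help to check which OpenSSL shared objects are actually mapped into the process. The following Python sketch assumes Linux and the usual six-column /proc/self/maps layout. The function names are hypothetical. Note that a statically linked copy, such as the one inside /usr/bin/node in the first trace, will not show up as a separate object.

```python
import re

# Path ends in something like libcrypto.so.1.1 or libssl.so.3.
LIB_RE = re.compile(r"lib(?:crypto|ssl)[^/]*\.so[^/]*$")


def openssl_objects(maps_text):
    """Return the distinct libcrypto/libssl paths found in maps-style text."""
    found = []
    for line in maps_text.splitlines():
        # Columns: address perms offset dev inode pathname
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping, no pathname
        path = fields[5].strip()
        if LIB_RE.search(path) and path not in found:
            found.append(path)
    return found


def loaded_openssl_objects():
    """Inspect the current process (Linux only)."""
    with open("/proc/self/maps") as handle:
        return openssl_objects(handle.read())


if __name__ == "__main__":
    import ssl  # noqa: F401  (forces Python's OpenSSL to be loaded)

    for path in loaded_openssl_objects():
        print(path)
```

Seeing more than one libcrypto path in the output would confirm that two dynamically linked copies coexist in the same process.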

viferga commented 1 year ago

It also seems to be related to Rust: when using similar versions (3.0.4 and 3.0.6), it does not fail if we disable calls from node to rs while python is also loaded in node_port.

https://github.com/metacall/core/commit/9be934887d70bff3c6de9657fb9f9d262e9aa40c

viferga commented 1 year ago

Here is the Rust issue: https://github.com/metacall/core/commit/ad9e7cf7c9a5d04e214f2c0bc1fdaabfa8d1a911 Apparently it is unrelated.