pyca / pyopenssl

A Python wrapper around the OpenSSL library
https://pyopenssl.org/
Apache License 2.0
889 stars 419 forks source link

Deadlock with pyOpenSSL 16.1.0, Deluge and SSL torrent trackers #551

Closed JohnDoee closed 8 years ago

JohnDoee commented 8 years ago

When using Deluge with pyOpenSSL 16.1.0 and torrents with trackers using https then Deluge will end up in a deadlock. It works fine with 16.0.0.

I've attached the stacktrace and I think the deadlock is between Thread 3 trying to acquire GIL lock and Thread 1 currently holding it. I'm unsure why this is happening as libtorrent-rasterbar is a C library.

You can reproduce this by installing pyOpenSSL 16.1.0, any recent version of Deluge, any recent version of libtorrent and any recent version of OpenSSL. Add a few torrents with https trackers (these trackers does not need to work, loading the SSL context seems sufficient). Restart Deluge until it locks up during boot.

After it locks up once it'll keep locking up.

Thread 4 (Thread 0x7ffff1679700 (LWP 15931)):
#0  0x00007ffff7bc8404 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#1  0x00007ffff36e58e4 in boost::asio::detail::task_io_service::do_run_one(boost::asio::detail::scoped_lock<boost::asio::detail::posix_mutex>&, boost::asio::detail::task_io_service_thread_info&, boost::system::error_code const&) () from /usr/lib/libtorrent-rasterbar.so.7
No symbol table info available.
#2  0x00007ffff36e9a31 in boost::asio::detail::task_io_service::run(boost::system::error_code&) () from /usr/lib/libtorrent-rasterbar.so.7
No symbol table info available.
#3  0x00007ffff36e9be6 in boost::asio::io_service::run() () from /usr/lib/libtorrent-rasterbar.so.7
No symbol table info available.
#4  0x00007ffff36de27f in boost_asio_detail_posix_thread_function () from /usr/lib/libtorrent-rasterbar.so.7
No symbol table info available.
#5  0x00007ffff7bc4184 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#6  0x00007ffff78f137d in clone () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
Thread 3 (Thread 0x7ffff1eba700 (LWP 15915)):
#0  0x00007ffff7bca670 in sem_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#1  0x0000000000547fa4 in PyThread_acquire_lock ()
No symbol table info available.
#2  0x000000000054b8e6 in PyEval_RestoreThread ()
No symbol table info available.
#3  0x000000000059ea74 in _PyOS_URandom ()
No symbol table info available.
#4  0x000000000057f5e0 in ?? ()
No symbol table info available.
#5  0x000000000049968d in PyEval_EvalFrameEx ()
No symbol table info available.
#6  0x00000000004a1c9a in ?? ()
No symbol table info available.
#7  0x0000000000505f96 in PyObject_Call ()
No symbol table info available.
#8  0x00007ffff48b3fcc in general_invoke_callback (decode_args_from_libffi=decode_args_from_libffi@entry=0, result=result@entry=0x7ffff1eb7f90, args=args@entry=0x7ffff1eb7f90 "\020\233\002\344\377\177", userdata=<optimized out>) at c/_cffi_backend.c:5119
        cb_args = <optimized out>
        ct = <optimized out>
        signature = 0x7ffff4dc75d0
        py_ob = 0x7ffff4e22de8
        py_args = 0x7ffff26d5680
        py_res = 0x0
        py_rawerr = <optimized out>
        onerror_cb = <optimized out>
        i = <optimized out>
        n = 2
        extra_error_line = 0x0
#9  0x00007ffff48b4198 in cffi_call_python (externpy=0x7ffff4da40a0 <_cffi_externpy__Cryptography_rand_bytes>, args=0x7ffff1eb7f90 "\020\233\002\344\377\177") at c/call_python.c:242
        state = PyGILState_UNLOCKED
        err = 0
#10 0x00007ffff4b0243c in Cryptography_rand_bytes (a0=<optimized out>, a1=<optimized out>) at build/temp.linux-x86_64-2.7/_openssl.c:11783
        a = "\020\233\002\344\377\177\000\000\020\000\000\000\377\177\000"
        p = 0x7ffff1eb7f90 "\020\233\002\344\377\177"
#11 0x00007ffff5f1ba63 in SSL_CTX_new () from /lib/x86_64-linux-gnu/libssl.so.1.0.0
No symbol table info available.
#12 0x00007ffff38da0a8 in boost::asio::ssl::context::context(boost::asio::ssl::context_base::method) () from /usr/lib/libtorrent-rasterbar.so.7
No symbol table info available.
#13 0x00007ffff38da28a in boost::asio::ssl::context::context(boost::asio::io_service&, boost::asio::ssl::context_base::method) () from /usr/lib/libtorrent-rasterbar.so.7
No symbol table info available.
#14 0x00007ffff372867c in libtorrent::http_connection::start(std::string const&, std::string const&, libtorrent::time_duration, int, libtorrent::proxy_settings const*, bool, int, boost::asio::ip::address const&, libtorrent::i2p_connection*) () from /usr/lib/libtorrent-rasterbar.so.7
No symbol table info available.
#15 0x00007ffff3728d3b in libtorrent::http_connection::get(std::string const&, libtorrent::time_duration, int, libtorrent::proxy_settings const*, int, std::string const&, boost::asio::ip::address const&, libtorrent::i2p_connection*) () from /usr/lib/libtorrent-rasterbar.so.7
No symbol table info available.
#16 0x00007ffff375aa4a in libtorrent::http_tracker_connection::start() () from /usr/lib/libtorrent-rasterbar.so.7
No symbol table info available.
#17 0x00007ffff389241b in libtorrent::tracker_manager::queue_request(boost::asio::io_service&, libtorrent::connection_queue&, libtorrent::tracker_request, std::string const&, boost::weak_ptr<libtorrent::request_callback>) () from /usr/lib/libtorrent-rasterbar.so.7
No symbol table info available.
#18 0x00007ffff384a4ba in libtorrent::torrent::announce_with_tracker(libtorrent::tracker_request::event_t, boost::asio::ip::address const&) () from /usr/lib/libtorrent-rasterbar.so.7
No symbol table info available.
#19 0x00007ffff384b08a in libtorrent::torrent::replace_trackers(std::vector<libtorrent::announce_entry, std::allocator<libtorrent::announce_entry> > const&) () from /usr/lib/libtorrent-rasterbar.so.7
No symbol table info available.
#20 0x00007ffff3884c4b in ?? () from /usr/lib/libtorrent-rasterbar.so.7
No symbol table info available.
#21 0x00007ffff36e597b in boost::asio::detail::task_io_service::do_run_one(boost::asio::detail::scoped_lock<boost::asio::detail::posix_mutex>&, boost::asio::detail::task_io_service_thread_info&, boost::system::error_code const&) () from /usr/lib/libtorrent-rasterbar.so.7
No symbol table info available.
#22 0x00007ffff36e9a31 in boost::asio::detail::task_io_service::run(boost::system::error_code&) () from /usr/lib/libtorrent-rasterbar.so.7
No symbol table info available.
#23 0x00007ffff38008bc in libtorrent::aux::session_impl::main_thread() () from /usr/lib/libtorrent-rasterbar.so.7
No symbol table info available.
#24 0x00007ffff36de27f in boost_asio_detail_posix_thread_function () from /usr/lib/libtorrent-rasterbar.so.7
No symbol table info available.
#25 0x00007ffff7bc4184 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#26 0x00007ffff78f137d in clone () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
Thread 2 (Thread 0x7ffff26bb700 (LWP 15914)):
#0  0x00007ffff7bc8404 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#1  0x00007ffff370f573 in libtorrent::disk_io_thread::thread_fun() () from /usr/lib/libtorrent-rasterbar.so.7
No symbol table info available.
#2  0x00007ffff36de27f in boost_asio_detail_posix_thread_function () from /usr/lib/libtorrent-rasterbar.so.7
No symbol table info available.
#3  0x00007ffff7bc4184 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#4  0x00007ffff78f137d in clone () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
Thread 1 (Thread 0x7ffff7fdd740 (LWP 15867)):
#0  0x00007ffff7bc8404 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#1  0x00007ffff387339b in libtorrent::torrent_handle::file_priorities() const () from /usr/lib/libtorrent-rasterbar.so.7
No symbol table info available.
#2  0x00007ffff3bb7a52 in ?? () from /home/hd22/undeleted/broken/local/lib/python2.7/site-packages/libtorrent.so
No symbol table info available.
#3  0x00007ffff3bc5c28 in ?? () from /home/hd22/undeleted/broken/local/lib/python2.7/site-packages/libtorrent.so
No symbol table info available.
#4  0x00007ffff31fd64a in boost::python::objects::function::call(_object*, _object*) const () from /usr/lib/x86_64-linux-gnu/libboost_python-py27.so.1.54.0
No symbol table info available.
#5  0x00007ffff31fd9b8 in ?? () from /usr/lib/x86_64-linux-gnu/libboost_python-py27.so.1.54.0
No symbol table info available.
#6  0x00007ffff3207c93 in boost::python::handle_exception_impl(boost::function0<void>) () from /usr/lib/x86_64-linux-gnu/libboost_python-py27.so.1.54.0
No symbol table info available.
#7  0x00007ffff31fc2c3 in ?? () from /usr/lib/x86_64-linux-gnu/libboost_python-py27.so.1.54.0
No symbol table info available.
#8  0x0000000000499be5 in PyEval_EvalFrameEx ()
No symbol table info available.
#9  0x0000000000499ef2 in PyEval_EvalFrameEx ()
No symbol table info available.
#10 0x0000000000499ef2 in PyEval_EvalFrameEx ()
No symbol table info available.
#11 0x00000000004a1c9a in ?? ()
No symbol table info available.
#12 0x00000000004dfe94 in ?? ()
No symbol table info available.
#13 0x0000000000505f96 in PyObject_Call ()
No symbol table info available.
#14 0x00000000004de41a in ?? ()
No symbol table info available.
#15 0x00000000005039eb in ?? ()
No symbol table info available.
#16 0x0000000000499be5 in PyEval_EvalFrameEx ()
No symbol table info available.
#17 0x00000000004a090c in PyEval_EvalCodeEx ()
No symbol table info available.
#18 0x0000000000499a52 in PyEval_EvalFrameEx ()
No symbol table info available.
#19 0x0000000000499ef2 in PyEval_EvalFrameEx ()
No symbol table info available.
#20 0x00000000004a1c9a in ?? ()
No symbol table info available.
#21 0x0000000000505f96 in PyObject_Call ()
No symbol table info available.
#22 0x000000000049b07a in PyEval_EvalFrameEx ()
No symbol table info available.
#23 0x00000000004a090c in PyEval_EvalCodeEx ()
No symbol table info available.
#24 0x000000000049ab45 in PyEval_EvalFrameEx ()
No symbol table info available.
#25 0x00000000004a090c in PyEval_EvalCodeEx ()
No symbol table info available.
#26 0x000000000049ab45 in PyEval_EvalFrameEx ()
No symbol table info available.
#27 0x00000000004a1c9a in ?? ()
No symbol table info available.
#28 0x0000000000505f96 in PyObject_Call ()
No symbol table info available.
#29 0x000000000049b07a in PyEval_EvalFrameEx ()
No symbol table info available.
#30 0x00000000004a090c in PyEval_EvalCodeEx ()
No symbol table info available.
#31 0x000000000049ab45 in PyEval_EvalFrameEx ()
No symbol table info available.
#32 0x00000000004a090c in PyEval_EvalCodeEx ()
No symbol table info available.
#33 0x0000000000499a52 in PyEval_EvalFrameEx ()
No symbol table info available.
#34 0x00000000004a090c in PyEval_EvalCodeEx ()
No symbol table info available.
#35 0x000000000049ab45 in PyEval_EvalFrameEx ()
No symbol table info available.
#36 0x00000000004a090c in PyEval_EvalCodeEx ()
No symbol table info available.
#37 0x0000000000499a52 in PyEval_EvalFrameEx ()
No symbol table info available.
#38 0x00000000004a1c9a in ?? ()
No symbol table info available.
#39 0x00000000004dfe94 in ?? ()
No symbol table info available.
#40 0x0000000000505f96 in PyObject_Call ()
No symbol table info available.
#41 0x00000000004de41a in ?? ()
No symbol table info available.
#42 0x00000000005039eb in ?? ()
No symbol table info available.
#43 0x0000000000499be5 in PyEval_EvalFrameEx ()
No symbol table info available.
#44 0x00000000004a090c in PyEval_EvalCodeEx ()
No symbol table info available.
#45 0x000000000049ab45 in PyEval_EvalFrameEx ()
No symbol table info available.
#46 0x00000000004a1634 in ?? ()
No symbol table info available.
#47 0x000000000044e4a5 in PyRun_FileExFlags ()
No symbol table info available.
#48 0x000000000044ec9f in PyRun_SimpleFileExFlags ()
No symbol table info available.
#49 0x000000000044f904 in Py_Main ()
No symbol table info available.
#50 0x00007ffff7818f45 in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#51 0x0000000000578c4e in _start ()
No symbol table info available.
reaperhulk commented 8 years ago

Thanks for the report. @hynek, @alex do we want to do a quick release to disable the osrandom engine to prevent these types of regressions while we work on a more permanent fix? (Will also fix #542)

alex commented 8 years ago

@reaperhulk how do you imagine that PR looking, just moving the import into the relevant functions?

I won't object to such a patch, however I'm a bit concerned that it's moving the peas around our plate and hoping our parents don't notice. Specifically, anyone who uses those functions (which might be limited ATM, tbh) will continue to hit issues, and they'll just be harder to debug. Also anyone who uses both pyOpenSSL and cryptography's backend APIs will as well.

reaperhulk commented 8 years ago

I was envisioning it as moving the backend import to a separate module and calling backend.activate_builtin_random(), then importing backend from that where we use it. Since from cryptography.hazmat.backends.openssl.backend import backend is effectively a singleton (we're importing an instance) backend.activate_builtin_random() will stop the use of the osrandom engine for the lion's share of cases. If a user separately imports from cryptography.hazmat.backends.openssl.backend import Backend and instantiates Backend() then it would re-activate osrandom of course.

The disadvantage to this approach is obviously that it stops cryptography from using the superior osrandom engine, but based on the scant reports of problems before we activated it in pyOpenSSL I'm guessing we don't have a great deal of overlap in users of our hazmat/recipes and pyOpenSSL. Long term obviously we don't want to be deactivating osrandom, but we won't have the replacement ready immediately.

alex commented 8 years ago

As I've said before, I strongly object to calling activate_builtin_random; I think moving the import into the function gets us the majority of the benefit while minimizing risk.

On Wed, Oct 5, 2016 at 8:01 AM, Paul Kehrer notifications@github.com wrote:

I was envisioning it as moving the backend import to a separate module and calling backend.activate_builtin_random(), then importing backend from that where we use it. Since from cryptography.hazmat.backends.openssl.backend import backend is effectively a singleton (we're importing an instance) backend.activate_builtin_random() will stop the use of the osrandom engine for the lion's share of cases. If a user separately imports from cryptography.hazmat.backends.openssl.backend import Backend and instantiates Backend() then it would re-activate osrandom of course.

The disadvantage to this approach is obviously that it stops cryptography from using the superior osrandom engine, but based on the scant reports of problems before we activated it in pyOpenSSL I'm guessing we don't have a great deal of overlap in users of our hazmat/recipes and pyOpenSSL. Long term obviously we don't want to be deactivating osrandom, but we won't have the replacement ready immediately.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/pyca/pyopenssl/issues/551#issuecomment-251654997, or mute the thread https://github.com/notifications/unsubscribe-auth/AAADBEzhWedUUMSI8kKP5t-pYedU5XWbks5qw5GWgaJpZM4KOGJ2 .

"I disapprove of what you say, but I will defend to the death your right to say it." -- Evelyn Beatrice Hall (summarizing Voltaire) "The people's good is the highest law." -- Cicero GPG Key fingerprint: D1B3 ADC0 E023 8CA6

reaperhulk commented 8 years ago

Certainly moving the import into the function will fix it for most users (who are not calling those conversion routines). Maybe that's sufficient. Obviously the tradeoff is that if a user does call the offending functions at some point during their program execution, is in a multiprocess/multithread execution environment, and subsequently makes a TLS connection using pyOpenSSL they'll trigger the bug.

I don't have a strong preference since every choice has drawbacks. I will force you to express an opinion tonight @hynek ;)

JohnDoee commented 8 years ago

There's one thing I'd love to get explained, why does libtorrent, a C++ library with Python bindings, end up calling libssl that calls back into Python ? The disconnect for me is in Thread 3 - # 9 # 10 # 11

reaperhulk commented 8 years ago

There's only one instance of libssl loading into the process space and when pyOpenSSL loads up it is activating an engine that calls back into Python. So even invocations of TLS calls by libtorrent will trigger the callback.

reaperhulk commented 8 years ago

This should be resolved in 16.2.0

mcepl commented 6 years ago

Hmm, when applying a84a52ceeb8deb602515ebf76d3de3ec0e24ba56 to pyOpenSSL 16.0.0 (python-cryptography 1.3.1) on SLE 12 (SUSE enterprise Linux distribution) I get some failed tests on x86_64. See build log for more details. The fail is intermittent, when rerunning, it passes.

Any thoughts on this?