Closed Isonami closed 5 years ago
Strange, the only difference in the debug log is the 105 socket error during the connect, which refers to: "No buffer space available". It doesn't make much sense to me; it's a very low-level error. The only socket-related thing that changed in 1.0.0 is the non-blocking I/O flag, which was not set in the previous release. It looks like a complicated issue; I'll try to look into it a little more.
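For anyone wanting to double-check what a numeric socket error means on their own machine, here is a minimal sketch using only Python's standard library. The helper name `describe_errno` is made up for illustration, and errno numbering is platform-specific (105 maps to ENOBUFS on Linux, but not necessarily elsewhere).

```python
import errno
import os
from typing import Tuple


def describe_errno(code: int) -> Tuple[str, str]:
    """Return (symbolic name, OS message) for a numeric errno value.

    Hypothetical helper for illustration; it is not part of bonsai.
    """
    name = errno.errorcode.get(code, "UNKNOWN")  # e.g. "ENOBUFS"
    message = os.strerror(code)                  # platform-specific text
    return name, message


if __name__ == "__main__":
    # Look the code up by its symbolic constant so this works on any platform.
    print(errno.ENOBUFS, describe_errno(errno.ENOBUFS))
```

Calling `describe_errno(105)` on the affected Linux host should show whether the number in the debug log really is ENOBUFS there.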
I get something similar, pretty sure it's the same issue. @noirello - git-bisect indeed flags 17faa203780575c09f12252c58794733df2bc766 and for me reverting just that from master fixes it. @Isonami - if you can build from source, to confirm please try checking out the latest version and then: git revert --strategy resolve 17faa203780575c09f12252c58794733df2bc766
It would also be interesting to verify which provider you are using: import bonsai; bonsai.get_tls_impl_name()
I am affected on a RHEL system, where the above fixes it. I also run this library in cygwin, where I am not affected. The main differences I see are:
working on cygwin - provider = openssl, libldap v2.4.42
broken on redhat - provider = MozNSS, libldap v2.4.44
Since this doesn't seem to happen against unencrypted endpoints, I'm going to guess the issue here is async connect on MozNSS. The versions reported above seem to rule out libldap version being the difference (unless I'm misunderstanding).
Unfortunately I can't remove nss (unless someone can tell me how to get ld to blacklist a specific library without root?) to verify 100%.
I will see if libldap allows set_option on LDAP_OPT_X_TLS_PACKAGE or something to hardcode openssl to see if that fixes it. Or maybe I can do this via an environment variable? Both are available on the affected system.
Barring that I'll get slapd on ssl running at home and verify from a system I control.
Testing locally I cannot get MozNSS as a provider, but even openssl fails on V1.0.0, though reverting 17faa203780575c09f12252c58794733df2bc766 still fixes it.
A bug was reported against async ssl in openldap (8957), which is cleanup on the actual bugfix for (6828). Applying the patch listed there against libldap on my system fixes the error for me. So apparently the reason cygwin worked is either that the bug in openldap was introduced sometime after v2.4.42, or (more likely) that cygwin doesn't experience it because it's so incredibly slow on my system.
I'm not sure how long it takes for openldap to issue releases (the patch was only applied 1/31/2019), and then for that to make it out to various distributions. What exactly do async connections get bonsai? I assume it's performance related; not sure if you're interested in accepting a patch that works around an already patched upstream bug by making the behavior in 17faa203780575c09f12252c58794733df2bc766 disable-able at run-time (maybe a module level option like bonsai.set_async_connections(False))?
@tck42 I've built without 17faa20 and it fixed it for me too. Hope they'll release this patch for openldap soon.
@tck42 thanks for looking into it. Your cygwin isn't affected because the setting is only available from libldap 2.4.44. With a lower lib version it's disabled with a macro during the build.
I made some tests before releasing 1.0.0 with a few different Linux distros and OpenLDAP versions to find out which versions are capable of using this non-blocking socket option without error. It turned out that everything from 2.4.44 on was good to go. See Issue #21 for some details.
Adding a runtime flag option would've been a smart move, apparently.
Could you clarify for me, by running bonsai.get_tls_impl_name() and bonsai.get_vendor_info(), which TLS implementations and OpenLDAP versions seem to be affected exactly?
Sorry for the delay. Thanks for clearing up the reason my cygwin environment is unaffected. The two systems I see the issues on are the same:
>>> bonsai.get_tls_impl_name()
'MozNSS'
>>> bonsai.get_vendor_info()
('OpenLDAP', 20440)
Weird, based on the vendor info that you sent, it shouldn't be affected either. For the setting to be enabled at compile time, the version number has to be larger than 20443.
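To make the version numbers in this thread easier to compare, here is a small sketch that decodes the integer from bonsai.get_vendor_info() and applies the "larger than 20443" rule quoted above. The helper names are hypothetical, and the decoding assumes OpenLDAP's usual vendor-version encoding (major * 10000 + minor * 100 + patch); the check mirrors what was said in this thread rather than bonsai's actual build logic.

```python
from typing import Tuple

# Threshold quoted in the thread: the non-blocking connect setting is only
# compiled in when the vendor version is larger than this value.
ASYNC_CONNECT_THRESHOLD = 20443


def decode_vendor_version(version: int) -> Tuple[int, int, int]:
    """Split e.g. 20444 into (2, 4, 44). Assumes OpenLDAP's encoding."""
    major, rest = divmod(version, 10000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


def async_connect_compiled_in(vendor_info: Tuple[str, int]) -> bool:
    """Apply the 'larger than 20443' rule to a (vendor, version) tuple."""
    _vendor, version = vendor_info
    return version > ASYNC_CONNECT_THRESHOLD


if __name__ == "__main__":
    # Version tuples taken from the reports in this thread.
    for info in [("OpenLDAP", 20440), ("OpenLDAP", 20444), ("OpenLDAP", 20447)]:
        print(info, decode_vendor_version(info[1]), async_connect_compiled_in(info))
```

With a real install, the tuple would come straight from bonsai.get_vendor_info().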
Sorry - terminal multiplexer confusion on my part. MozNSS/OpenLDAP 20444 is reported on the affected host (just verified). I have another host (still using 0.9.1 there) that's supposed to be the same version but apparently is behind on patching. For a second I somehow thought I had a version difference between the dev and binary packages or some rogue openldap install.
Also my home system (Arch, affected by the issue):
>>> bonsai.get_vendor_info()
('OpenLDAP', 20447)
>>> bonsai.get_tls_impl_name()
'OpenSSL'
My systems which are affected:
>>> bonsai.get_vendor_info()
('OpenLDAP', 20446)
>>> bonsai.get_tls_impl_name()
'GnuTLS'
>>> bonsai.get_vendor_info()
('OpenLDAP', 20444)
>>> bonsai.get_tls_impl_name()
'OpenSSL'
Is there any chance that a bug fix release can be done that simply reverts 17faa20 ?
Thanks to @tck42, there's a module function currently on dev branch that can turn off the async connection process.
There's a few things I'd like to do before a new release, I'll assess my time on the weekend how many of them can be done in a short period of time. I'd like to make a new release at the end of the month.
The new release is out, including the bonsai.set_connect_async function, which can disable the non-blocking socket settings.
I just encountered this too, thanks for all the info posted here. I wonder if this should be in the docs as it had me confused for quite some time 😄
@noirello may I ask a rookie question related to this topic? I don't want to start a new issue. I am running an application on Debian Buster (python:3.8-slim-buster image, precisely) with libldap2-dev, which is OpenLDAP version 2.4.47
>>> bonsai.get_vendor_info()
('OpenLDAP', 20447)
>>> bonsai.get_tls_impl_name()
'GnuTLS'
I hit the issue described here and I need to use bonsai.set_connect_async(False). Are the async capabilities of bonsai diminished? Does this mean that the connection with LDAP is not truly asynchronous? Thanks.
Eventually, I will probably have to juggle other distros or build OpenLDAP myself, but for now I need bonsai.set_connect_async(False) to get me going in the first iteration.
Unfortunately, it does. When bonsai.set_connect_async(False) is set, the socket used for the connection doesn't have the non-blocking flag set. Therefore building up the connection with the LDAP server will definitely block, and the underlying event loop won't be able to switch to other coroutines.
Things get better once the connection is established, because running an LDAP operation is asynchronous at the API level (e.g. you can get a coroutine switch during an LDAP search).
It's quite unfortunate that this issue reappears from time to time. :( But it seems that the integration between OpenLDAP and the different TLS libraries isn't seamless. (To the best of my knowledge, the TLS libraries being unaware of the non-blocking socket causes this problem.)
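To illustrate the blocking effect described above, here is a generic asyncio sketch. It does not use bonsai at all: `time.sleep` stands in for a blocking connect and `asyncio.sleep` for a non-blocking one, and every function name is made up for the example. It only shows that a blocking call inside a coroutine keeps other coroutines from running until it returns.

```python
import asyncio
import time
from typing import Awaitable, Callable, List


async def ticker(log: List[str], count: int = 3, interval: float = 0.01) -> None:
    """A second coroutine that wants to run while the connect is in progress."""
    for i in range(count):
        log.append(f"tick {i}")
        await asyncio.sleep(interval)


async def connect_blocking(log: List[str], delay: float = 0.2) -> None:
    """Stand-in for a blocking connect: the event loop is stuck in time.sleep."""
    log.append("connect start")
    time.sleep(delay)  # blocks the whole event loop thread
    log.append("connect done")


async def connect_cooperative(log: List[str], delay: float = 0.2) -> None:
    """Stand-in for a non-blocking connect: control returns to the loop."""
    log.append("connect start")
    await asyncio.sleep(delay)  # yields, so other coroutines can run
    log.append("connect done")


async def demo(connect: Callable[[List[str]], Awaitable[None]]) -> List[str]:
    """Run one connect stand-in next to the ticker and record the event order."""
    log: List[str] = []
    await asyncio.gather(connect(log), ticker(log))
    return log


if __name__ == "__main__":
    print("blocking:   ", asyncio.run(demo(connect_blocking)))
    print("cooperative:", asyncio.run(demo(connect_cooperative)))
```

In the blocking run the ticks only appear after "connect done"; in the cooperative run they are interleaved with the connect.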
I've updated bonsai from 0.9.1 to 1.0.0 and it now fails with the connection error "bonsai.errors.ConnectionError: Can't contact LDAP server. (0xFFFF [-1])" every time, no matter if I set credentials / certificate options or not. Debug output:
And debug output from 0.9.1 version:
ldap lib versions:
Ubuntu 18.10: libldap-2.4-2/cosmic-updates,now 2.4.46+dfsg-5ubuntu1.1 amd64
Centos 7.5: openldap-2.4.44-15.el7_5.x86_64