noirello / bonsai

Simple Python 3 module for LDAP, using libldap2 and winldap C libraries.
MIT License
117 stars 33 forks source link

"Can't contact LDAP server" on 1.0.0 version #25

Closed Isonami closed 5 years ago

Isonami commented 5 years ago

I've update bonsai from 0.9.1 to 1.0.0 and it has started to get connection error "bonsai.errors.ConnectionError: Can't contact LDAP server. (0xFFFF [-1])" every time, no matter if I set credentials / certificate options or not. Debug output:

client = bonsai.LDAPClient("ldaps://my.ldap:636")
client.set_credentials('simple', user='ou=condn', password='password')
conn = client.connect()
DBG: ldapconnection_new [self:0x7fd9c87fdd08]
DBG: ldapconnection_init (self:0x7fd9c87fdd08)
DBG: ldapconnection_open (self:0x7fd9c87fdd08)
DBG: connecting (self:0x7fd9c87fdd08)
DBG: create_conn_info (mech:SIMPLE, sock:-1, creds:0x7fd9af8f8240)
DBG: ldapconnectiter_new [self:0x7fd9af8f8f18]
DBG: create_init_thread_data (client:0x7fd9ae68d940, sock:-1)
DBG: create_init_thread (ld:0x29145e0, info:0x2913460, thread:0)
DBG: ldapconnection_result (self:0x7fd9c87fdd08, args:0x7fd9ae6940c8, kwds:(nil))[msgid:-1]
DBG: LDAPConnection_Result (self:0x7fd9c87fdd08, msgid:-1, millisec:-1)
DBG: LDAPConnectIter_Next (self:0x7fd9af8f8f18, timeout:-1) [tls:0, state:0]
DBG: _ldap_finish_init_thread (async:0, thread:140573321987840, timeout:-1, misc:0x29145e0)
DBG: _pthread_mutex_timedlock
DBG: LDAPConnectIter_Next (self:0x7fd9af8f8f18, timeout:-1) [tls:0, state:0]
DBG: _ldap_finish_init_thread (async:0, thread:140573321987840, timeout:-1, misc:0x29145e0)
DBG: _pthread_mutex_timedlock
ldap_create
ldap_url_parse_ext(ldaps://my.ldap:636)
ldap_sasl_bind
ldap_send_initial_request
ldap_new_connection 1 1 0
ldap_int_open_connection
ldap_connect_to_host: TCP my.ldap:636
ldap_new_socket: 19
ldap_prepare_socket: 19
ldap_connect_to_host: Trying 10.0.0.1:636
ldap_pvt_connect: fd: 19 tm: 0 async: -1
ldap_ndelay_on: 19
attempting to connect: 
connect errno: 115
ldap_open_defconn: successful
ldap_send_server_request
ldap_msgfree
ldap_result ld 0x7fd9b40076d0 msgid 1
wait4msg ld 0x7fd9b40076d0 msgid 1 (infinite timeout)
wait4msg continue ld 0x7fd9b40076d0 msgid 1 all 1
** ld 0x7fd9b40076d0 Connections:
* host: my.ldap  port: 636  (default)
  refcnt: 2  status: Connected
  last used: Fri Jan 25 16:11:58 2019
** ld 0x7fd9b40076d0 Outstanding Requests:
 * msgid 1,  origid 1, status Writing
   outstanding referrals 0, parent count 0
  ld 0x7fd9b40076d0 request count 1 (abandoned 0)
** ld 0x7fd9b40076d0 Response Queue:
   Empty
  ld 0x7fd9b40076d0 response count 0
ldap_chkResponseList ld 0x7fd9b40076d0 msgid 1 all 1
ldap_chkResponseList returns ld 0x7fd9b40076d0 NULL
ldap_int_select
wait4msg continue ld 0x7fd9b40076d0 msgid 1 all 1
** ld 0x7fd9b40076d0 Connections:
* host: my.ldap  port: 636  (default)
  refcnt: 2  status: Connected
  last used: Fri Jan 25 16:11:58 2019
** ld 0x7fd9b40076d0 Outstanding Requests:
 * msgid 1,  origid 1, status InProgress
   outstanding referrals 0, parent count 0
  ld 0x7fd9b40076d0 request count 1 (abandoned 0)
** ld 0x7fd9b40076d0 Response Queue:
   Empty
  ld 0x7fd9b40076d0 response count 0
ldap_chkResponseList ld 0x7fd9b40076d0 msgid 1 all 1
ldap_chkResponseList returns ld 0x7fd9b40076d0 NULL
ldap_int_select
DBG: ldap_init_thread_func (params:0x29145e0)
DBG: set_cert_policy (ld:0x7fd9b40076d0, cert_policy:4)
DBG: ldap_init_thread_func [retval:0]
DBG: LDAPConnectIter_Next (self:0x7fd9af8f8f18, timeout:-1) [tls:0, state:0]
DBG: _ldap_finish_init_thread (async:0, thread:140573321987840, timeout:-1, misc:0x29145e0)
DBG: _pthread_mutex_timedlock
DBG: set_certificates (self:0x7fd9af8f8f18)
DBG: binding [state:3]
DBG: _ldap_bind (ld:0x7fd9b40076d0, info:0x2913460, ppolicy:0, result:(nil), msgid:0)
DBG: LDAPConnectIter_Next (self:0x7fd9af8f8f18, timeout:-1) [tls:0, state:4]
DBG: binding [state:4]
DBG: ldapconnectiter_dealloc (self:0x7fd9af8f8f18)
DBG: dealloc_conn_info (info:0x2913460)
read1msg: ld 0x7fd9b40076d0 msgid 1 all 1
ldap_msgfree
ldap_err2string
Traceback (most recent call last):
  File "/home/pkirillov/.virtualenvs/amp/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2961, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-29-a00ee1f64c39>", line 1, in <module>
    conn = client.connect()
  File "/home/pkirillov/.virtualenvs/amp/lib/python3.6/site-packages/bonsai/ldapclient.py", line 577, in connect
    return LDAPConnection(self).open(timeout)
  File "/home/pkirillov/.virtualenvs/amp/lib/python3.6/site-packages/bonsai/ldapconnection.py", line 292, in open
    return super().open(timeout)
  File "/home/pkirillov/.virtualenvs/amp/lib/python3.6/site-packages/bonsai/ldapconnection.py", line 53, in open
    return self._evaluate(super().open(), timeout)
  File "/home/pkirillov/.virtualenvs/amp/lib/python3.6/site-packages/bonsai/ldapconnection.py", line 241, in _evaluate
    return self.get_result(msg_id, timeout)
bonsai.errors.ConnectionError: Can't contact LDAP server. (0xFFFF [-1])

And debug output from 0.9.1 version:

client = bonsai.LDAPClient("ldaps://my.ldap:636")
client.set_credentials('simple', ('ou=conndn', 'password'))
conn = client.connect()
DBG: ldapconnection_new [self:0x7f9e1a63bd08]
DBG: ldapconnection_init (self:0x7f9e1a63bd08)
DBG: ldapconnection_open (self:0x7f9e1a63bd08)
DBG: connecting (self:0x7f9e1a63bd08)
DBG: create_conn_info (mech:SIMPLE, sock:-1, creds:0x7f9e084eed48)
DBG: ldapconnectiter_new [self:0x7f9e0973c9c0]
DBG: create_init_thread_data (client:0x7f9e09781080, sock:-1)
DBG: create_init_thread (ld:0x2323cf0, info:0x2014a00, thread:0)
DBG: ldapconnection_result (self:0x7f9e1a63bd08, args:0x7f9e084f08c8, kwds:(nil))[msgid:-1]
DBG: LDAPConnection_Result (self:0x7f9e1a63bd08, msgid:-1, millisec:-1)
DBG: LDAPConnectIter_Next (self:0x7f9e0973c9c0, timeout:-1) [tls:0, state:0]
DBG: _ldap_finish_init_thread (async:0, thread:140316997838592, timeout:-1, misc:0x2323cf0)
DBG: ldap_init_thread_func (params:0x2323cf0)
DBG: set_cert_policy (ld:0x7f9e0400fcd0, cert_policy:4)
DBG: ldap_init_thread_func [retval:0]
DBG: _pthread_mutex_timedlock
DBG: set_certificates (self:0x7f9e0973c9c0)
DBG: binding [state:3]
DBG: _ldap_bind (ld:0x7f9e0400fcd0, info:0x2014a00, ppolicy:0, result:(nil), msgid:0)
ldap_create
ldap_url_parse_ext(ldaps://my.ldap:636)
ldap_sasl_bind
ldap_send_initial_request
ldap_new_connection 1 1 0
ldap_int_open_connection
ldap_connect_to_host: TCP my.ldap:636
ldap_new_socket: 10
ldap_prepare_socket: 10
ldap_connect_to_host: Trying 10.0.0.1:636
ldap_pvt_connect: fd: 10 tm: -1 async: 0
attempting to connect: 
connect success
ldap_open_defconn: successful
ldap_send_server_request
ldap_msgfree
ldap_result ld 0x7f9e0400fcd0 msgid 1
wait4msg ld 0x7f9e0400fcd0 msgid 1 (infinite timeout)
wait4msg continue ld 0x7f9e0400fcd0 msgid 1 all 1
** ld 0x7f9e0400fcd0 Connections:
* host: my.ldap  port: 636  (default)
  refcnt: 2  status: Connected
  last used: Fri Jan 25 16:30:09 2019
** ld 0x7f9e0400fcd0 Outstanding Requests:
 * msgid 1,  origid 1, status InProgress
   outstanding referrals 0, parent count 0
  ld 0x7f9e0400fcd0 request count 1 (abandoned 0)
** ld 0x7f9e0400fcd0 Response Queue:
   Empty
  ld 0x7f9e0400fcd0 response count 0
ldap_chkResponseList ld 0x7f9e0400fcd0 msgid 1 all 1
ldap_chkResponseList returns ld 0x7f9e0400fcd0 NULL
ldap_int_select
read1msg: ld 0x7f9e0400fcd0 msgid 1 all 1
read1msg: ld 0x7f9e0400fcd0 msgid 1 message type bind
read1msg: ld 0x7f9e0400fcd0 0 new referrals
read1msg:  mark request completed, ld 0x7f9e0400fcd0 msgid 1
request done: ld 0x7f9e0400fcd0 msgid 1
res_errno: 0, res_error: <>, res_matched: <>
ldap_free_request (origid 1, msgid 1)
ldap_parse_result
ldap_msgfree
DBG: LDAPConnectIter_Next (self:0x7f9e0973c9c0, timeout:-1) [tls:0, state:4]
DBG: binding [state:4]
DBG: ldapconnectiter_dealloc (self:0x7f9e0973c9c0)
DBG: dealloc_conn_info (info:0x2014a00)
DBG: ldapconnection_dealloc (self:0x7f9e18d570a8)

ldap lib versions: Ubuntu 18.10: libldap-2.4-2/cosmic-updates,now 2.4.46+dfsg-5ubuntu1.1 amd64 Centos 7.5: openldap-2.4.44-15.el7_5.x86_64

noirello commented 5 years ago

Strange, the only difference in the debug log is the 105 socket error during the connect, which refers to: "No buffer space available". Doesn't make so much sense to me. It's very low level error. The only thing that's changed in 1.0.0 and related to sockets it's the non-blocking I/O flag which was not set in the previous release. It looks like a complicated issue, I'll try to look into it a little bit more.

tck42 commented 5 years ago

I get something similar, pretty sure it's the same issue. @noirello - git-bisect indeed flags 17faa203780575c09f12252c58794733df2bc766 and for me reverting just that from master fixes it. @Isonami - if you can build from source, to confirm please try checking out the latest version and then: git revert --strategy resolve 17faa203780575c09f12252c58794733df2bc766

It would also be interesting to verify which provider you are using: import bonsai; bonsai.get_tls_impl_name()

I am affected on an RHEL system, where the above fixes it. I also run this library in cygwin where I am not affected. The main differences I see are: working on cygwin - provider = openssl, libldap v2.4.42 broken on redhat - provider = MozNSS, libldap v2.4.44

Since this doesn't seem to happen against unencrypted endpoints, I'm going to guess the issue here is async connect on MozNSS. The versions reported above seem to rule out libldap version being the difference (unless I"m misunderstanding).

Unfortunately I can't remove nss (unless someone can tell me how to get ld to blacklist a specific library without root?) to verify 100%.

I will see if libldap allows set_option on LDAP_OPT_X_TLS_PACKAGE or something to hardcode openssl to see if that fixes it. Or maybe I can do this via an environment variable? Both are available on the affected system.

Barring that I'll get slapd on ssl running at home and verify from a system I control.

tck42 commented 5 years ago

Testing locally I cannot get MozNSS as a provider, but even openssl fails on V1.0.0, though reverting 17faa203780575c09f12252c58794733df2bc766 still fixes it.

tck42 commented 5 years ago

A bug was reported against async ssl in openldap (8957) which is cleanup on the actual bugfix for (6828). Applying the patch listed there against libldap on my system fixes the error for me. So apparently the reason cygwin worked is either the the bug in openldap was introduced sometime after v2.4.42, or (more likely) due to cygwin not experiencing it because it's so incredibly slow on my system.

I'm not sure how long it takes for openldap to issue releases (patch was only applied 1/31/2019), and then for that to make it out to various distributions. What exactly does async connections get bonsai? I assume it's performance related; not sure if you're interested in accepting a patch to workaround an already patched upstream bug to make the behavior in 17faa203780575c09f12252c58794733df2bc766 disable-able at run-time (maybe a module level option like bonsai.set_async_connections(False))?

Isonami commented 5 years ago

@tck42 I've build without 17faa20 and it fixed it for me too. Hope they'll release this patch for openldap soon.

noirello commented 5 years ago

@tck42 thanks to looking into it. Your cygwin isn't affected because the settings only available from libldap 2.4.44. With lower lib version it's disabled with a macro during build.

I made some tests before releasing 1.0.0 with a few different Linux distros and OpenLDAP versions to find out which versions are capable using this non-blocking socket option without error. Then it showed that from 2.4.44 was good to go. See Issue #21 for some details.

Adding runtime flag option would've been a smart move apparently.

Could you clarify for me by running bonsai.get_tls_imp_name() and bonsai.get_vendor_info() which TLS implementations and OpenLDAP versions seem to be affected exactly?

tck42 commented 5 years ago

Sorry for the delay. Thanks for clearing up the reason my cygwin environment is unaffected. The two systems I see the issues on are the same:

bonsai.get_tls_impl_name() 'MozNSS' bonsai.get_vendor_info() ('OpenLDAP', 20440)

noirello commented 5 years ago

Weird, based on the vendor info that you sent, it shouldn't be affected either. To allow the settings during compiling the version number should be larger than 20443.

tck42 commented 5 years ago

Sorry - terminal multiplexer confusion on my part. MozNSS/OpenLDAP 20444 is reported on the affected host (just verified). I have another host (still using 0.9.1 there) that's supposed to be the same version but apparently is behind on patching. For a second I somehow thought I had a version difference between the dev and binary packages or some rogue openldap install.

Also my home system (Arch, affected by the issue):

bonsai.get_vendor_info() ('OpenLDAP', 20447) bonsai.get_tls_impl_name() 'OpenSSL'

Isonami commented 5 years ago

My systems which are affected:


bonsai.get_vendor_info()
('OpenLDAP', 20446)
bonsai.get_tls_impl_name()
'GnuTLS'

bonsai.get_vendor_info()
('OpenLDAP', 20444)
bonsai.get_tls_impl_name()
'OpenSSL'```
garyvdm commented 5 years ago

Is there any chance that a bug fix release can be done that simply reverts 17faa20 ?

noirello commented 5 years ago

Thanks to @tck42, there's a module function currently on dev branch that can turn off the async connection process.

There's a few things I'd like to do before a new release, I'll assess my time on the weekend how many of them can be done in a short period of time. I'd like to make a new release at the end of the month.

noirello commented 5 years ago

The new release is out, including the bonsai.set_connect_async function, that can disable the non-blocking socket settings.

fgimian commented 4 years ago

I just encountered this too, thanks for all the info posted here. I wonder if this should be in the docs as it had me confused for quite some time 😄

remy-sl commented 4 years ago

@noirello may I have a rookie question related to this topic? I don't want to start a new issue. I am running an application on Buster Debian (python:3.8-slim-buster image, precisely) with libldap2-dev that is OpenLDAP version 2.4.47

>>> bonsai.get_vendor_info()
('OpenLDAP', 20447)
>>> bonsai.get_tls_impl_name()
'GnuTLS'

I hit the issue described here and I need to use bonsai.set_connect_async(False). Are the async capabilities of bonsai diminished? Does this mean that the connection with LDAP is not truly asynchronous? Thanks.

Eventually, I will probably have to juggle with other distros or build OpenLDAP myself but for now I need bonsai.set_connect_async(False) to get me going in the first iteration.

noirello commented 4 years ago

Unfortunately, it does. When bonsai.set_connect_async(False) is set, then the socket using for the connection doesn't have the non-blocking socket flag set. Therefore building up the connection with the LDAP server will definitely cause blocking and the underlying event loop won't be able to switch to other coroutines.

Things get better after you have a built connection, because running an LDAP operation is asynchronous on API level (e.g. you can get a coroutine switch during a LDAP search).

It's quite unfortunate, that this issue reappears time to time. :( But it seems that the integration between OpenLDAP and the different TLS libraries isn't seamless. (To the best of my knowledge, the TLS libraries being unaware of the non-blocking socket causes this problem.)