instaclustr / cassandra-ldap

LDAP Authenticator for Apache Cassandra
Apache License 2.0

Multiple ldap servers failover problem #3

Closed viljoviitanen closed 3 years ago

viljoviitanen commented 6 years ago

Hello!

While testing the ldap authenticator for Cassandra, I found the following issue.

When multiple LDAP servers are defined, there are failure cases in which the second one is never tried, and authentication fails.

My test setup is this:

ldap_uri: ldap://localhost:2389/dc=example,dc=org ldap://localhost:1389/dc=example,dc=org
service_dn: cn=admin,dc=example,dc=org
service_password: admin
anonymous_access: false
cache_hashed_password: true

docker run --rm -d -p 1389:389 --name ldap1 osixia/openldap
docker run --rm -d -p 2389:389 --name ldap2 osixia/openldap

(By default, the osixia/openldap image creates cn=admin,dc=example,dc=org with the password admin.)

Then try to access Cassandra like this:

$ echo 'use system;'|time bin/cqlsh -u admin -p admin localhost
0.29user 0.04system 0:01.07elapsed 32%CPU (0avgtext+0avgdata 22956maxresident)k
0inputs+0outputs (0major+13267minor)pagefaults 0swaps

(success, authentication is done against the first LDAP server in the list)

Kill the first server in the list, so that the TCP connection to it fails immediately:

$ docker kill ldap2

$ echo 'use system;'|time bin/cqlsh -u admin -p admin localhost
0.31user 0.03system 0:01.03elapsed 33%CPU (0avgtext+0avgdata 22768maxresident)k
11776inputs+0outputs (24major+13260minor)pagefaults 0swaps

(success, authentication is done against the second server in the list)

Deny any network connectivity to the first server in the list (simulating physical server or network issues):

$ sudo iptables -I INPUT 1 -s localhost -p tcp --destination-port 2389 -j DROP

$ echo 'use system;'|time bin/cqlsh -u admin -p admin localhost
Connection error: ('Unable to connect to any servers', {'127.0.0.1': OperationTimedOut('errors=Timed out creating connection (5 seconds), last_host=None',)})
Command exited with non-zero status 1
0.25user 0.02system 0:05.44elapsed 5%CPU (0avgtext+0avgdata 22448maxresident)k
0inputs+0outputs (0major+12891minor)pagefaults 0swaps

(nasty error, which I think should not happen)
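The difference between the two failure cases can be sketched outside Cassandra. A killed container refuses the TCP connection at once, while a DROP rule leaves the connect attempt hanging until some timeout fires. Below is a minimal Java sketch of trying endpoints in order with an explicit connect timeout. The class name FailoverProbe, the method firstReachable, and the 2000 ms value are hypothetical and are not taken from the authenticator's code.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.List;

// Hypothetical helper, for illustration only.
final class FailoverProbe {

    /**
     * Returns the first endpoint that accepts a TCP connection within
     * timeoutMs, or null if none does. A refused connection fails fast;
     * a silently dropped one fails only after timeoutMs has elapsed.
     */
    static InetSocketAddress firstReachable(List<InetSocketAddress> endpoints, int timeoutMs) {
        for (InetSocketAddress endpoint : endpoints) {
            try (Socket socket = new Socket()) {
                socket.connect(endpoint, timeoutMs);
                return endpoint;
            } catch (IOException e) {
                // Refused or timed out: fall through to the next endpoint.
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Same order as in ldap_uri above: port 2389 first, then 1389.
        List<InetSocketAddress> servers = List.of(
                new InetSocketAddress("localhost", 2389),
                new InetSocketAddress("localhost", 1389));
        System.out.println("first reachable: " + firstReachable(servers, 2000));
    }
}
```

With the DROP rule in place, a probe like this would be expected to spend the full timeout on port 2389 before moving on to 1389, so failover only helps if that timeout is shorter than the 5 seconds cqlsh waits.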

Restore connectivity:

$ sudo iptables -D INPUT -s localhost -p tcp --destination-port 2389 -j DROP

...and authentication works again.

Is this expected behavior, or just a problem with my test setup?

kgreav commented 5 years ago

Honestly didn't try testing with multiple servers. It should work. Wouldn't be surprised if the timeout is breaking it. It might be that the service user is still "connected" to the first server because the connection doesn't get closed when you create a new firewall rule that only drops, and thus it doesn't start using the second server.

Anyway, probably won't have time to troubleshoot/patch for a bit, but I can review a PR if you work it out. Otherwise probably another month or so before I'll get to it.
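On the timeout theory: if the authenticator opens its LDAP connection through JNDI (an assumption, not confirmed in this thread), the JDK's LDAP provider accepts a space-separated list of URLs in Context.PROVIDER_URL and tries them in order. It also honors the com.sun.jndi.ldap.connect.timeout and com.sun.jndi.ldap.read.timeout environment properties, both given in milliseconds as strings. A rough sketch of an environment with both timeouts set below cqlsh's 5 second limit follows. The class LdapEnvSketch, the method buildEnv, and the timeout values are hypothetical.

```java
import java.util.Hashtable;
import javax.naming.Context;

// Hypothetical helper, for illustration only; not the authenticator's actual code.
final class LdapEnvSketch {

    /** Builds a JNDI environment for a simple bind with explicit timeouts. */
    static Hashtable<String, String> buildEnv(String ldapUris, String serviceDn,
                                              String servicePassword,
                                              int connectTimeoutMs, int readTimeoutMs) {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        // Space-separated URLs are tried in order by the JDK LDAP provider.
        env.put(Context.PROVIDER_URL, ldapUris);
        env.put(Context.SECURITY_AUTHENTICATION, "simple");
        env.put(Context.SECURITY_PRINCIPAL, serviceDn);
        env.put(Context.SECURITY_CREDENTIALS, servicePassword);
        // Without a connect timeout, a dropped SYN can block far longer
        // than the client is willing to wait.
        env.put("com.sun.jndi.ldap.connect.timeout", Integer.toString(connectTimeoutMs));
        env.put("com.sun.jndi.ldap.read.timeout", Integer.toString(readTimeoutMs));
        return env;
    }

    public static void main(String[] args) {
        Hashtable<String, String> env = buildEnv(
                "ldap://localhost:2389/dc=example,dc=org ldap://localhost:1389/dc=example,dc=org",
                "cn=admin,dc=example,dc=org", "admin", 1500, 1500);
        System.out.println(env.get("com.sun.jndi.ldap.connect.timeout"));
    }
}
```

An environment like this would be passed to new InitialDirContext(env). Whether it resolves the failure reported here is untested; it only illustrates where a connect timeout would be configured.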