inejge / ldap3

A pure-Rust LDAP library using the Tokio stack
Apache License 2.0

unbind fails when using ldaps or starttls? #66

Closed Zerowalker closed 3 years ago

Zerowalker commented 3 years ago

I don't know if it's a bug or if I am doing something wrong, but whenever I try to use ldaps or StartTLS, it fails when I unbind.

I seem to bind and even search just fine, but unbind will throw an error: ResultRecv { source: RecvError(()) }. It sometimes succeeds once or twice for some reason, but that's rare.

Is someone able to test this?

inejge commented 3 years ago

The error looks like what you'd get when attempting an operation on an already closed connection.
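To illustrate the diagnosis above: a `RecvError` is what a channel receiver reports when the sending half goes away without delivering anything. The sketch below reproduces that shape with the standard library's `std::sync::mpsc` channel. It is only an analogy, under the assumption that the library hands results to the caller over a channel fed by a connection driver; `wait_for_result` and its flag are made up for the illustration and are not part of ldap3.

```rust
use std::sync::mpsc;

// Hypothetical stand-in for waiting on an operation result. The "driver"
// owns the sender; if it exits first, the receiver gets a RecvError.
fn wait_for_result(driver_alive: bool) -> Result<u32, mpsc::RecvError> {
    let (tx, rx) = mpsc::channel::<u32>();
    if driver_alive {
        // The driver delivers a result code (0 = success).
        tx.send(0).expect("receiver is still alive");
    } else {
        // The driver went away without answering, e.g. the connection closed.
        drop(tx);
    }
    rx.recv()
}

fn main() {
    println!("{:?}", wait_for_result(true));
    println!("{:?}", wait_for_result(false));
}
```

The second call fails not because the request was malformed, but because nobody is left to answer it.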

Can you provide a program that can reproduce that behavior against the test service in the distribution (configured in the data directory)?

Zerowalker commented 3 years ago

Hmm, perhaps it's a Windows thing, as I am doing this against a Windows DC. I will try to set up the LDAP test server to confirm, though.

inejge commented 3 years ago

The client is also on Windows then, I suppose? In any case, post a minimal program (with anonymized DNs and bind parameters, of course) which demonstrates the problematic behavior.

Zerowalker commented 3 years ago

Yes. I couldn't get the LDAP test server to work; I tried on both Debian and Ubuntu, so sadly I can't provide anything there. But here's some example code which fails upon unbind (as long as the provided parameters are correct, of course): https://github.com/Zerowalker/ldap3test

inejge commented 3 years ago

Briefly, I can't reproduce with a Linux client connecting to AD (Windows Server 2016, I think.) I get the output

binding
LdapResult { rc: 0, matched: "", text: "", refs: [], ctrls: [] }
ldap ended normally

I don't have a Windows machine with Rust handy, and probably won't in the next couple of days, so I can't test with a Windows client.

A couple of notes. First, for StartTLS, the URL should be ldap:// and not ldaps:// since the initial connection is in the clear. This crate doesn't make the distinction, which some may regard as a bug.
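Since the crate does not enforce the scheme rule described above, a caller could check it before connecting. The following is a minimal sketch of such a check; the `Security` enum and `pick_security` function are hypothetical helpers for illustration and are not part of ldap3.

```rust
// Hypothetical helper types; not part of the ldap3 API.
#[derive(Debug, PartialEq)]
enum Security {
    Plain,       // ldap:// with no encryption
    StartTls,    // ldap://, upgraded to TLS after connecting in the clear
    ImplicitTls, // ldaps://, TLS from the first byte
}

fn pick_security(url: &str, starttls: bool) -> Result<Security, String> {
    if url.starts_with("ldaps://") {
        if starttls {
            // StartTLS begins in the clear, so it does not fit an ldaps:// URL.
            return Err("StartTLS requires an ldap:// URL".to_string());
        }
        Ok(Security::ImplicitTls)
    } else if url.starts_with("ldap://") {
        Ok(if starttls { Security::StartTls } else { Security::Plain })
    } else {
        Err(format!("unsupported URL scheme in {}", url))
    }
}

fn main() {
    println!("{:?}", pick_security("ldap://192.168.1.5", true));
    println!("{:?}", pick_security("ldaps://192.168.1.5", true));
}
```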

Second, your Cargo.toml, at least on Linux, doesn't force the use of rustls, since the default option is tls-native, which uses OpenSSL. To actually use rustls, the following suffices in the [dependencies] section:

ldap3 = { version = "0.9.2", default-features = false, features = ["tls-rustls"] }
tokio = { version = "1.1.1", features = ["full"] }

You don't need rustls and async-trait as dependencies in that case.

Zerowalker commented 3 years ago

well changing the Cargo.toml seems to solve it:S

The following works, but adding rustls as a separate dependency, or not adding it at all, causes the LDAP error:

ldap3 = { version = "0.9.2", default-features = false, features = ["tls-rustls"] }
tokio = { version = "1.1.1", features = ["full"] }

This confuses me a bit. Shouldn't it either behave the same, or else fail to compile or fail when you try to make the connection?

And yeah forgot to remove async-trait, threw it together quickly so it was messy, not that my code isn't to begin with haha;P

inejge commented 3 years ago

Strange, but at least you have a workaround. I can't say more until I test it myself, let the issue remain open in the meantime.

inejge commented 3 years ago

Can't reproduce with a Windows 7 client in any TLS configuration against an OpenLDAP server. I'll close this for now; if you have an updated reliable reproduction case we can reopen it.

Zerowalker commented 3 years ago

How do you set up the OpenLDAP server? I would like to try it as well, to see if it behaves differently from a Windows domain controller. I can't work out why I have these problems when you clearly don't, and this seems to be the only thing that's clearly different, at least that I can think of.

inejge commented 3 years ago

This is for Ubuntu 16.04, other versions should be similar.

Install slapd. Go to /etc/ldap and run

openssl req -x509 -nodes -newkey rsa:2048 -keyout key.pem -out cert.pem -subj '/CN=localhost' -days 365

This will generate cert.pem and key.pem in that directory. Next, create /etc/ldap/slapd.conf with the following contents:

include /etc/ldap/schema/core.schema

argsfile /var/run/ldap/slapd.args
pidfile /var/run/ldap/slapd.pid

moduleload back_mdb.so

TLSCACertificateFile /etc/ldap/cert.pem
TLSCertificateFile /etc/ldap/cert.pem
TLSCertificateKeyFile /etc/ldap/key.pem
TLSProtocolMin 3.3

database mdb
suffix "o=local"
maxsize 512000000
directory /var/lib/ldap
rootdn "cn=manager,o=local"
rootpw secret
index objectClass eq

In /etc/default/slapd, place the following two keys (any other definitions of these two keys should be commented out):

SLAPD_CONF=/etc/ldap/slapd.conf
SLAPD_SERVICES="ldapi:/// ldap:/// ldaps:///"

Start the server with systemctl start slapd. I had to run aa-complain /usr/sbin/slapd to pacify AppArmor beforehand, not sure why.

Bind with DN cn=manager,o=local and password secret.

inejge commented 3 years ago

I managed to arrange the test with Win7 client -> Server 2016 + AD. Couldn't reproduce the problem. Did you try to do a cargo clean and rebuild the problematic program? I believe that a corrupted build would manifest with less subtle errors, but it doesn't hurt to try.

Zerowalker commented 3 years ago

I haven't gotten to try OpenLDAP yet; thanks for the instructions, by the way! What was the Cargo.toml like in your Windows test? Just to make sure I set it up the same way. I will also try cargo clean.

Really grateful for your interest in trying to find the issue; I know how frustrating it can be when it's not reproducible on your end. It really seems to point towards me doing something wrong or having something corrupted, as you say.

inejge commented 3 years ago

What was the cargo.toml like in your windows test?

Identical to your test program, without trimming the unnecessary dependencies:

rustls = "0.19.0"
ldap3 = "0.9.2"
tokio = { version = "1.1.1", features = ["full"] }
async-trait = "0.1.42"

As noted before, this actually uses native-tls and friends as the TLS backend.

Zerowalker commented 3 years ago

It works for me as well. But is specifying rustls separately the same as specifying it in the "features" for ldap3?

So when not specifying rustls at all, is it supposed to work with ldaps/starttls?

EDIT:

False report, it worked once then it didn't again. No clue why this happens, but i assume you can do the test multiple times in a row and it always succeeds?

inejge commented 3 years ago

First of all...

i assume you can do the test multiple times in a row and it always succeeds?

Yes.

Do you happen to have some kind of network translation box between your machine and the server, or an application firewall installed on the machine? Can you disable them? Can you run the program on the server itself?

But is specifying rustls separately the same as specifying it in the "features" for ldap3?

It isn't. If you have rustls in the dependencies without configuring it in the features, ldap3 won't use it as the TLS backend, because it uses native-tls as the default.

Zerowalker commented 3 years ago

It's on a virtual machine, so there can be some weird stuff, i will try running it directly on the domain controller to make sure this doesn't mess anything up.

It isn't. If you have rustls in the dependencies without configuring it in the features, ldap3 won't use it as the TLS backend, because it uses native-tls as the default.

So whether or not I have rustls in Cargo.toml doesn't really change anything, but specifying features = ["tls-rustls"] (whether or not rustls is listed as a dependency) makes ldap3 use it. Do I get that right?

Because in that case, wouldn't it mean that native-tls vs. rustls is the cause, to some degree at least? A Cargo.toml like this:

[dependencies]
ldap3 = { version = "0.9.2", default-features = false, features = ["tls-rustls"] }
tokio = { version = "1.2.0", features = ["full"] }

Seems to work fine.

But this doesn't:

[dependencies]
ldap3 = { version = "0.9.2" }
tokio = { version = "1.2.0", features = ["full"] }

I updated my test code to simplify things.

https://github.com/Zerowalker/ldap3test

If I run that, all tests succeed. However, if I run the native-tls branch: https://github.com/Zerowalker/ldap3test/tree/native-tls (which is just ldap3 without specifying tls-rustls), it fails at the first encrypted unbind:

Testing ConnType: Ldap
-----Test [Ldap]: #0-----
url: ldap://192.168.1.5, StartTls: false
-----Test Done-----
-----Test [Ldap]: #1-----
url: ldap://192.168.1.5, StartTls: false
-----Test Done-----
-----Test [Ldap]: #2-----
url: ldap://192.168.1.5, StartTls: false
-----Test Done-----
-----Test [Ldap]: #3-----
url: ldap://192.168.1.5, StartTls: false
-----Test Done-----
-----Test [Ldap]: #4-----
url: ldap://192.168.1.5, StartTls: false
-----Test Done-----
-----Test [StartTls]: #0-----
url: ldap://192.168.1.5, StartTls: true
unbind failed

Will run it on the domain controller directly and report back

EDIT:

Seems to act the same.

Testing ConnType: Ldap
-----Test [Ldap]: #0-----
url: ldap://127.0.0.1, StartTls: false
-----Test Done-----
-----Test [Ldap]: #1-----
url: ldap://127.0.0.1, StartTls: false
-----Test Done-----
-----Test [Ldap]: #2-----
url: ldap://127.0.0.1, StartTls: false
-----Test Done-----
-----Test [Ldap]: #3-----
url: ldap://127.0.0.1, StartTls: false
-----Test Done-----
-----Test [Ldap]: #4-----
url: ldap://127.0.0.1, StartTls: false
-----Test Done-----
-----Test [StartTls]: #0-----
url: ldap://127.0.0.1, StartTls: true
unbind failed

inejge commented 3 years ago

So it seems that you have problems with the native-tls build. What happens if you skip the ConnType::StartTls tests—do the ConnType::Ldaps tests work?

Zerowalker commented 3 years ago

Indeed. I read about the difference yesterday: something about one being platform-agnostic and the other being the opposite. Very confusing in my view, as I just expect it to work the same as long as you set the same type, protocol, etc. (as in TLS version 3 or whatever; not sure how the encryption stuff works ;P).

I will check ldaps, i will run all of them and not just abort upon failure and post the logs, will update the git to keep it consistent.
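A harness that keeps going after a failure can be sketched roughly as follows. This is not the code from the linked repository: `run_all` and the placeholder closure are hypothetical, and the real connect, bind, and unbind calls would go where the closure is.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum ConnType {
    Ldap,
    StartTls,
    Ldaps,
}

/// Runs `attempts` rounds per connection type and records every outcome
/// instead of stopping at the first failure.
fn run_all<F>(host: &str, attempts: usize, mut attempt: F) -> Vec<String>
where
    F: FnMut(&str, bool) -> Result<(), String>,
{
    let mut report = Vec::new();
    for conn in [ConnType::Ldap, ConnType::StartTls, ConnType::Ldaps] {
        let scheme = if conn == ConnType::Ldaps { "ldaps" } else { "ldap" };
        let url = format!("{}://{}", scheme, host);
        let starttls = conn == ConnType::StartTls;
        for n in 0..attempts {
            let status = match attempt(&url, starttls) {
                Ok(()) => "SUCCEEDED",
                Err(_) => " FAILED  ",
            };
            report.push(format!(
                "[{:?}]: #{} - [{}] - [url: {}, StartTls: {}]",
                conn, n, status, url, starttls
            ));
        }
    }
    report
}

fn main() {
    // Placeholder attempt: pretend every encrypted session fails at unbind.
    let report = run_all("192.168.1.5", 2, |url, starttls| {
        if starttls || url.starts_with("ldaps://") {
            Err("unbind failed".to_string())
        } else {
            Ok(())
        }
    });
    for line in &report {
        println!("{}", line);
    }
}
```

Collecting the lines into a `Vec` rather than printing them immediately makes it easy to count failures per connection type afterwards.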

Zerowalker commented 3 years ago

Here are the native-tls results:

[Ldap]: #0 - [SUCCEEDED] - [url: ldap://192.168.1.5, StartTls: false]
[Ldap]: #1 - [SUCCEEDED] - [url: ldap://192.168.1.5, StartTls: false]
[Ldap]: #2 - [SUCCEEDED] - [url: ldap://192.168.1.5, StartTls: false]
[Ldap]: #3 - [SUCCEEDED] - [url: ldap://192.168.1.5, StartTls: false]
[Ldap]: #4 - [SUCCEEDED] - [url: ldap://192.168.1.5, StartTls: false]
[Ldap]: #5 - [SUCCEEDED] - [url: ldap://192.168.1.5, StartTls: false]
[Ldap]: #6 - [SUCCEEDED] - [url: ldap://192.168.1.5, StartTls: false]
[Ldap]: #7 - [SUCCEEDED] - [url: ldap://192.168.1.5, StartTls: false]
[Ldap]: #8 - [SUCCEEDED] - [url: ldap://192.168.1.5, StartTls: false]
[Ldap]: #9 - [SUCCEEDED] - [url: ldap://192.168.1.5, StartTls: false]
[StartTls]: #0 - [ FAILED  ] - [url: ldap://192.168.1.5, StartTls: true]
[StartTls]: #1 - [ FAILED  ] - [url: ldap://192.168.1.5, StartTls: true]
[StartTls]: #2 - [ FAILED  ] - [url: ldap://192.168.1.5, StartTls: true]
[StartTls]: #3 - [ FAILED  ] - [url: ldap://192.168.1.5, StartTls: true]
[StartTls]: #4 - [SUCCEEDED] - [url: ldap://192.168.1.5, StartTls: true]
[StartTls]: #5 - [ FAILED  ] - [url: ldap://192.168.1.5, StartTls: true]
[StartTls]: #6 - [ FAILED  ] - [url: ldap://192.168.1.5, StartTls: true]
[StartTls]: #7 - [ FAILED  ] - [url: ldap://192.168.1.5, StartTls: true]
[StartTls]: #8 - [ FAILED  ] - [url: ldap://192.168.1.5, StartTls: true]
[StartTls]: #9 - [ FAILED  ] - [url: ldap://192.168.1.5, StartTls: true]
[Ldaps]: #0 - [SUCCEEDED] - [url: ldaps://192.168.1.5, StartTls: false]
[Ldaps]: #1 - [ FAILED  ] - [url: ldaps://192.168.1.5, StartTls: false]
[Ldaps]: #2 - [SUCCEEDED] - [url: ldaps://192.168.1.5, StartTls: false]
[Ldaps]: #3 - [ FAILED  ] - [url: ldaps://192.168.1.5, StartTls: false]
[Ldaps]: #4 - [ FAILED  ] - [url: ldaps://192.168.1.5, StartTls: false]
[Ldaps]: #5 - [ FAILED  ] - [url: ldaps://192.168.1.5, StartTls: false]
[Ldaps]: #6 - [ FAILED  ] - [url: ldaps://192.168.1.5, StartTls: false]
[Ldaps]: #7 - [SUCCEEDED] - [url: ldaps://192.168.1.5, StartTls: false]
[Ldaps]: #8 - [ FAILED  ] - [url: ldaps://192.168.1.5, StartTls: false]
[Ldaps]: #9 - [ FAILED  ] - [url: ldaps://192.168.1.5, StartTls: false]

It's random on both StartTLS and LDAPS whether it succeeds or fails. My assumption is that it doesn't close the connection correctly or something, so it can succeed once, but then the next attempt fails until the server closes the connection, and then you succeed again. But it doesn't seem to be deterministic.

inejge commented 3 years ago

Having a network packet and/or system call trace would probably throw some light on this behavior, but decrypting TLS in packet captures with Wireshark is not easy, and I'm not very well versed in Windows syscall tracing. Have you tried your client against a Linux-based OpenLDAP server?

Zerowalker commented 3 years ago

I've been checking it with Wireshark and asking around to make sense of it. The two behave a little differently during the handshake, and then they are the same, except that the one that fails never gets the "Encrypted Alert" packet from the server.

Sadly no, i got caught up and never set it up, will try it as you posted some fine instructions before.

Zerowalker commented 3 years ago

I tried against OpenLDAP on Debian and using native-tls ldap3 on Windows and it seems to succeed in all tests (LDAP,STARTTLS,LDAPS).

inejge commented 3 years ago

Since it works with Debian, my best guess is that there's something in the AD server/service setup that triggers the TLS connection instability when using the native-tls (SChannel) backend on the client, and that Rustls somehow avoids it. It may be instructive to capture the Rustls connection establishment and compare it with the native-tls one, if possible.

Zerowalker commented 3 years ago

I will gladly capture it, just need to know how to do it. I know how to use wireshark, but unsure if that's sufficient for this or not?

inejge commented 3 years ago

Try to capture complete sessions, both successful and unsuccessful, with native-tls and Rustls, in separate .pcap files, and name them accordingly. I'm not sure how helpful it's all going to be without TLS decryption, but you can try. Encrypted captures will at least hide sensitive local details such as bind passwords.

Zerowalker commented 3 years ago

Here are 3 tries on LDAPS with both rustls and native-tls. rustls succeeded in all of them, native-tls failed in all of them. rust_nativetls_3_sessions.zip

inejge commented 3 years ago

In the traces, neither native-tls nor Rustls have the connection closed regularly. In both cases, the server's last packet is an RST, indicating a forced/errored closing on the server side. The difference is that Rustls invariably sends an "Encrypted Alert" as the last packet, possibly informing the server about connection shutdown. The LDAP spec says that both sides should "gracefully terminate" the session when the server receives an Unbind, so the RST is not kosher.

Since the Unbind operation is unacknowledged per the protocol spec, the client doesn't expect a reply, but returns the indication of successful socket write operation. It's possible that the server's unexpected RST flags the connection as errored despite the successfully transmitted Unbind, and that error gets propagated back.

Since the Unbind indicates that you're finished with the connection, you could simply ignore its error status and drop the connection once you're finished with it, if the problems of this type persist. I'm not sure whether I could special-case Unbind errors in the IO loop without burdening the implementation with very finicky code.
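On the caller's side, the suggestion above amounts to treating the Unbind result as informational. A minimal sketch, using a hypothetical `Session` trait as a stand-in for the real handle (this is not the ldap3 API):

```rust
/// Hypothetical stand-in for an LDAP handle; not the ldap3 API.
trait Session {
    fn unbind(&mut self) -> Result<(), String>;
}

/// Simulates a server that answers the Unbind with a TCP RST.
struct ResetOnClose;

impl Session for ResetOnClose {
    fn unbind(&mut self) -> Result<(), String> {
        Err("connection reset by peer".to_string())
    }
}

/// Ends the session. The return value says whether the unbind was clean,
/// which is useful for logging but never treated as a failure.
fn finish<S: Session>(mut session: S) -> bool {
    let clean = match session.unbind() {
        Ok(()) => true,
        Err(e) => {
            eprintln!("ignoring unbind error: {}", e);
            false
        }
    };
    // `session` is dropped when this function returns, which releases
    // the connection either way.
    clean
}

fn main() {
    let clean = finish(ResetOnClose);
    println!("clean shutdown: {}", clean);
}
```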

Zerowalker commented 3 years ago

Hmm, so the Unbind is the 90-byte packet? And then rustls uses "Encrypted Alert" to tell the server to close.

So the Unbind is successful as long as TCP reports that the data has been delivered to the other end? How is it handled in other cases; is a certain response expected?

Kinda interested in how other solutions handle it, i have used it in golang i think, will see if i can make something out of it.

What do you mean "if the problems of this type persist", aren't we expecting it to be this way if both parts are doing their thing, but they are expecting different responses when closing down?

EDIT:

Here is a Wireshark capture of me trying go-ldap. It seems like it perhaps closes a bit more gracefully, as it uses ACK, FIN? But the Close function doesn't return any errors, so I'm not sure whether it will just panic if something goes wrong or whether it ignores everything. I will try digging through the code a bit more to see if I can make sense of that part.

go-ldap-ldaps.zip

inejge commented 3 years ago

Hmm, so the Unbind is the 90-byte packet?

Well that's the problem, the packet is encrypted and I can only presume that it's the Unbind.

So the Unbind is successful as long as TCP reports that the data has been delivered to the other end? How is it handled in other cases; is a certain response expected?

The specification (RFC 4511) says that Unbind has no response. Other operations mostly do have one.

What do you mean "if the problems of this type persist"

If the only problematic operation in your application is Unbind, you can skip it if you drop the Ldap handle. The connection is not gracefully closed if you do that, but it's closed and you shouldn't see any errors.

Zerowalker commented 3 years ago

I tried testing at work, and i think it actually always succeeded, but i can't replicate it in my test suite even though i have set it up from scratch. I might have done something wrong in the test though so i will have to redo it to really make sure, cause it doesn't make sense to me.

On the other hand, I captured a connection/bind/unbind from "ldp", which seems to be a built-in LDAP test tool for Windows domains, just to see if that gives any extra insight.

The test was done by connecting to the server on port 636 with SSL checked, then doing a simple bind with "user@domain" and its password, then a disconnect, which seems to call unbind based on the logs: 0x0 = ldap_unbind(ld);

Here is the capture for what it's worth: ldp_ssl_636_connect_bind_unbind.zip

It looks quite similar i think? so it might just be that unbind is successful if it sends and the server just closes the connection without any response like the specifications state? How does OpenLDAP handle it then, does it respond and ldap3 expects this response?

inejge commented 3 years ago

OpenLDAP sends a FIN,ACK which is a graceful connection close.

I made a tweak to the closing sequence to attempt a true graceful close on the client side; maybe it helps. It is in the shutdown branch of the GH repo, so you can use it in your code by specifying:

ldap3 = { git = "https://github.com/inejge/ldap3", branch = "shutdown" }

Try it with your failing program and see how it goes.

Zerowalker commented 3 years ago

Sadly it didn't seem to help, still fails on unbind and the server doesn't send a FIN,ACK. But the client does which is your change i presume, but it happens after the server has sent RST, ACK though.

inejge commented 3 years ago

Bummer. For laughs, I changed Unbind to ignore errors and plough through with shutting down and closing the stream. It's in the shutdown branch again. Do a cargo update --package ldap3 (you should get commit 0b3a8886), and recompile. Does it work now?

Zerowalker commented 3 years ago

It succeeds as expected, sorry for late answer.

inejge commented 3 years ago

Well that was easier than I suspected, and I think I'm going to keep it: anything after the successful write-out of an Unbind is just politeness on the client part, and it doesn't really matter if it fails. I'll update this message when I merge the change in master.
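The policy described above can be sketched as follows: a failed write of the Unbind is still reported, while errors during the subsequent shutdown are swallowed. This is an illustration under assumed names (`Closable`, `send_unbind`, `ResettingPeer`), not the code in the shutdown branch.

```rust
use std::io::{self, Write};

/// Hypothetical transport: a writer that can also be shut down.
trait Closable: Write {
    fn shutdown(&mut self) -> io::Result<()>;
}

fn send_unbind<T: Closable>(stream: &mut T, unbind_pdu: &[u8]) -> io::Result<()> {
    // A failed write is a real error: the server never saw the Unbind.
    stream.write_all(unbind_pdu)?;
    stream.flush()?;
    // Anything past this point is courtesy; a reset from the peer is ignored.
    let _ = stream.shutdown();
    Ok(())
}

/// Test double: accepts writes, then fails the shutdown like a peer sending RST.
struct ResettingPeer {
    sent: Vec<u8>,
}

impl Write for ResettingPeer {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.sent.extend_from_slice(buf);
        Ok(buf.len())
    }
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

impl Closable for ResettingPeer {
    fn shutdown(&mut self) -> io::Result<()> {
        Err(io::Error::new(io::ErrorKind::ConnectionReset, "RST from server"))
    }
}

fn main() {
    let mut peer = ResettingPeer { sent: Vec::new() };
    // BER encoding of an UnbindRequest with message ID 3 (ID chosen arbitrarily).
    let result = send_unbind(&mut peer, &[0x30, 0x05, 0x02, 0x01, 0x03, 0x42, 0x00]);
    println!("unbind result: {:?}, bytes sent: {}", result, peer.sent.len());
}
```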

Zerowalker commented 3 years ago

Well, just to make things messier, I did some extra checks at work. There it works with native-tls but not with rustls; or rather, one DC failed and the other succeeded. I didn't check Wireshark, but it's really odd. I have tried researching it, and the only thing I can find is that Windows handles the certificate handshake oddly with LDAP, or something like that. It also doesn't use the subject name, which seems to cause issues, although it did use it before (not sure in which version). https://docs.microsoft.com/en-us/troubleshoot/windows-server/identity/ldap-over-ssl-connection-issues I'm not sure that's the exact link, but it's one of them at least. I also noticed there that there's LDAP client debugging, including SSL, so I have to try that, if it's even applicable here.

But yeah, since it's not clear at all (at least to me) how it's actually supposed to be handled, given all the different results, it's probably not that bad to just accept anything on unbind. I don't really like it, as it's basically just hiding an issue, but it doesn't seem to have any practical bad impact, I think, and sadly it doesn't look like a real solution is in sight.

I will try to dig deeper and if i do happen to find something that looks promising i will report it cause this really bothers me, damn microsoft;P

Really appreciate your help in all this, many thanks:)!!