Closed Zerowalker closed 3 years ago
The error looks like what you'd get when attempting an operation on an already closed connection.
Can you provide a program that can reproduce that behavior against the test service in the distribution (configured in the data directory)?
Hmm, perhaps it's a Windows thing, as I am doing this against a Windows DC. I will try to set up the LDAP test server to confirm, though.
The client is also on Windows then, I suppose? In any case, post a minimal program (with anonymized DNs and bind parameters, of course) which demonstrates the problematic behavior.
Yes. I couldn't get the LDAP test server to work (tried on both Debian and Ubuntu), so sadly I can't provide anything there. But here's an example program which fails upon unbind (as long as the provided parameters are correct, of course): https://github.com/Zerowalker/ldap3test
Briefly, I can't reproduce with a Linux client connecting to AD (Windows Server 2016, I think.) I get the output
binding
LdapResult { rc: 0, matched: "", text: "", refs: [], ctrls: [] }
ldap ended normally
I don't have a Windows machine with Rust handy, and probably won't in the next couple of days, so I can't test with a Windows client.
A couple of notes. First, for StartTLS, the URL should be ldap:// and not ldaps:// since the initial connection is in the clear. This crate doesn't make the distinction, which some may regard as a bug.
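The scheme rule above can be sketched as a small helper. This is only an illustration: the ConnType enum and the url_for function are hypothetical names (loosely modeled on the connection types mentioned later in this thread), not part of ldap3.

```rust
/// Hypothetical connection kinds, named for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ConnType {
    Ldap,
    StartTls,
    Ldaps,
}

/// Returns the URL to connect to and whether StartTLS should be requested.
/// StartTLS begins in the clear, so it uses the ldap:// scheme; only
/// TLS-from-the-start connections use ldaps://.
fn url_for(conn: ConnType, host: &str) -> (String, bool) {
    match conn {
        ConnType::Ldap => (format!("ldap://{}", host), false),
        ConnType::StartTls => (format!("ldap://{}", host), true),
        ConnType::Ldaps => (format!("ldaps://{}", host), false),
    }
}
```

The boolean would then be passed to whatever client setting enables StartTLS.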
Second, your Cargo.toml, at least on Linux, doesn't force the use of rustls, since the default option is tls-native, which uses OpenSSL. To actually use rustls, the following suffices in the [dependencies] section:
ldap3 = { version = "0.9.2", default-features = false, features = ["tls-rustls"] }
tokio = { version = "1.1.1", features = ["full"] }
You don't need rustls and async-trait as dependencies in that case.
Well, changing the Cargo.toml seems to solve it :S
This works, but using rustls separately or not adding it at all causes the LDAP error.
ldap3 = { version = "0.9.2", default-features = false, features = ["tls-rustls"] }
tokio = { version = "1.1.1", features = ["full"] }
This confuses me a bit: shouldn't it either behave the same, or otherwise fail to compile or fail when you try to make the connection?
And yeah, I forgot to remove async-trait; I threw it together quickly so it was messy, not that my code isn't to begin with, haha ;P
Strange, but at least you have a workaround. I can't say more until I test it myself, let the issue remain open in the meantime.
Can't reproduce with a Windows 7 client in any TLS configuration against an OpenLDAP server. I'll close this for now; if you have an updated reliable reproduction case we can reopen it.
How do you set up the OpenLDAP server? I would like to try it as well, to see if it behaves differently compared to a Windows domain controller. I can't make out why I have these problems when you clearly don't, and this seems to be the only thing that's clearly different, at least that I can think of.
This is for Ubuntu 16.04, other versions should be similar.
Install slapd. Go to /etc/ldap and run
openssl req -x509 -nodes -newkey rsa:2048 -keyout key.pem -out cert.pem -subj '/CN=localhost' -days 365
This will generate cert.pem and key.pem in that directory. Next, create /etc/ldap/slapd.conf with the following contents:
include /etc/ldap/schema/core.schema
argsfile /var/run/ldap/slapd.args
pidfile /var/run/ldap/slapd.pid
moduleload back_mdb.so
TLSCACertificateFile /etc/ldap/cert.pem
TLSCertificateFile /etc/ldap/cert.pem
TLSCertificateKeyFile /etc/ldap/key.pem
TLSProtocolMin 3.3
database mdb
suffix "o=local"
maxsize 512000000
directory /var/lib/ldap
rootdn "cn=manager,o=local"
rootpw secret
index objectClass eq
In /etc/default/slapd, place the following two keys (any other definitions of these two keys should be commented out):
SLAPD_CONF=/etc/ldap/slapd.conf
SLAPD_SERVICES="ldapi:/// ldap:/// ldaps:///"
Start the server with systemctl start slapd. I had to run aa-complain /usr/sbin/slapd to pacify AppArmor beforehand, not sure why.
Bind with DN cn=manager,o=local and password secret.
I managed to arrange the test with Win7 client -> Server 2016 + AD. Couldn't reproduce the problem. Did you try to do a cargo clean and rebuild the problematic program? I believe that a corrupted build would manifest with less subtle errors, but it doesn't hurt to try.
I haven't gotten to try OpenLDAP yet, thanks for the instructions btw! What was the Cargo.toml like in your Windows test? Just to make sure I set it up the same. I will also try cargo clean.
Really grateful for your interest in trying to find the issue, I know how frustrating it can be when it's not reproducible on your end. It really seems to point towards me doing something wrong or having something corrupted, as you say.
What was the cargo.toml like in your windows test?
Identical to your test program, without trimming the unnecessary dependencies:
rustls = "0.19.0"
ldap3 = "0.9.2"
tokio = { version = "1.1.1", features = ["full"] }
async-trait = "0.1.42"
As noted before, this actually uses native-tls and friends as the TLS backend.
It works for me as well. But is specifying rustls separately the same as specifying it in the "features" for ldap3?
And when not specifying rustls at all, is it supposed to work with LDAPS/StartTLS?
EDIT:
False report: it worked once, then it didn't again. No clue why this happens, but i assume you can do the test multiple times in a row and it always succeeds?
First of all...
i assume you can do the test multiple times in a row and it always succeeds?
Yes.
Do you happen to have some kind of network translation box between your machine and the server, or an application firewall installed on the machine? Can you disable them? Can you run the program on the server itself?
But specifying rustls separately or in the "features" is the same for ldap3?
It isn't. If you have rustls in the dependencies without configuring it in the features, ldap3 won't use it as the TLS backend, because it uses native-tls as the default.
It's on a virtual machine, so there can be some weird stuff, i will try running it directly on the domain controller to make sure this doesn't mess anything up.
It isn't. If you have rustls in the dependencies without configuring it in the features, ldap3 won't use it as the TLS backend, because it uses native-tls as the default.
So whether or not I have rustls in the Cargo.toml doesn't really do anything, but specifying features = ["tls-rustls"] (whether or not rustls is in the Cargo.toml) makes ldap3 use it, do I get that right?
Because in that case, wouldn't it mean that native-tls vs rustls is the cause, to some degree at least?
A Cargo.toml like this:
[dependencies]
ldap3 = { version = "0.9.2", default-features = false, features = ["tls-rustls"] }
tokio = { version = "1.2.0", features = ["full"] }
Seems to work fine.
But this doesn't:
[dependencies]
ldap3 = { version = "0.9.2" }
tokio = { version = "1.2.0", features = ["full"] }
I updated my test code to simplify things.
https://github.com/Zerowalker/ldap3test
If I run that, all tests succeed.
However, if I run the native-tls branch, https://github.com/Zerowalker/ldap3test/tree/native-tls (which is just ldap3 without specifying tls-rustls), it fails at the first encrypted unbind:
Testing ConnType: Ldap
-----Test [Ldap]: #0-----
url: ldap://192.168.1.5, StartTls: false
-----Test Done-----
-----Test [Ldap]: #1-----
url: ldap://192.168.1.5, StartTls: false
-----Test Done-----
-----Test [Ldap]: #2-----
url: ldap://192.168.1.5, StartTls: false
-----Test Done-----
-----Test [Ldap]: #3-----
url: ldap://192.168.1.5, StartTls: false
-----Test Done-----
-----Test [Ldap]: #4-----
url: ldap://192.168.1.5, StartTls: false
-----Test Done-----
-----Test [StartTls]: #0-----
url: ldap://192.168.1.5, StartTls: true
unbind failed
Will run it on the domain controller directly and report back
EDIT:
Seems to act the same.
Testing ConnType: Ldap
-----Test [Ldap]: #0-----
url: ldap://127.0.0.1, StartTls: false
-----Test Done-----
-----Test [Ldap]: #1-----
url: ldap://127.0.0.1, StartTls: false
-----Test Done-----
-----Test [Ldap]: #2-----
url: ldap://127.0.0.1, StartTls: false
-----Test Done-----
-----Test [Ldap]: #3-----
url: ldap://127.0.0.1, StartTls: false
-----Test Done-----
-----Test [Ldap]: #4-----
url: ldap://127.0.0.1, StartTls: false
-----Test Done-----
-----Test [StartTls]: #0-----
url: ldap://127.0.0.1, StartTls: true
unbind failed
So it seems that you have problems with the native-tls build. What happens if you skip the ConnType::StartTls tests? Do the ConnType::Ldaps tests work?
Indeed, and I read about the difference yesterday, something about one being platform agnostic and the other being the opposite. Very confusing in my view, as I just expect it to work the same as long as you set the same type and protocol etc. (as in TLS version 3 or whatever, not sure how the encryption stuff works ;P).
I will check LDAPS. I will run all of the tests instead of aborting upon the first failure and post the logs, and will update the git repo to keep it consistent.
Here is the native tls results:
[Ldap]: #0 - [SUCCEEDED] - [url: ldap://192.168.1.5, StartTls: false]
[Ldap]: #1 - [SUCCEEDED] - [url: ldap://192.168.1.5, StartTls: false]
[Ldap]: #2 - [SUCCEEDED] - [url: ldap://192.168.1.5, StartTls: false]
[Ldap]: #3 - [SUCCEEDED] - [url: ldap://192.168.1.5, StartTls: false]
[Ldap]: #4 - [SUCCEEDED] - [url: ldap://192.168.1.5, StartTls: false]
[Ldap]: #5 - [SUCCEEDED] - [url: ldap://192.168.1.5, StartTls: false]
[Ldap]: #6 - [SUCCEEDED] - [url: ldap://192.168.1.5, StartTls: false]
[Ldap]: #7 - [SUCCEEDED] - [url: ldap://192.168.1.5, StartTls: false]
[Ldap]: #8 - [SUCCEEDED] - [url: ldap://192.168.1.5, StartTls: false]
[Ldap]: #9 - [SUCCEEDED] - [url: ldap://192.168.1.5, StartTls: false]
[StartTls]: #0 - [ FAILED ] - [url: ldap://192.168.1.5, StartTls: true]
[StartTls]: #1 - [ FAILED ] - [url: ldap://192.168.1.5, StartTls: true]
[StartTls]: #2 - [ FAILED ] - [url: ldap://192.168.1.5, StartTls: true]
[StartTls]: #3 - [ FAILED ] - [url: ldap://192.168.1.5, StartTls: true]
[StartTls]: #4 - [SUCCEEDED] - [url: ldap://192.168.1.5, StartTls: true]
[StartTls]: #5 - [ FAILED ] - [url: ldap://192.168.1.5, StartTls: true]
[StartTls]: #6 - [ FAILED ] - [url: ldap://192.168.1.5, StartTls: true]
[StartTls]: #7 - [ FAILED ] - [url: ldap://192.168.1.5, StartTls: true]
[StartTls]: #8 - [ FAILED ] - [url: ldap://192.168.1.5, StartTls: true]
[StartTls]: #9 - [ FAILED ] - [url: ldap://192.168.1.5, StartTls: true]
[Ldaps]: #0 - [SUCCEEDED] - [url: ldaps://192.168.1.5, StartTls: false]
[Ldaps]: #1 - [ FAILED ] - [url: ldaps://192.168.1.5, StartTls: false]
[Ldaps]: #2 - [SUCCEEDED] - [url: ldaps://192.168.1.5, StartTls: false]
[Ldaps]: #3 - [ FAILED ] - [url: ldaps://192.168.1.5, StartTls: false]
[Ldaps]: #4 - [ FAILED ] - [url: ldaps://192.168.1.5, StartTls: false]
[Ldaps]: #5 - [ FAILED ] - [url: ldaps://192.168.1.5, StartTls: false]
[Ldaps]: #6 - [ FAILED ] - [url: ldaps://192.168.1.5, StartTls: false]
[Ldaps]: #7 - [SUCCEEDED] - [url: ldaps://192.168.1.5, StartTls: false]
[Ldaps]: #8 - [ FAILED ] - [url: ldaps://192.168.1.5, StartTls: false]
[Ldaps]: #9 - [ FAILED ] - [url: ldaps://192.168.1.5, StartTls: false]
It's random on both StartTLS and LDAPS whether it succeeds or fails. My assumption is that it doesn't close the connection correctly or something, so it can succeed once, but then the next attempts fail until the server closes the connection, and then you succeed again. But it doesn't seem to be deterministic.
Having a network packet and/or system call trace would probably throw some light on this behavior, but decrypting TLS in packet captures with Wireshark is not easy, and I'm not very well versed in Windows syscall tracing. Have you tried your client against a Linux-based OpenLDAP server?
I've been checking it with Wireshark and asking around to make sense of it. The two behave a little differently during the handshake, then they are the same, except that the one that fails never gets the "encrypted alert" packet from the server.
Sadly no, I got caught up and never set it up. I will try it, as you posted some fine instructions before.
I tried against OpenLDAP on Debian and using native-tls ldap3 on Windows and it seems to succeed in all tests (LDAP,STARTTLS,LDAPS).
Since it works with Debian, my best guess is that there's something in the AD server/service setup that triggers the TLS connection instability when using the native-tls (SChannel) backend on the client, and that Rustls somehow avoids it. It may be instructive to capture the Rustls connection establishment and compare it with the native-tls one, if possible.
I will gladly capture it, just need to know how to do it. I know how to use wireshark, but unsure if that's sufficient for this or not?
Try to capture complete sessions, both successful and unsuccessful, with native-tls and Rustls, in separate .pcap files, and name them accordingly. I'm not sure how helpful it's all going to be without TLS decryption, but you can try. Encrypted captures will at least hide sensitive local details such as bind passwords.
Here are three tries on LDAPS with both rustls and native-tls. rustls succeeded in all of them, native-tls failed in all of them. rust_nativetls_3_sessions.zip
In the traces, neither native-tls nor Rustls has the connection closed regularly. In both cases, the server's last packet is an RST, indicating a forced/errored closing on the server side. The difference is that Rustls invariably sends an "Encrypted Alert" as the last packet, possibly informing the server about connection shutdown. The LDAP spec says that both sides should "gracefully terminate" the session when the server receives an Unbind, so the RST is not kosher.
Since the Unbind operation is unacknowledged per the protocol spec, the client doesn't expect a reply, but returns the indication of successful socket write operation. It's possible that the server's unexpected RST flags the connection as errored despite the successfully transmitted Unbind, and that error gets propagated back.
Since the Unbind indicates that you're finished with the connection, you could simply ignore its error status and drop the connection once you're finished with it, if the problems of this type persist. I'm not sure whether I could special-case Unbind errors in the IO loop without burdening the implementation with very finicky code.
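To make the distinction between the two kinds of closing concrete, here is a minimal standard-library sketch (plain TCP, no TLS or LDAP involved) of how a client can tell an orderly close from a reset. The names CloseKind, observe_close and demo are made up for this illustration, and the six-byte "unbind" payload is a placeholder, not a real LDAP message. How a reset surfaces can vary by platform, hence the catch-all variant.

```rust
use std::io::{self, Read, Write};
use std::net::{TcpListener, TcpStream};
use std::thread;

/// How the peer ended the connection, as seen from our side.
#[derive(Debug, PartialEq)]
enum CloseKind {
    Graceful,             // read returned 0 bytes: the peer sent FIN
    Reset,                // read failed with ConnectionReset: the peer sent RST
    Other(io::ErrorKind), // any other read error
}

/// Reads until the peer closes and classifies the way it closed.
fn observe_close(stream: &mut TcpStream) -> CloseKind {
    let mut buf = [0u8; 64];
    loop {
        match stream.read(&mut buf) {
            Ok(0) => return CloseKind::Graceful,
            Ok(_) => continue,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) if e.kind() == io::ErrorKind::ConnectionReset => return CloseKind::Reset,
            Err(e) => return CloseKind::Other(e.kind()),
        }
    }
}

/// Local demo: the server consumes the whole message, then drops the socket,
/// which results in an orderly close from the client's point of view.
fn demo() -> io::Result<CloseKind> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;
    let server = thread::spawn(move || -> io::Result<()> {
        let (mut peer, _) = listener.accept()?;
        let mut msg = [0u8; 6];
        peer.read_exact(&mut msg)?; // read everything the client sent
        Ok(()) // `peer` is dropped here, closing the connection
    });
    let mut client = TcpStream::connect(addr)?;
    client.write_all(b"unbind")?; // placeholder payload
    let kind = observe_close(&mut client);
    server.join().expect("server thread panicked")?;
    Ok(kind)
}
```

Calling demo() from a main function and printing the result is enough to try it. Against a server that resets the connection instead, the same observe_close call would be expected to report Reset (or Other, depending on the platform).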
hmm, so the unbind is the 90bytes packet? And then rustls uses "Encrypted Alert" to tell it to close.
So the Unbind is successful as long as the TCP tells it that the data has been given to the other end? How is it handled in other cases, is it expecting a certain response?
Kinda interested in how other solutions handle it, i have used it in golang i think, will see if i can make something out of it.
What do you mean "if the problems of this type persist", aren't we expecting it to be this way if both parts are doing their thing, but they are expecting different responses when closing down?
EDIT:
Here is a Wireshark capture of me trying go-ldap. It seems like it perhaps closes a bit more gracefully, as it uses ACK, FIN? But its Close function doesn't return any errors, so I'm not sure if it will just panic if it assumes something is wrong, or if it just ignores everything. I will try digging into the code a bit more to see if I can make sense of that part.
hmm, so the unbind is the 90bytes packet?
Well that's the problem, the packet is encrypted and I can only presume that it's the Unbind.
So the Unbind is successful as long as the TCP tells it that the data has been given to the other end? How is it handled in other cases, is it expecting a certain response?
The specification (RFC 4511) says that Unbind has no response. Other operations mostly do have one.
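For reference, the plaintext Unbind request is tiny. Below is a sketch of its BER encoding as defined by RFC 4511 (an LDAPMessage whose protocolOp is [APPLICATION 2] NULL); the function name unbind_request is made up here, and the sketch only handles message IDs that fit in a single byte. What appears in a TLS capture is the encrypted record carrying these bytes, so the on-the-wire size will be larger.

```rust
/// Encodes an LDAP UnbindRequest for message IDs 1..=127.
/// Message ID 0 is not valid for client requests, and larger IDs
/// would need a longer INTEGER encoding than this sketch produces.
fn unbind_request(message_id: u8) -> Option<[u8; 7]> {
    if message_id == 0 || message_id > 127 {
        return None;
    }
    Some([
        0x30, 0x05,             // SEQUENCE, 5 bytes of content (LDAPMessage)
        0x02, 0x01, message_id, // INTEGER, 1 byte: messageID
        0x42, 0x00,             // [APPLICATION 2], length 0: UnbindRequest (NULL)
    ])
}
```

There is no matching response type for this request, which is why the client can only report whether the write itself succeeded.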
What do you mean "if the problems of this type persist"
If the only problematic operation in your application is Unbind, you can skip it if you drop the Ldap handle. The connection is not gracefully closed if you do that, but it's closed and you shouldn't see any errors.
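Either option mentioned in this thread (ignoring the Unbind error, or skipping Unbind and dropping the handle) comes down to treating the end of the session as best-effort. A minimal sketch of the first variant follows; the Session trait and the ResetOnUnbind type are hypothetical stand-ins so the example compiles on its own, and are not the ldap3 API.

```rust
/// Hypothetical stand-in for a connection handle; not the ldap3 API.
trait Session {
    fn unbind(&mut self) -> Result<(), String>;
}

/// Test double that always reports an error on Unbind, imitating the
/// behavior seen against the domain controller in this thread.
struct ResetOnUnbind;

impl Session for ResetOnUnbind {
    fn unbind(&mut self) -> Result<(), String> {
        Err("connection reset by peer".to_string())
    }
}

/// Ends the session on a best-effort basis: Unbind is attempted, but a
/// failure is not treated as fatal. The error, if any, is handed back
/// only so the caller can log it.
fn finish<S: Session>(mut session: S) -> Option<String> {
    let ignored = session.unbind().err();
    // `session` is dropped when this function returns, which closes it
    // regardless of what Unbind reported.
    ignored
}
```

With a real client the same shape applies: call the Unbind operation, discard or log its result, and let the handle go out of scope.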
I tried testing at work, and I think it actually always succeeded, but I can't replicate that in my test suite even though I have set it up from scratch. I might have done something wrong in the test though, so I will have to redo it to really make sure, because it doesn't make sense to me.
On the other hand, I captured a connection/bind/unbind from "ldp", which seems to be a built-in LDAP test tool for Windows domains, just to see if that gives any extra insight.
The test was done by connecting to the server with 636 and SSL checked.
Then a simple bind with "user@domain" and its password, then a disconnect (which seems to call unbind), based on the logs:
0x0 = ldap_unbind(ld);
Here is the capture for what it's worth: ldp_ssl_636_connect_bind_unbind.zip
It looks quite similar, I think? So it might just be that unbind is successful if it sends, and the server just closes the connection without any response, like the specification states? How does OpenLDAP handle it then, does it respond and ldap3 expects this response?
OpenLDAP sends a FIN,ACK which is a graceful connection close.
I made a tweak to the closing sequence to attempt a true graceful close on the client side; maybe it helps. It is in the shutdown branch of the GH repo, so you can use it in your code by specifying:
ldap3 = { git = "https://github.com/inejge/ldap3", branch = "shutdown" }
Try it with your failing program and see how it goes.
Sadly it didn't seem to help, still fails on unbind and the server doesn't send a FIN,ACK. But the client does which is your change i presume, but it happens after the server has sent RST, ACK though.
Bummer. For laughs, I changed Unbind to ignore errors and plough through with shutting down and closing the stream. It's in the shutdown branch again. Do a cargo update --package ldap3 (you should get commit 0b3a8886), and recompile. Does it work now?
It succeeds as expected, sorry for late answer.
Well that was easier than I suspected, and I think I'm going to keep it: anything after the successful write-out of an Unbind is just politeness on the client part, and it doesn't really matter if it fails. I'll update this message when I merge the change in master.
Well, just to make things messier, I did some extra checks at work. And there it works on native-tls, but not on rustls, or well, one DC failed and the other succeeded. I didn't check Wireshark, but it's really odd. I have tried researching, and the only thing I can find is that Windows handles the certificate handshake oddly with LDAP or something. It also doesn't use the subject name, which seems to cause issues, and it used it before (not sure in which version). https://docs.microsoft.com/en-us/troubleshoot/windows-server/identity/ldap-over-ssl-connection-issues Not sure if it's that link exactly, but that's one of them at least. I also noticed there that there's LDAP client debugging, including SSL, so I've got to try that, if it's even applicable here.
But yeah, as it's not clear at all (at least on my part) how it's actually supposed to be handled, given all the different results, it's probably not that bad to just pass anything on unbind. I don't really like it, as it's basically just hiding an issue, but it doesn't seem to have any practical bad impact, I think, and it doesn't look like a real solution is in sight, sadly.
I will try to dig deeper and if i do happen to find something that looks promising i will report it cause this really bothers me, damn microsoft;P
Really appreciate your help in all this, many thanks:)!!
I don't know if it's a bug or i am doing something wrong, but whenever i try to use ldaps or starttls it fails when i unbind.
I seem to bind and even search just fine, but unbind will throw an error:
ResultRecv { source: RecvError(()) }
It seems to sometimes succeed once or twice for some reason, but it's rare. Is someone able to test this?