jupyterhub / ldapauthenticator

LDAP Authenticator Plugin for Jupyter
BSD 3-Clause "New" or "Revised" License
206 stars 178 forks source link

Coordination about escaping issues and fixes #243

Closed consideRatio closed 2 months ago

consideRatio commented 2 months ago

Hey, I'm trying to co-ordinate help to make progress to an escaping bug without actual knowledge of LDAP or this authenticator, so I'm pinging a few people involved:

Goal

What I can do

I can review and merge as a jupyterhub org maintainer, but I need assicance because I'm not familiar with LDAP tech. A key concern for me when reviewing is to not disrupt existing users without issues while fixing something for other users with issues, to do that I want to ensure we have a clear communication about any breaking changes if they are needed.

What can you do?

consideRatio commented 2 months ago
Nikolai-Hlubek commented 2 months ago

Thank you Erik for pushing this forward.

From my organization I can only say we looked at PR #225 and it didn't work for us (@m-erhardt can explain the details better (if I remember correctly PR #225 fixes the user field by escaping a parenthesis in the username but more LDAP fields could be involved an need to be escaped)), hence there is PR #238 which we have running productively since July.

I think both PR #225 and PR #238 present no breaking changes but bugfixes. From the code logic side I don't know about LDAP so I could only help with python or writing a unit tests in general, but not the LDAP specifics that were changed.

m-erhardt commented 2 months ago

Hi everyone & thanks to @consideRatio for picking this up.

As my colleague @Nikolai-Hlubek already mentioned we initially looked into #225.

After diving deeper into this rabbit hole (LDAP RFC, tests with Active Directory vs OpenLDAP) I came to the conclusion that there is a misunderstanding (or probably lack of documentation) with regards to what escape_userdn is supposed to do. In my opinion #226 (although it works in certain conditions) is also subject to this misunderstanding.

From the current documentation:

If set to True, escape special chars in userdn when authenticating in LDAP. On some LDAP servers, when userdn contains chars like '(', ')', '' authentication may fail when those chars are not escaped.

In my opinion escape_userdn should only affect how we treat the LDAP Bind-DN (and could probably be renamed to escape_binddn instead to avoid confusion - but this would be a breaking change).

So my proposed change in #238 does not tamper with escape_userdn at all but makes sure that all user- or ldap-provided input strings that are used in ldap search strings are escaped according to RFC4515.

escape_filter_chars() only escapes characters that are part of the LDAP filter syntax (like ()|&) and thus result in invalid/non-RFC-compliant filter strings when not escaped.

So in principle (I am not 100% sure though - who knows what LDAP implementations are out there) #238 should not be a breaking change as unaffected users (which do not have these LDAP filter characters in their LDAP attributes) should not experience any change in behaviour.

All affected users previously ran into ldap3.core.exceptions.LDAPInvalidFilterError.

consideRatio commented 2 months ago

Thank you @Nikolai-Hlubek and @m-erhardt!

I'm now quite confident #238 was a bugfix without breaking changes that should be done at all time, and not conditionally.

I'm not sure if that fixed #225 as well, but I think maybe it did. @hammadab could you check if what is now in the main branch resolves your issue in #225?

consideRatio commented 2 months ago

I think I've spotted a remaining bug related to escape_userdn=True. I'd love your help to think about this @Nikolai-Hlubek @m-erhardt @hammadab!

EDIT: I've attempted to fix this bug, if it is a bug, with https://github.com/jupyterhub/ldapauthenticator/pull/253.


In the resolve_username function, there is no bug. It looks like this, and the search_dn isn't used for anything else:

https://github.com/jupyterhub/ldapauthenticator/blob/cf6d3ae7fd0d4c9e0a608cc911e7fedfb778fc9e/ldapauthenticator/ldapauthenticator.py#L246-L252

escape_userdn is applied in a very similar way on what gets passed to get_connection here as well in authenticate function:

https://github.com/jupyterhub/ldapauthenticator/blob/cf6d3ae7fd0d4c9e0a608cc911e7fedfb778fc9e/ldapauthenticator/ldapauthenticator.py#L370-L375

But, the bug is that the variable was escaped is re-used in authenticate later on, making it double-escaped.

https://github.com/jupyterhub/ldapauthenticator/blob/cf6d3ae7fd0d4c9e0a608cc911e7fedfb778fc9e/ldapauthenticator/ldapauthenticator.py#L419-L430

consideRatio commented 2 months ago

Besides this, I'm asking myself if:

  1. Shouldn't escape_userdn always be set to true if it fixes issues in some situations when establishing a connection?
  2. Should the DN we pass to get_connection be escaped with escape_filter_vars? I found a note in the docs for ldap3.Connection about ldap3.utils.dn.escape_rdn saying: image
consideRatio commented 2 months ago

I've opened #267 to hopefully conclude the escaping business - help reasoning that it could make sense to merge or not would be great if anyone has the capacity to help with some review.