graylog-labs / graylog2-web-interface

[DEPRECATED]
https://www.graylog.org/
611 stars 174 forks source link

Active directory auth fails on first attempt, succeeds on second #1707

Open jontaa opened 8 years ago

jontaa commented 8 years ago

We noticed that after enabling AD authentication on our Graylog 1.2.2 the first attempt was always unsuccessful, while the second was always successful.

So I did a packet dump during the logins to try and narrow it down. This is what happens:

Packet dump started User try to login 1st time Login fail Wait for 20 seconds Packet dump stopped

New packet dump started User try to login 2nd time Login successfull Wait 20 seconds Packet dump stopped

When checking the packet dump I can see that after the 1st login, no attempt to communicate with LDAP servers were made. When 2nd login attempt was done, I could see LDAP communication. I can successful replicate this issue every time by logging out user and attempt to login again.

What debugs and logs could be useful here? Is this a known issue?

kroepke commented 8 years ago

We have seen reports of this but were unable to reproduce the issue so far. Does the failed attempt take 2 seconds to fail or does it fail immediately?

jontaa commented 8 years ago

I tested now. I counted to about 3 seconds to fail. Another observation is that it doesn't matter what you put as password the first time, as long as you put in the correct password the second time.

kroepke commented 8 years ago

Ok, this sounds like a problem with the connection pool. I'll take a look if I see anything obvious.

jontaa commented 8 years ago

Thanks. Let me know if you want me to run any tests on my system, since I can replicate the issue every time.

kroepke commented 8 years ago

Can you please check what the output in the server log is when you set the log level of org.apache.directory.ldap.client.api.LdapNetworkConnection to DEBUG?

For how to do that please see: http://docs.graylog.org/en/1.2/pages/installation/manual_setup.html?highlight=external#supplying-external-logging-configuration or you could use the REST API browser.

jontaa commented 8 years ago

Ok so I edited /opt/graylog/conf/log4j.xml and appended

 <logger name="rg.apache.directory.ldap.client.api.LdapNetworkConnection">
        <level value="debug"/>
    </logger>

Then I did a graylog-ctl restart I then replicated the issue a few times. graylog-ldap.txt

kroepke commented 8 years ago

Thanks!

This looks like the connection simply times out the first time, I wonder what it does all that time. Can you play with the LDAP connection timeout setting in graylog.conf:

ldap_connection_timeout = 5000 # default is 2000 milliseconds

Increasing it should make the problem go away, apparently the connection to the server takes quite some time. If that's successfull we'll increase the default timeout which should solve the issue for most people.

jontaa commented 8 years ago

No difference. Now it takes 5 seconds for it to fail the first time. Remember that I dumped the traffic on the first attempt, and it didnt even try to connect to any LDAP server until the second attempt.

kroepke commented 8 years ago

when the first failure to connect happens, is there any line in the server log that say Couldn't connect to LDAP directory?

kroepke commented 8 years ago

Do you have the same problem when testing the LDAP configuration? With both "Test connection" and "Test login"? Those parts use the exact same code as the login itself.

jontaa commented 8 years ago

So here is a text file which is only the log for a failed attempt graylog-ldap.txt

Testing "test connection" and "test login" works fine when logged in.

kroepke commented 8 years ago

I simply cannot reproduce this locally. For me the first request simply succeeds and I cannot even see any way how a second connection is any different from the first.

Is there anything special about your setup? High latency to AD, loadbalancer in front etc?

jontaa commented 8 years ago

Yeah this is super weird So lets see how the work flow is I go to graylog and I get this login box image After a few seconds I get this box image Which after a few more seconds changes to this box image

In the log we can see 2015-12-09_11:09:26.75305 ERROR [LdapConnector] Timed out connecting to LDAP server 2015-12-09_11:09:26.75656 ERROR [LdapUserAuthenticator] LDAP error 2015-12-09_11:09:26.78837 WARN [SessionsResource] Unable to log in user jobn 2015-12-09_11:09:26.81711 DEBUG [LdapNetworkConnection] ------>> Cannot get the connection... Retrying 2015-12-09_11:09:26.84720 DEBUG [LdapNetworkConnection] Interrupted while waiting for connection to establish with server XXXX.local:389

However if we check the packet capture we took on the graylog host image So we see traffic to and from the graylog server. Lets now filter on all traffic which has TCP port 389 image Nothing

If I compare with a packet capture during second and successful login attempt take in the same way image

Ok so we know that if no connection is made, could it be DNS?

So in our unsuccessfull attempt we see that graylog is resolving DNS in the exact same way as the successful.

pandaadb commented 7 years ago

Hi,

I am not sure if this is relevant at all, but I just ran into a similar issue. I thought I would post my findings as this was really odd to me as well.

What happened was that I configured my LDAP config (userfilter and groupfilter) with something like:

ldap.groupFilter= cn=groups,cn=accounts,dc=test,dc=com

Other properties accordingly - not important.

What happened was that each first attempt returned unauthenticated, while each subsquent attempt does authenticate.

In the backgroup, my configuration file used apache PropertySubstitutor. This particular class uses a Property resolver that reads , signs as separators, hence the group filter went from being a string, to being an array of strings (each , separator is 1 entry). So the configuration just didn't work, which is why the initial authentication failed. (I do not know why every other authentication worked .. it was kind of self healing). I only realised this after doing the config manually in a little test program and realising that it did in fact work.

I thought I'd post this, as this could be why one of you can't reproduce it, while it behaves this strangely for others.

Of course I don't know if that's still relevant or if I'm even making sense in this case :) google lead me here, i thought maybe this could help, if not please ignore me :)

Regards, Artur

lliknart commented 7 years ago

Hi,

I have the same issue. Users have to try 2 attempts to log in. Same result as @jontaa

MichaelW-SD commented 7 years ago

Same here. I just updated to 2.1.1 but unfortunately it still happens.

Pushpendra-blr commented 7 years ago

Hi Guys, This can be caused by any recent Microsoft update which might have tickled .net frame work settings, in our case we went through series of log collection nothing helped us, but cloning the PC and upgrading . Net frame work to version 4 helped us to fix the problem.