sipcapture / homer-app

HOMER 7.x Front-End and API Server
http://sipcapture.io
GNU Affero General Public License v3.0

LDAP stuck after one RST packet #324

Closed kpribic closed 4 years ago

kpribic commented 4 years ago

Currently the LDAP code establishes a TCP connection to the LDAP server, executes a request, and then keeps the connection alive with TCP keepalives every 15 seconds. After 15 minutes of keepalives, my LDAP server cuts the connection with a TCP RST packet.


After that, homer-app is confused because it apparently doesn't know how to set up the connection again. New auth queries don't initiate a new TCP connection; they simply fail in the app:

# tail -f /usr/local/homer/log/homer-app.log
{"level":"error","msg":"Couldn't auth user: LDAP Result Code 200 \"Network Error\": ldap: connection closed","time":"2020-03-16T11:59:59Z"}
{"level":"error","msg":"Couldn't auth user: LDAP Result Code 200 \"Network Error\": ldap: connection closed","time":"2020-03-16T12:00:00Z"}
{"level":"error","msg":"Couldn't auth user: LDAP Result Code 200 \"Network Error\": ldap: connection closed","time":"2020-03-16T12:00:01Z"}
{"level":"error","msg":"Couldn't auth user: LDAP Result Code 200 \"Network Error\": ldap: connection closed","time":"2020-03-16T12:00:01Z"}
{"level":"error","msg":"Couldn't auth user: LDAP Result Code 200 \"Network Error\": ldap: connection closed","time":"2020-03-16T12:00:01Z"}

Restarting the homer-app service re-establishes the connection, but after 15 minutes of idling it is stuck again.

My understanding is that LDAP clients should keep the connection open only for a limited time, in the ballpark of 5 minutes, and then disconnect. When a new request arrives, a new connection should be established.
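
A rough Go sketch of the behavior described above: dial on demand, drop a connection that has been idle too long, and redial once if a request fails on a stale connection. This is not homer-app's code. `conn`, `lazyClient`, `newLazyClient`, and `fakeConn` are hypothetical names used only for illustration, the 5-minute limit is just the ballpark from this comment, and a real `dial` function would have to connect and bind before returning.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// conn is a hypothetical stand-in for a bound LDAP connection.
type conn interface {
	Search(filter string) (string, error)
	Close()
}

// lazyClient dials on demand, drops connections that sat idle longer than
// maxIdle, and redials once when a request fails on a stale connection.
type lazyClient struct {
	mu       sync.Mutex
	dial     func() (conn, error) // assumed to connect AND bind
	maxIdle  time.Duration
	c        conn
	lastUsed time.Time
}

func newLazyClient(dial func() (conn, error)) *lazyClient {
	return &lazyClient{dial: dial, maxIdle: 5 * time.Minute}
}

func (l *lazyClient) Search(filter string) (string, error) {
	l.mu.Lock()
	defer l.mu.Unlock()

	// Proactively close a connection that has been idle for too long.
	if l.c != nil && time.Since(l.lastUsed) > l.maxIdle {
		l.c.Close()
		l.c = nil
	}

	var lastErr error
	for attempt := 0; attempt < 2; attempt++ {
		if l.c == nil {
			c, err := l.dial()
			if err != nil {
				return "", err
			}
			l.c = c
		}
		res, err := l.c.Search(filter)
		if err == nil {
			l.lastUsed = time.Now()
			return res, nil
		}
		// Simplification: any failure is treated as a dead connection,
		// which is discarded so the next attempt dials a fresh one.
		lastErr = err
		l.c.Close()
		l.c = nil
	}
	return "", lastErr
}

// fakeConn simulates a connection that the server can reset.
type fakeConn struct {
	id             int
	broken, closed bool
}

func (f *fakeConn) Search(filter string) (string, error) {
	if f.broken {
		return "", errors.New("ldap: connection closed")
	}
	return fmt.Sprintf("conn %d: %s", f.id, filter), nil
}

func (f *fakeConn) Close() { f.closed = true }

func main() {
	dials := 0
	var last *fakeConn
	client := newLazyClient(func() (conn, error) {
		dials++
		last = &fakeConn{id: dials}
		return last, nil
	})
	fmt.Println(client.Search("(uid=alice)")) // served by the first connection
	last.broken = true                         // simulate the server's RST
	fmt.Println(client.Search("(uid=alice)")) // redials instead of failing
}
```

The single retry keeps one dead connection from surfacing as a login failure, while a server that is really down still returns an error quickly.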

NOTE: Version: homer-app 1.1.35

adubovikov commented 4 years ago

https://github.com/sipcapture/homer-app/commit/5b1dd122ee1edfb1d71ce64f41847da164d7b88c please compile and test

kpribic commented 4 years ago

As far as I understand, defer delays closing the connection until the function call is over. It kind of works, in the sense that I can log in and connections don't hang. However, when I inspect the trace there is one error. This is what happens, described from the perspective of homer-app:

  1. Establish TCP connection
  2. Send LDAP bind request for binddn (success)
  3. Send LDAP search request for user (get response)
  4. Send LDAP bind request for user (success)
  5. Send LDAP bind request for binddn (success)
  6. Send TCP FIN

This exchange lasts for ~100ms. Immediately afterwards:

  7. Establish new TCP connection
  8. Send LDAP search request for user (failed)
  9. Get error: 000004DC: LdapErr: DSID-0C0907C2, comment: In order to perform this operation a successful bind must be completed on the connection., data 0, v2580
  10. Tear down TCP connection

It appears that the second request failed because the bind to binddn was done on the wrong TCP connection (the previous one, which is now closed). I don't know why there were two requests for the user; I logged in only once and didn't experience any issues logging in. I tried this multiple times and the flow is always the same.
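
To illustrate the point about binding on the same connection, here is a minimal Go sketch that runs the whole flow (bind as binddn, find the user, bind as the user, rebind as binddn, group lookup) on one connection and closes it only at the end via defer. It is not the code from the commit. `session`, `authenticate`, and `fakeSession` are hypothetical names, the `(member=...)` group filter and the example DNs are assumptions for the demo, and filter escaping is left out for brevity.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// session is a hypothetical stand-in for one LDAP TCP connection.
type session interface {
	Bind(dn, password string) error
	Search(filter string) ([]string, error)
	Close()
}

// authenticate runs the whole login flow on a single connection, so every
// search is issued on a connection that has already been bound.
func authenticate(dial func() (session, error), bindDN, bindPW, user, userPW string) ([]string, error) {
	s, err := dial()
	if err != nil {
		return nil, err
	}
	defer s.Close() // runs when authenticate returns, after the group search

	if err := s.Bind(bindDN, bindPW); err != nil {
		return nil, err
	}
	// NOTE: filter values are not escaped here, to keep the sketch short.
	found, err := s.Search("(sAMAccountName=" + user + ")")
	if err != nil {
		return nil, err
	}
	if len(found) != 1 {
		return nil, errors.New("user not found or ambiguous")
	}
	userDN := found[0]
	if err := s.Bind(userDN, userPW); err != nil { // verifies the password
		return nil, err
	}
	if err := s.Bind(bindDN, bindPW); err != nil { // back to the service account
		return nil, err
	}
	// The group lookup uses the full DN, on the same, still bound connection.
	return s.Search("(member=" + userDN + ")")
}

// fakeSession mimics a server that rejects searches on unbound connections.
type fakeSession struct {
	bound, closed bool
	log           []string
}

func (f *fakeSession) Bind(dn, password string) error {
	f.bound = true
	f.log = append(f.log, "bind "+dn)
	return nil
}

func (f *fakeSession) Search(filter string) ([]string, error) {
	if !f.bound {
		return nil, errors.New("a successful bind must be completed on the connection")
	}
	f.log = append(f.log, "search "+filter)
	if strings.HasPrefix(filter, "(sAMAccountName=") {
		return []string{"CN=Alice,DC=example,DC=com"}, nil
	}
	return []string{"CN=HOMER_admins,DC=example,DC=com"}, nil
}

func (f *fakeSession) Close() { f.closed = true }

func main() {
	fake := &fakeSession{}
	dial := func() (session, error) { return fake, nil }
	groups, err := authenticate(dial, "CN=svc,DC=example,DC=com", "secret", "alice", "pw")
	fmt.Println(groups, err)
	fmt.Println(strings.Join(fake.log, "\n"))
}
```

If the group search were moved into a second function with its own dial, that function would need its own bind first, which matches the failure seen in the trace.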

The new TCP connection is marked in yellow. I'm not sure why we still get an RST packet after the FIN, but I think it probably isn't important for this issue.

[image: packet trace, with the new TCP connection highlighted in yellow]

kpribic commented 4 years ago

One correction: after reviewing the traces, step 8 above is actually a searchRequest for memberUid=%username%,

where %username% is the value previously used for sAMAccountName, and not the full LDAP path to that user.

kpribic commented 4 years ago

@adubovikov is there any particular reason why memberUid=%username% is used? I can't get that to work on Active Directory.

What did work (in my case specifically for recursive search) is:

1) Defining an admingroup, adding "dn" to the attributes in ldap_config, and setting adminmode to false.

  "ldap_config": {
    "admingroup": "HOMER_admins",
    "adminmode": false,
    "anonymous": false,
    "attributes": [
      "dn",
      "givenName",
      "sn",
      "mail",
      "uid"
    ],

2) Setting group filter to:

    "groupfilter": "(member:1.2.840.113556.1.4.1941:=%s)",

3) Making these code changes to fix dropped connection and LDAP query with username instead of DN: https://github.com/kpribic/homer-app/commit/3570f2ef91c59488657ffc8ea233a0d4f048f0f9
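
For reference, 1.2.840.113556.1.4.1941 is Active Directory's LDAP_MATCHING_RULE_IN_CHAIN, which makes the member match follow nested groups. Below is a small Go sketch of how the `%s` placeholder in such a group filter could be filled with the user's DN. It assumes the placeholder receives the full DN, which is what the steps above rely on. `buildGroupFilter` and `escapeFilterValue` are hypothetical helpers, not homer-app functions, and the escaping follows RFC 4515.

```go
package main

import (
	"fmt"
	"strings"
)

// groupFilter is the filter template from the configuration above.
const groupFilter = "(member:1.2.840.113556.1.4.1941:=%s)"

// escapeFilterValue escapes the characters that are special inside an LDAP
// filter value (backslash, asterisk, parentheses, NUL) as \XX hex pairs.
func escapeFilterValue(v string) string {
	var b strings.Builder
	for i := 0; i < len(v); i++ {
		switch c := v[i]; c {
		case '\\', '*', '(', ')', 0:
			fmt.Fprintf(&b, "\\%02x", c)
		default:
			b.WriteByte(c)
		}
	}
	return b.String()
}

// buildGroupFilter substitutes the user's DN (not the bare username)
// into the group filter template.
func buildGroupFilter(userDN string) string {
	return fmt.Sprintf(groupFilter, escapeFilterValue(userDN))
}

func main() {
	fmt.Println(buildGroupFilter("CN=Alice,OU=Users,DC=example,DC=com"))
	fmt.Println(buildGroupFilter("CN=Smith (Admin),OU=Users,DC=example,DC=com"))
}
```

Escaping matters mostly for DNs that contain parentheses or asterisks; a plain DN passes through unchanged.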

This results in behavior:

kpribic commented 4 years ago

Fixed along the way while fixing #328