inverse-inc / packetfence

PacketFence is a fully supported, trusted, Free and Open Source network access control (NAC) solution. Boasting an impressive feature set including a captive portal for registration and remediation, centralized wired and wireless management, powerful BYOD management options, 802.1X support, and layer-2 isolation of problematic devices, PacketFence can be used to effectively secure small to very large heterogeneous networks.
https://packetfence.org
GNU General Public License v2.0

Authentication Failed Active Directory #8370

Closed: RenatoPereira91 closed this issue 2 weeks ago

RenatoPereira91 commented 3 weeks ago

Describe the bug: I've configured the Active Directory integration with PacketFence 13.2 following the documentation. It is working, but I'm getting the message below, and when I get it, sometimes the user can't connect and there are a lot of reject logs.

(53450) Wed Oct 30 12:09:38 2024: Debug: chrooted_mschap: --> --nt-response=597e6b85a491569f653840dc3a65d4a58e52b9830566ee97
(53450) Wed Oct 30 12:09:38 2024: ERROR: chrooted_mschap: Program returned code (1) and output 'NT Error: code: 3221225506, message: (3221225506, '{Access Denied} A process has requested access to an object but has not been granted those access rights.')'
(53450) Wed Oct 30 12:09:38 2024: Debug: chrooted_mschap: External script failed
(53450) Wed Oct 30 12:09:38 2024: ERROR: chrooted_mschap: External script says: NT Error: code: 3221225506, message: (3221225506, '{Access Denied} A process has requested access to an object but has not been granted those access rights.')
(53450) Wed Oct 30 12:09:38 2024: ERROR: chrooted_mschap: MS-CHAP2-Response is incorrect

I disabled all caching (NT and NTLM) and tried deleting the user and the machine, but it didn't work.

To Reproduce: steps to reproduce the behavior:

  1. Configure the Active Directory
  2. Configure the Realm
  3. Configure the Authentication Sources


stgmsa commented 3 weeks ago

Hi @RenatoPereira91, this is a known issue and it'll be fixed soon. Thank you for the feedback.

RenatoPereira91 commented 3 weeks ago

Hi @stgmsa, thank you for answering. Is there any workaround? Sometimes the user can't connect for a long time, around 10 minutes.

E-ThanG commented 3 weeks ago

In the meantime, if you're able to rebuild the ntlm-auth-api docker container, you can comment out this chunk from bin/pyntlm_auth/rpc.py. Make sure that the old docker image is completely gone before you start the service back up.

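# Look up the LDAP servers (domain controllers) for the realm using the configured DNS servers
domain_controller_records = utils.find_ldap_servers(global_vars.c_realm, global_vars.c_dns_servers)
if len(domain_controller_records) > 0:
    # Pick one of the returned records at random
    idx = random.randint(0, len(domain_controller_records) - 1)
    record = domain_controller_records[idx]
    # Use the record's 'target' host as the server to talk to
    server_name = record.get('target')
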
RenatoPereira91 commented 3 weeks ago

Hello @E-ThanG, I don't have a lot of experience with rebuilding Docker images. Could you write down the steps to achieve this?

I checked the file bin/pyntlm_auth/rpc.py in PacketFence 13.2 and it doesn't contain the lines below:

domain_controller_records = utils.find_ldap_servers(global_vars.c_realm, global_vars.c_dns_servers)
if len(domain_controller_records) > 0:
    idx = random.randint(0, len(domain_controller_records) - 1)
    record = domain_controller_records[idx]
    server_name = record.get('target')

I found a diff between the 13.2.0 and 14.0.0 versions of the file:

https://fossies.org/diffs/packetfence/13.2.0_vs_14.0.0/bin/pyntlm_auth/rpc.py-diff.html

I will try it in a lab environment before applying it in production.

E-ThanG commented 3 weeks ago

Whoops, I missed that you said 13.2. Sorry, my change doesn't apply to 13.2. I didn't notice issues with that version, but I'm still building out our server, so perhaps more testing at scale would have surfaced the issue.

They have a few other changes in the upcoming code. Perhaps those will help in your instance.

You don't want the code I pasted; that is what I removed to get my 14.0 server working.

If you want to try code changes you can rebuild the docker containers via this process:

  1. Make sure you have a snapshot or backup to revert to in case things go south, and they probably will.
  2. Be OK with that :D
  3. Make your changes to the code.
  4. Stop the ntlm-auth-api service.
  5. List all Docker images with: docker images | grep ntlm-auth-api
  6. Use docker rmi -f IMAGE-ID to delete both Docker images for ntlm-auth-api, using the IMAGE IDs from the previous step.
  7. Use /usr/local/pf/addons/dev-helpers/build-local-container.sh ntlm-auth-api to build a new image.
  8. Start ntlm-auth-api back up and try your changes.
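Steps 5 and 6 can be partly scripted. The sketch below assumes the default docker images table layout (REPOSITORY, TAG, IMAGE ID, CREATED, SIZE); it picks out the image IDs whose repository name contains ntlm-auth-api and builds the matching docker rmi -f commands as strings instead of running them, so you can review them first. The function names are hypothetical helpers, not part of PacketFence.

```python
import shlex


def ntlm_auth_image_ids(docker_images_output: str, name: str = "ntlm-auth-api") -> list[str]:
    """Return unique IMAGE ID values for rows whose REPOSITORY contains `name`.

    Works on the full `docker images` table or on output already filtered
    through `grep ntlm-auth-api` (the header row is skipped if present).
    """
    ids: list[str] = []
    for line in docker_images_output.splitlines():
        fields = line.split()
        # Skip blank lines and the header row ("REPOSITORY TAG IMAGE ID ...").
        if len(fields) < 3 or fields[0] == "REPOSITORY":
            continue
        repository, image_id = fields[0], fields[2]
        # The same image can be listed under several tags; keep each ID once.
        if name in repository and image_id not in ids:
            ids.append(image_id)
    return ids


def rmi_commands(image_ids: list[str]) -> list[str]:
    """Build the step-6 commands as strings; nothing is executed here."""
    return [f"docker rmi -f {shlex.quote(image_id)}" for image_id in image_ids]
```

Feed ntlm_auth_image_ids the text printed by docker images on the PacketFence host, check that it found exactly the images you expect, then run the printed commands by hand. Keeping execution manual fits step 1: nothing is deleted until you have confirmed the list.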
stgmsa commented 2 weeks ago

Hi @RenatoPereira91

Sorry, I think I mixed this up with #8345. After a careful review, I believe the issue you reported on 13.2 has nothing to do with #8345.

The NT error code 3221225506 is typically due to a wrong machine account password.
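The decimal code in the log is an NTSTATUS value: 3221225506 is 0xC0000022, STATUS_ACCESS_DENIED, which matches the "{Access Denied}" text in the message. Below is a minimal sketch for pulling these codes out of the chrooted_mschap log lines and splitting them into their NTSTATUS fields. The name table is a small hand-picked subset, not an exhaustive list, and the code by itself does not say why access was denied; the machine account diagnosis still has to be confirmed with the troubleshooting steps.

```python
import re

# A few NTSTATUS values from Microsoft's published NTSTATUS list (subset only).
KNOWN_NTSTATUS = {
    0xC0000022: "STATUS_ACCESS_DENIED",
    0xC0000064: "STATUS_NO_SUCH_USER",
    0xC000006A: "STATUS_WRONG_PASSWORD",
    0xC000006D: "STATUS_LOGON_FAILURE",
}

# The top two bits of an NTSTATUS encode its severity.
SEVERITY = ("SUCCESS", "INFORMATIONAL", "WARNING", "ERROR")


def nt_error_codes(log_text: str) -> list[int]:
    """Extract every 'NT Error: code: <decimal>' value from chrooted_mschap log text."""
    return [int(code) for code in re.findall(r"NT Error: code: (\d+)", log_text)]


def decode_ntstatus(value: int) -> dict:
    """Split a 32-bit NTSTATUS into Sev(2) | C(1) | N(1) | Facility(12) | Code(16)."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("NTSTATUS values are 32-bit unsigned integers")
    return {
        "hex": f"0x{value:08X}",
        "severity": SEVERITY[value >> 30],
        "customer_defined": bool((value >> 29) & 1),
        "facility": (value >> 16) & 0x0FFF,
        "code": value & 0xFFFF,
        "name": KNOWN_NTSTATUS.get(value, "unknown"),
    }
```

Running decode_ntstatus on 3221225506 gives hex 0xC0000022, severity ERROR, facility 0, code 0x22 and the name STATUS_ACCESS_DENIED.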

Here are some steps that might be useful for troubleshooting:

  1. Make sure the machine account was joined successfully. Pay special attention to the realm, NetBIOS name, workstation, domain name, FQDN, etc. that are related to the Windows domain, and make sure they are correct.
  2. Do a machine account password test in the GUI: fill in the clear-text password in the "machine account password" field and click the "test" button. If the ntlm-auth-api service is running and returns "machine account test OK", the machine account is correctly configured.
  3. If you are running PF v13.2 in cluster mode, or used "%h" as the hostname, you might have "changed" the machine account password already. Make sure step 2 returns "OK", and try to re-join the PF instance by deleting the domain config, refreshing the Admin UI and adding the config again.
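Part of step 1 can be turned into a quick offline check. This is a sketch under stated assumptions: the field names in DomainSettings are hypothetical (map them to the fields of your Windows Domain form), the "DNS name contains a dot" rule is only a heuristic, and the 15-character limit is the NetBIOS name length limit. It does not replace the machine account test in step 2.

```python
from dataclasses import dataclass

NETBIOS_MAX_LEN = 15  # NetBIOS names are limited to 15 characters


@dataclass
class DomainSettings:
    # HYPOTHETICAL field names; map them to the fields of your Windows Domain form.
    dns_name: str         # AD DNS domain, e.g. "example.local"
    workgroup: str        # NetBIOS domain name, e.g. "EXAMPLE"
    machine_account: str  # literal name such as "pfnode01", or a template containing "%h"
    clustered: bool = False


def sanity_check(s: DomainSettings) -> list[str]:
    """Return human-readable warnings; an empty list means nothing obvious is wrong."""
    warnings = []
    # Heuristic: AD domains are normally multi-label DNS names.
    if "." not in s.dns_name.strip("."):
        warnings.append("DNS name does not look like a fully qualified domain")
    if not 1 <= len(s.workgroup) <= NETBIOS_MAX_LEN:
        warnings.append(f"workgroup must be 1-{NETBIOS_MAX_LEN} characters")
    # A literal machine account name must fit the NetBIOS limit.
    if "%h" not in s.machine_account and not 1 <= len(s.machine_account) <= NETBIOS_MAX_LEN:
        warnings.append(f"machine account must be 1-{NETBIOS_MAX_LEN} characters")
    # In a cluster, a literal name means every node shares one machine account.
    if s.clustered and "%h" not in s.machine_account:
        warnings.append("cluster nodes would share one machine account; consider %h")
    return warnings
```

The cluster warning reflects the point in step 3: with a fixed machine account name, every node in the cluster uses, and can change the password of, the same account.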

LMK.

RenatoPereira91 commented 2 weeks ago

Hello @E-ThanG, I'll try rebuilding the Docker image today.

@stgmsa, the problem is strange: as in the example below, the first authentication failed but the second one on the same server worked:

[screenshot: a failed authentication followed by a successful one on the same server]

But there are times when the device tries several times before it is accepted.

I did the machine account password test on all three servers and it worked.

I didn't use %h; I used the same name I have as the hostname (System Configuration --> Main Configuration --> General Configuration).

stgmsa commented 2 weeks ago

If you were not using %h (or don't have %h in your machine account name), you're probably using the same machine account across all 3 servers. So when a secure channel is established from one node, it will "kick" the existing secure channel of another node, which causes the previous bind to expire and fail.
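The effect can be pictured with a toy model. This is a deliberate simplification of the behavior described above, not an implementation of Netlogon: the hypothetical ToyDomainController keeps at most one live secure channel per machine account, and a new channel for the same account replaces the old one.

```python
class ToyDomainController:
    """Toy model: at most one live secure channel per machine account."""

    def __init__(self) -> None:
        self._active: dict[str, str] = {}  # machine account -> node holding the channel

    def establish(self, node: str, account: str) -> None:
        # A new channel for the same account replaces ("kicks") the old one.
        self._active[account] = node

    def authenticate(self, node: str, account: str) -> bool:
        # Succeeds only if this node still holds the channel for its account.
        return self._active.get(account) == node


def simulate(accounts: dict[str, str]) -> dict[str, bool]:
    """Every node establishes its channel in turn, then each node tries to authenticate."""
    dc = ToyDomainController()
    for node, account in accounts.items():
        dc.establish(node, account)
    return {node: dc.authenticate(node, account) for node, account in accounts.items()}


shared = simulate({"node1": "PACKETFENCE", "node2": "PACKETFENCE", "node3": "PACKETFENCE"})
per_node = simulate({"node1": "PFNODE01", "node2": "PFNODE02", "node3": "PFNODE03"})
```

With one shared account only the node that connected last can authenticate, while with one account per node all three succeed. In the model, a failing node that re-establishes its channel would in turn kick another node, which is one way to picture failures that come and go.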

I would suggest using "%h" as the machine account name and joining each PF node to the domain. This will create 3 different machine accounts, so each node uses its own machine account to establish the schannel.

For example, assuming you have a cluster of 3, you'll need to do the following steps (method 1):

  1. Create the domain profile using the admin panel; make sure %h is the machine account.
  2. Switch the API server to node2 and do the same thing on node2, but you'll have to re-type the clear-text machine account password (same password) when clicking "save". This allows you to create another machine account with an identical password but a different machine account name.
  3. Repeat step 2 on node3.
  4. Restart the ntlm-auth-api service on all 3 nodes.
  5. Make sure all 3 nodes pass the machine account test, then you should be good. After these steps, you'll probably see 3 machine accounts like pfnode01, pfnode02 and pfnode03, where pfnode01/02/03 is the hostname (or part of the hostname) of each PF node.
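Before joining, it can help to check that %h really yields a different account on every node. The sketch below assumes, without confirmation from PacketFence's code, that %h becomes the first label of the node's hostname and that the result is cut to the 15-character NetBIOS limit; compare its output with the accounts that actually appear in AD.

```python
NETBIOS_MAX_LEN = 15  # NetBIOS names are limited to 15 characters


def expand_machine_account(template: str, hostname: str) -> str:
    """ASSUMPTION: %h becomes the first hostname label; result cut to 15 characters."""
    short_name = hostname.split(".", 1)[0]
    return template.replace("%h", short_name)[:NETBIOS_MAX_LEN]


def plan_machine_accounts(template: str, hostnames: list[str]) -> dict[str, str]:
    """Map each node to its machine account and refuse plans where two nodes collide."""
    plan = {host: expand_machine_account(template, host) for host in hostnames}
    seen: dict[str, str] = {}
    for host, account in plan.items():
        key = account.lower()  # AD account names are case-insensitive
        if key in seen:
            raise ValueError(f"{host} and {seen[key]} would share machine account {account!r}")
        seen[key] = host
    return plan
```

A fixed name such as "packetfence" collides on every node, which is the shared-account situation to avoid. Under the truncation assumption, long hostnames that differ only after the 15th character (for example packetfence-node-a and packetfence-node-b) would collide too.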

Or, if you don't want to switch back and forth on the API, you can also manually add the machine accounts in Windows AD, but you'll still have to use %h as the machine account name (method 2). After node 1 is joined to the AD, check the AD and find the machine account that uses node1's hostname (or maybe part of the hostname, if your PF node's hostname is in FQDN format). Then manually create another two machine accounts in Windows AD following the same naming rule, and set the identical password for each of the 2 accounts. Restart the ntlm-auth-api service and test the machine accounts.
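For method 2, the names of the two extra accounts can be derived from node1's account, as long as the naming rule is checked first. In the sketch below, the trailing "$" is standard for AD computer accounts (sAMAccountName); the rest of the rule (first hostname label, upper case, 15-character cut) is an assumption, so the helper refuses to continue unless the rule reproduces the account AD already shows for node1. Creating the accounts and setting the identical password is still done with your AD tools.

```python
NETBIOS_MAX_LEN = 15  # NetBIOS computer names are limited to 15 characters


def sam_account_name(hostname: str) -> str:
    """ASSUMPTION: first hostname label, upper-cased, cut to 15 characters, plus '$'.

    Computer accounts in AD carry a trailing '$' in sAMAccountName; the rest of
    the rule is a guess that must be checked against node1's real account.
    """
    return hostname.split(".", 1)[0].upper()[:NETBIOS_MAX_LEN] + "$"


def accounts_to_create(node1_hostname: str, node1_sam_in_ad: str,
                       other_hostnames: list[str]) -> dict[str, str]:
    """Derive names for the remaining nodes, but only if the guessed rule
    reproduces the account AD already created for node1."""
    if sam_account_name(node1_hostname) != node1_sam_in_ad.upper():
        raise ValueError("naming rule does not match node1's account in AD; derive the names by hand")
    return {host: sam_account_name(host) for host in other_hostnames}
```

For example, if AD shows PFNODE01$ for pfnode01.example.local, the helper proposes PFNODE02$ and PFNODE03$ for the other two nodes; if node1's account looks different, it stops instead of guessing.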

stgmsa commented 2 weeks ago

@RenatoPereira91 ☝️ forgot to tag you in the previous comment.

RenatoPereira91 commented 2 weeks ago

@stgmsa I did it and it works. Thank you.

Just to note the steps:

1 - Remove the Active Directory domains
2 - Recreate the Active Directory domain with the same ID
3 - Restart the services