theforeman / smart_proxy_realm_ad_plugin

foreman-proxy realm plugin for Active Directory
GNU General Public License v3.0

RFC: A proper workaround for domains with multiple DCs #20

Open ananace opened 5 years ago

ananace commented 5 years ago

We've started running into issues with domains where we have multiple main DCs at all times, where joins tend to fail with a nonsensical authentication error - the object is created fine but the password fails to be set. From the debugging I've done it seems to occur when the ticket for the password change is provided by a different DC than the one the actual request ends up going to.

The bug at https://bugs.freedesktop.org/show_bug.cgi?id=55487 sounds like it's exactly what we're running into.

My workaround so far has been to retry setting the password until it succeeds. That works, but it is not a nice workaround, and not something I particularly want to encourage with a PR:

diff --git a/lib/smart_proxy_realm_ad/provider.rb b/lib/smart_proxy_realm_ad/provider.rb
index 640fadc..d26f424 100644
--- a/lib/smart_proxy_realm_ad/provider.rb
+++ b/lib/smart_proxy_realm_ad/provider.rb
@@ -101,7 +101,24 @@ module Proxy::AdRealm
       enroll.set_host_fqdn(hostfqdn)
       enroll.set_domain_ou(@ou) if @ou
       enroll.set_computer_password(password)
-      enroll.join
+      begin
+        enroll.join
+        return true
+      rescue RuntimeError => ex
+        raise ex unless ex.message =~ /Authentication error/
+        loop do
+          begin
+            if enroll.respond_to? :update
+              enroll.update
+            else
+              enroll.password
+            end
+            return true
+          rescue RuntimeError => ex
+            raise ex unless ex.message =~ /Authentication error/
+          end
+        end
+      end
     end

     def generate_password

The question, then: is there a better workaround for this issue? Perhaps by finding all available DCs using a DNS lookup, and then grabbing tickets for all of them?

ptulpen commented 5 years ago

Hello, we have the same issue here, and at least in a quick test we can confirm that this hack works. We had thought about this before but, lacking Ruby skills, could not improve on it. Still, we want to share our thoughts. One way to address this issue would be to store the DC that was determined first and use it for all further steps.

The other idea is to use locations: at least in our environment, the locations in Foreman reflect the locations in AD. So something like this could be done: @domain_controller = resolver.getresource("_ldap._tcp.#{location}._sites.dc._msdcs.#{domain}", Resolv::DNS::Resource::IN::SRV).to_s

Then we would only use DCs at our own site.
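As a rough illustration of the DNS-based ideas above, the following is a minimal sketch using Ruby's standard `resolv` library. The helper names `dc_srv_names` and `discover_domain_controllers` are hypothetical and not part of the plugin. It tries the site-scoped SRV name first and falls back to the domain-wide one. It sorts by SRV priority and then by weight, which is a simplification: proper SRV selection among equal priorities is weighted-random.

```ruby
require 'resolv'

# Hypothetical helper: SRV names to query, most specific first.
def dc_srv_names(domain, site = nil)
  names = []
  names << "_ldap._tcp.#{site}._sites.dc._msdcs.#{domain}" if site && !site.empty?
  names << "_ldap._tcp.dc._msdcs.#{domain}"
  names
end

# Hypothetical helper: returns DC host names for the first SRV name that
# yields any records, ordered by priority (ascending), then weight (descending).
# Returns an empty array when nothing is found.
def discover_domain_controllers(domain, site: nil, resolver: Resolv::DNS.new)
  dc_srv_names(domain, site).each do |name|
    records = resolver.getresources(name, Resolv::DNS::Resource::IN::SRV)
    next if records.empty?

    return records.sort_by { |r| [r.priority, -r.weight] }
                  .map { |r| r.target.to_s }
  end
  []
end

# Example (requires a resolvable AD domain; names are placeholders):
#   discover_domain_controllers('example.com', site: 'HQ')
```

Note that `getresources` returns SRV record objects, so the host name has to be read from `target`; calling `to_s` on the record itself does not yield a host name.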

mtkraai commented 5 years ago

Similar here. In our case, it looks like it's taking AD some time to catch up before it will allow the join operation.

I modified the code from @ananace to put a ~30-second max on the loop, and sleep each time around. This seems to work for me.

@@ -103,7 +103,25 @@
       enroll.set_host_fqdn(hostfqdn)
       enroll.set_domain_ou(@ou) if @ou
       enroll.set_computer_password(password)
-      enroll.join
+      begin
+        enroll.join
+        return true
+      rescue RuntimeError => ex
+        raise ex unless ex.message =~ /Authentication error/
+        for i in 1..100
+          sleep(0.3)
+          begin
+            if enroll.respond_to? :update
+              enroll.update
+            else
+              enroll.password
+            end
+            return true
+          rescue RuntimeError => ex
+            raise ex unless i < 99 and ex.message =~ /Authentication error/
+          end
+        end
+      end
     end

     def generate_password

edit: apparently a bad copy/paste of the diff output the first time. Should be better now.
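For comparison, the same bounded-retry idea can be written as a small reusable helper. This is only a sketch: the method name `retry_on_auth_error` and its parameters are hypothetical, and it relies on the error surfacing as a `RuntimeError` whose message contains "Authentication error", as in the diffs above. It uses a monotonic-clock deadline instead of a loop counter, so the time limit holds regardless of how long each attempt takes.

```ruby
# Hypothetical helper: re-run the block while it raises a RuntimeError whose
# message matches `pattern`, until `timeout` seconds have passed.
# Any other error, or a matching error after the deadline, is re-raised.
def retry_on_auth_error(timeout: 30, interval: 0.3, pattern: /Authentication error/)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  begin
    yield
  rescue RuntimeError => e
    raise unless e.message =~ pattern
    raise if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline

    sleep(interval)
    retry
  end
end

# Possible use inside the rescue branch of the patch above:
#   retry_on_auth_error do
#     enroll.respond_to?(:update) ? enroll.update : enroll.password
#   end
```

Unlike the loop in the diff, this version always ends by either returning the block's value or raising the last error, so a caller cannot mistake an exhausted retry for success.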

wiad commented 4 years ago

Any chance of a solution for this?

martencassel commented 4 years ago

We could try to release a new version of radcli that uses a later version of realmd/adcli (https://cgit.freedesktop.org/realmd/adcli/).

martencassel commented 4 years ago

@wiad What do I need to set up to test this issue in terms of the AD environment?

I could set up:

  1. A Windows domain with 4 domain controllers. How many do we need? Is any specific configuration needed?
  2. A Linux box with the radcli library installed

Test tasks:

  1. With a script, try to trigger radcli_join for a bunch of computers.

Post-conditions:

Does the AD environment have to be in any specific state to trigger this error? Anything else?

wiad commented 4 years ago

We have just started using this plugin. We have 3 DCs, and I started to see the same problem others have reported, i.e. the authentication error message. I applied the workaround mentioned above and haven't seen it since. I haven't done any extensive testing, though, as I have run into other, more pressing issues (#22).

wiad commented 4 years ago

Am I the only one seeing a problem with the workaround? When the join fails and the workaround kicks in, the computer account that ends up being created is missing its servicePrincipalName attributes in AD. If the join works, which it most often does not, those attributes are added.

ananace commented 4 years ago

@wiad Are you using a radcli built with commit b5669ce included? (Check whether it has the update method.)
Without that commit, the workaround I wrote will only create the base entry and assign a password to it. With the commit, it will also apply the further attributes set on the enroll object.

wiad commented 4 years ago

Are you using a radcli built with commit b5669ce included?

It's from Foreman's yum repo: rubygem-radcli-1.0.0-1.el7.x86_64

It was built on 2018-01-02, so I'm guessing it is missing that commit.

wiad commented 4 years ago

I built radcli from source, and now I can no longer reproduce the error where the servicePrincipalName attributes are missing. Great! But why is that version not packaged in Foreman? The published rubygem also seems to be an older version, so is building from source the only way to get the fix?

wiad commented 4 years ago

oh, and thank you @ananace !

martencassel commented 4 years ago

Yes, the radcli gem hasn't been upgraded for a long time. We have to release a newer version of radcli that includes a more recent version of the realmd/adcli library, and then another release of the realm AD plugin that uses this radcli release.

wiad commented 4 years ago

I ran into another problem with this workaround: when the workaround is 'activated' by the "Authentication error" error, the account password is not set (pwdLastSet for the account is set to 0). So the @host[otp] password used in my Foreman kickstart template didn't work.

My workaround, which seems to work, is to add an enroll.password call:

if enroll.respond_to? :update
  enroll.update
  enroll.password  # <------ needed?
else
  enroll.password
end

I'm not really sure whether I'm correct in this, though. Has anyone else seen this?

reenberg commented 4 years ago

We could try to release a new version of radcli that uses a later version of realmd/adcli (https://cgit.freedesktop.org/realmd/adcli/).

Any chance that could happen any time soon?

martencassel commented 3 years ago

There is a new release of radcli: https://github.com/martencassel/radcli/releases/tag/v1.1.0. It's available on RubyGems: https://rubygems.org/gems/radcli/versions/1.1.0 @reenberg

martencassel commented 3 years ago

Today this plugin only connects to the single domain controller specified in the settings. The adcli library can connect to a domain without a specific domain controller being given, as it performs DC discovery when the connection is created. This means that the plugin could perform DC discovery using only the domain name from the settings file.

For those for whom DC discovery does not work, we could keep an optional setting for a static domain controller.

The domain controller would then be an optional setting, and the domain name would be required.
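The proposed rule (domain name required, domain controller optional) could look roughly like the sketch below. The helper `connection_target` and the hash keys `:domain` and `:domain_controller` are assumptions made for illustration. They are not confirmed names from the plugin's settings file, and the sketch does not call radcli or adcli at all.

```ruby
# Hypothetical helper: decide whether to use a static DC or rely on discovery.
# `settings` is assumed to be a Hash with :domain (required) and
# :domain_controller (optional); both key names are illustrative only.
def connection_target(settings)
  domain = settings[:domain].to_s.strip
  raise ArgumentError, 'domain is required' if domain.empty?

  dc = settings[:domain_controller].to_s.strip
  if dc.empty?
    # No static DC configured: leave DC selection to discovery.
    { domain: domain, domain_controller: nil, discovery: true }
  else
    # Static DC configured: use it and skip discovery.
    { domain: domain, domain_controller: dc, discovery: false }
  end
end

# Example:
#   connection_target(domain: 'example.com')
#   connection_target(domain: 'example.com', domain_controller: 'dc1.example.com')
```

Treating a blank value the same as a missing one keeps a commented-out or empty setting from being mistaken for a host name.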