requests / requests-kerberos

An authentication handler for using Kerberos with Python Requests.

is service@host a bug? #144

Closed oblat closed 3 years ago

oblat commented 4 years ago

I was getting the "server not found in kerberos database" error, but since smbclient -k was working fine (from rhel NOT in AD to a win10 desktop in AD) I decided to locate the problem in pywinrm, which seems to be that it's using service@host@domain instead of the normal looking service/host@domain. Could this be an overlooked bug?

Code reads: kerb_spn = "{0}@{1}".format(self.service, kerb_host)

Should be: kerb_spn = "{0}/{1}".format(self.service, kerb_host)

It has been claimed that it's necessary for both computers to be in AD, but that's obviously false, because smbclient -k doesn't require it, and pywinrm works like a breeze after I changed the @ to /.

That day when I was trying to solve the problem, I had the impression that many people encountered the error I got and some even gave up using Kerberos. I think the @ is causing the problem, but unfortunately I don't have time to delve into the details of Kerberos.

alv000h commented 4 years ago

@oblat I have the same problem here. I have checked, and this snippet of code has been there since 2016 without modification. I am running into this because I am doing an Ansible upgrade.

SPN in this format seems correct: HTTP@www.example.org (self.service@kerb_host)

The problem appears when you need to authenticate to Windows using WinRM + Kerberos. On Windows, the default service SPN looks like this: HTTP/yourserver.your.realm.com, which is the same as HTTP/yourserver.your.realm.com@YOUR.REALM.COM

And it is very strange: it appears that something changed in the code (the way Ansible calls Protocol or Transport) that makes this not work any more... which is very frustrating (I lost a lot of time with this error until I discovered the root cause).

To sum up, if you configure WinRM with these options it should run OK, but I think this "obscure" manipulation of the service SPN should be done by Ansible automatically:

ansible_user=youruser@YOUR.REALM.COM ansible_winrm_server_cert_validation=ignore ansible_connection=winrm

Manipulating SPN to make this work!

ansible_winrm_service='HTTP/yourserver.your.realm.com' ansible_winrm_kerberos_hostname_override=YOUR.REALM.COM

Now the SPN will be like this: HTTP/yourserver.your.realm.com@YOUR.REALM.COM
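To see why these two settings produce that SPN, here is a minimal Python sketch. It assumes the SPN is built with the "{0}@{1}" format line quoted at the top of this issue and that the hostname override simply replaces the host part. `build_spn` is a hypothetical helper for illustration, not a function from requests-kerberos or pywinrm.

```python
def build_spn(service, host, hostname_override=None):
    """Hypothetical helper mirroring the "{0}@{1}" format line quoted in this issue."""
    # Assumption: the override, when given, replaces the host part entirely.
    kerb_host = hostname_override if hostname_override is not None else host
    return "{0}@{1}".format(service, kerb_host)


# Default behaviour: host-based service name.
default_spn = build_spn("HTTP", "yourserver.your.realm.com")

# Workaround: put service/host in "service" and the realm in the override.
workaround_spn = build_spn(
    "HTTP/yourserver.your.realm.com",
    "yourserver.your.realm.com",
    hostname_override="YOUR.REALM.COM",
)

print(default_spn)
print(workaround_spn)
```

With the default settings the sketch yields the host-based form; with the workaround the same format line ends up emitting a full Kerberos principal name.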

@nitzmahone could you help us with this issue?

jborean93 commented 4 years ago

Looking at the pykerberos code the principal that is passed in by requests-kerberos (service@hostname) will be using gss_krb5_nt_service_name on the call to gss_import_name. This is otherwise known as GSS_C_NT_HOSTBASED_SERVICE which is documented like

specifies a service that is related to a particular host and is specified as service@host. For the Kerberos mechanism, the service name is converted to service/canonical-name@kerberos-realm. The canonical-name is obtained by doing a DNS lookup for the supplied host name and obtaining the canonical host name from the name server.

And for MIT krb5 (what most Linux distros use)

The value should be a string of the form service or service@hostname. This is the most common way to name target services when initiating a security context, and is the most likely name type to work across multiple mechanisms.

So the format that pykerberos uses is correct and is purposely different from the Windows SPN format HTTP/fqdn@REALM, which is the GSS_KRB5_NT_PRINCIPAL_NAME in GSSAPI Kerberos. There is a fallback: if the SPN passed in contains /, it will use GSS_C_NO_OID, which basically has the Kerberos implementation parse the name itself. That is why your settings also work.
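The fallback described above can be sketched in a few lines of Python. This is only an illustration of the decision, assuming it is a plain check for / in the name. The real logic lives in pykerberos' C code, and the values below are string labels, not actual GSSAPI OIDs. `pick_name_type` is a hypothetical name.

```python
# Labels only; these are not real GSSAPI OID objects.
HOSTBASED = "GSS_C_NT_HOSTBASED_SERVICE"
NO_OID = "GSS_C_NO_OID"


def pick_name_type(spn):
    """Return the label of the name type the described fallback would pick."""
    # A "/" suggests a Kerberos principal (service/host[@REALM]), so the
    # Kerberos implementation is left to parse the name itself.
    if "/" in spn:
        return NO_OID
    # Otherwise the name is treated as a host-based service (service@host).
    return HOSTBASED


for name in ("HTTP@www.example.org",
             "HTTP/yourserver.your.realm.com@YOUR.REALM.COM"):
    print(name, "->", pick_name_type(name))
```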

Changing the behaviour of requests-kerberos to use the Kerberos principal name format of HTTP/fqdn@REALM will be problematic for a few reasons

  1. We don't know the realm, or at least cannot easily determine it so adding the @REALM part is hard
  2. Technically we can just do HTTP/fqdn or HTTP/fqdn@ and the Kerberos implementation will look up the realm based on the /etc/krb5.conf settings, but those settings are not always available and will not always result in a realm being found
  3. This is a stretch, but the Kerberos principal name format is only valid for Kerberos. If this library ever adds support for SPNEGO then the host-based service format service@fqdn would be needed

As for why it was working before the upgrade but failed after, I'm not sure. I personally don't think it has anything to do with Ansible or any of the Python libraries it uses, because the code for all this has been quite static over the past few years. I'm not aware of any changes that have happened around the code that handles this.

oblat commented 4 years ago

It's been a long time, but if I recall correctly the problem was not that service@host wasn't working. The problem was that something made service@host change to (probably) service@host@fqdn, and that is not valid. That's why changing the first @ into / made things work.


oblat commented 4 years ago

Sorry, I did mean service@host@realm. It's as if something didn't accept that service@hostname should be valid and therefore did that automatic adding of the realm, as in your suggestion 2? Is it due to some confusion between hostname and fqdn?


jborean93 commented 4 years ago

The GSSAPI spec for GSS_C_NT_HOSTBASED_SERVICE is to be in the format service@hostname. The actual conversion to the SPN is done inside the Kerberos implementation, as it has all the information it needs to look up or derive the REALM part and convert it to the SPN format service/hostname@realm. There are all sorts of things MIT krb5 or Heimdal does to convert the SPN to the Kerberos principal name. By specifying the SPN in the format service/hostname@realm you are effectively bypassing all that logic by being more explicit.

As for why this may be failing in some environments, I'm not sure. The logic for converting the generic host-based name to the format that Kerberos requires is all done in the GSSAPI library and, from my understanding, is quite complex and sometimes not even uniform between implementations. To add to that complexity, there are some options you can set in /etc/krb5.conf that control the canonicalisation of the hostname/realm, like dns_canonicalize_hostname, rdns, default_realm. Kerberos is a tricky beast that can be difficult to set up and has all sorts of edge cases. Ultimately I don't think the behaviour of requests-kerberos is a bug: we follow the format expected of us by GSSAPI, and switching to the explicit Kerberos format could break existing scenarios like the ones I've pointed out before.
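If you want to check which of those krb5.conf options are set on a client, something like the following Python sketch could help. It is a deliberately naive reader that only handles simple key = value lines under [libdefaults] and ignores include directives and nested blocks. `read_libdefaults` is a hypothetical helper and the sample values are made up for illustration.

```python
def read_libdefaults(text, keys=("dns_canonicalize_hostname", "rdns", "default_realm")):
    """Collect selected key = value settings from the [libdefaults] section."""
    found = {}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments (krb5.conf allows "#" and ";").
        if not line or line.startswith(("#", ";")):
            continue
        # Track which [section] we are currently in.
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip()
            continue
        if section == "libdefaults" and "=" in line:
            key, _, value = line.partition("=")
            if key.strip() in keys:
                found[key.strip()] = value.strip()
    return found


# Sample content; the values here are assumptions, not recommendations.
sample = """
[libdefaults]
    default_realm = YOUR.REALM.COM
    dns_canonicalize_hostname = false
    rdns = false

[realms]
    YOUR.REALM.COM = {
        kdc = dc.your.realm.com
    }
"""

settings = read_libdefaults(sample)
print(settings)
```

To inspect a real client, pass in the contents of /etc/krb5.conf instead of the sample string.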

alv000h commented 4 years ago

It's very strange... Here is an update with additional info.

I am currently running Ansible 2.1.1.1 in production. This version didn't have full Kerberos support yet, and it works as if in ansible_winrm_kinit_mode=manual mode. (That means that you have to run "kinit ADuser@YOUR.REALM.COM" manually before executing the ansible command.)

The point is that I now have to migrate to Ansible 2.8.12 (with venv), and when I configure the AD account (ansible_user=ADuser@YOUR.REALM.COM and ansible_password=***) this should simply work like in old versions, but it doesn't.

That's why I think there must be some changes/quirks in Ansible or requests-kerberos that break compatibility...

In Ansible 2.8.12 kinit is done correctly and the SPN format is HTTP@yourserver.your.realm.com, and this format doesn't work with the WSMAN endpoint on Windows.

The flow is:

The ticket is granted OK (kinit). When Ansible sets the endpoint and tries to send the ticket to the https WSMAN endpoint, it fails with the remote Kerberos error: Server not found in Kerberos database

This typically occurs when the service principal is not present on the target server (this is easy to fix with "setspn -A HOST/yourserver.your.realm.com yourserver"), but that is not the case here because the service SPN is correctly configured on the target server.

@nitzmahone knows more (much more) about the Ansible WinRM + Kerberos implementation. Maybe he can help...

nitzmahone commented 4 years ago

We've both traced the defaults back through to ancient versions of requests-kerberos and pykerberos- it doesn't appear that it's ever had the ability to generate a non (service)@(host) SPN. The default service at one level or another has always been HTTP- we used to just not pass it (but requests-kerberos still defaulted to HTTP in that case), and pywinrm 0.3.0 started passing HTTP explicitly to requests-kerberos if not told otherwise... So I'm not seeing any obvious behavior change looking at the code.

jborean93 commented 4 years ago

I have running now ansible 2.1.1.1 in production.

Going from 2.1.* to 2.8 is a massive jump, 2.1.1.0 was released in July 2016 which is over 4 years ago. There will be without doubt a few changes made to each of these code bases to introduce new features over that time period. Each one of these could have changed something, maybe the default service name was modified but it will be hard to tell.

In ansible 2.8.12 kinit is done correctly and spn format is: HTTP@yourserver.your.realm.com and this format dont works with WSMAN endpoint in windows

Kinit has nothing to do with the target SPN. It acquires the Ticket Granting Ticket (TGT) for the username you have specified and the target SPN is not used at all in this part. It's also done before requests-kerberos and is achieved in Ansible by just calling kinit <username>.

When ansible set the endpoint and try to send ticket to https WSMAN it fails with remote kerberos error: Server not found in Kerberos database

This is done after the kinit part by pykerberos (called by requests-kerberos). It uses the TGT to request a service ticket from the Ticket Granting Service (TGS) by sending the SPN to the domain controller. The DC uses that SPN to find the Kerberos principal, in this case the computer object, that has been registered against that SPN. The conversion from the host-based service http@hostname is all done in GSSAPI, which is outside even pykerberos. It has its own logic to convert the host-based format to the Kerberos principal, and by doing it yourself you are short-circuiting this logic.

The trouble with using the target name in the Kerberos principal format is that it requires knowledge we may not necessarily have, and for us to implement it properly we would need to do all sorts of DNS lookups or read through the GSSAPI config, which is not ideal. In the end we do the proper thing, which is to rely on GSSAPI to process the name. If that doesn't work then there's a config/environment issue on the GSSAPI layer. The workaround, as you've seen, is to specify the Kerberos principal format to bypass this logic in GSSAPI. In short, there's nothing I can see on the ansible/pywinrm/requests-kerberos side that would cause this problem.

Ultimately, to track down why this might be happening you will need to use Wireshark to capture the traffic on port 88 between the client and the domain controller and see which SPN is being requested. From there you can see exactly how the GSSAPI framework converted the host-based service to the Kerberos principal and potentially determine what is missing.

nitzmahone commented 4 years ago

If you don't believe us there, you can also take the kinit stuff out of the picture by just not setting ansible_user/ansible_pass, or by explicitly setting ansible_winrm_kinit_mode=manual to get the previous behavior (using whatever existing default TGT is present rather than requesting a private one). If you really want to try to figure it out, it might also work with ancient versions of pywinrm (<0.3.0) and requests-kerberos (<0.9.0), but YMMV.

alv000h commented 4 years ago

@nitzmahone @jborean93 I believe you both, of course! And sorry if my statements sounded like I had any doubts about your diagnosis (I am not a native speaker).

I have full trust in you because this issue tracker and you guys have helped me a lot in the past.

Furthermore, I really appreciate the code review and the GSSAPI explanation above.

It's only that 5 years ago, when I was trying out and switching between Ansible 2.1, 2.2 and 2.3 on a daily basis, Kerberos authentication worked like a charm without configuring anything "strange".

Some investigation in the code and on Stack Overflow (including ansible_winrm_kinit_mode=manual) didn't clarify the problem, and that is why I posted on this previously opened issue.

I think I will set up a new VM tomorrow and try a clean Ansible installation to rule out problems in my current system.

nitzmahone commented 4 years ago

@alv000h no worries, I'm sure your English is infinitely better than my (anything other than English) ;) There are definitely a ton of moving parts involved here, but one way or another, you should be able to get something working with minimal effort- this code has been pretty stable for several years and used by a lot of people.

jborean93 commented 3 years ago

Closing as per the above.