trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

HDFS Kerberos auth not using expected principal #3163

Open jasper-eb opened 4 years ago

jasper-eb commented 4 years ago

I'm currently working on migrating a Presto setup from 0.205 to 330 and am noticing some unexpected behavior when connecting to and pulling data from HDFS. I'm not entirely sure if it's a bug or intended, but it seems more like a bug to me. I'll briefly describe the environment below:

In Presto 330 the application doesn't use the full principal (principal/host@realm) to authenticate; instead it uses the sAMAccountName defined in AD. While I can work around this change in behavior, I'd like to better understand why it's happening. I don't see any obvious changes in the Presto project that would cause this, and it looks like a bug to me because the settings configured in the Hive connector are not respected. Here's a quick scenario of the behavior:

The principal created in the AD would look as follows:

UPN logon/principal name: prestoworker/presto-0.example.com@example.com
User sAMAccountName: example\prestoworker0

The Hive connector has the following properties for HDFS:

hive.hdfs.authentication.type=KERBEROS
hive.hdfs.presto.principal=prestoworker/presto-0.example.com@EXAMPLE.COM
hive.hdfs.presto.keytab=/etc/prestoworker.keytab

Based on this setup I would expect the workers to authenticate as prestoworker/presto-0.example.com@EXAMPLE.COM when connecting to HDFS; instead, according to SecurityAuth.audit on the namenode, the workers authenticate as prestoworker0@EXAMPLE.COM. In 0.205 they authenticate as prestoworker/presto-0.example.com@EXAMPLE.COM, which matches the configuration.
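For reference, here's a minimal standalone sketch of the kind of keytab login I'd expect to happen (not Presto code; it assumes hadoop-common on the classpath and reuses the principal and keytab from the connector config above). It prints the principal the JVM actually ends up with:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

public class KeytabLoginCheck {
  public static void main(String[] args) throws Exception {
    // Hadoop ignores keytab logins unless Kerberos authentication is enabled
    Configuration conf = new Configuration();
    conf.set("hadoop.security.authentication", "kerberos");
    UserGroupInformation.setConfiguration(conf);

    // Same principal and keytab as in the Hive connector properties above
    UserGroupInformation ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(
        "prestoworker/presto-0.example.com@EXAMPLE.COM", "/etc/prestoworker.keytab");

    // Full principal Hadoop logged in with, plus the short name after auth_to_local rules
    System.out.println("principal:  " + ugi.getUserName());
    System.out.println("short name: " + ugi.getShortUserName());
  }
}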

Am I overlooking something, or is this behavior indeed unexpected?

electrum commented 4 years ago

Do you have multiple krb5.conf files on your machines? Are you configuring one for the main Presto server? It appears that the Hive connector relies on the value of the java.security.krb5.conf system property, and setting http.authentication.krb5.config for the main server will set that system property. I don't think this behavior has changed, but I'd like to rule out that you are using the wrong krb5.conf file.
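To rule that out on the workers, a trivial check like this (a hypothetical helper, not part of Presto), run with the same JVM options as a worker, would show whether the property is set at all; if it's unset, the JDK falls back to the OS default location (typically /etc/krb5.conf on Linux):

public class Krb5ConfCheck {
  public static void main(String[] args) {
    // Prints the krb5.conf path the JVM was told to use, if any
    System.out.println("java.security.krb5.conf = "
        + System.getProperty("java.security.krb5.conf", "<unset, JDK default lookup>"));
  }
}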

electrum commented 4 years ago

I have also not been able to find any related code changes or explanation for this behavior.

jasper-eb commented 4 years ago

Just one krb5.conf, written to /etc/krb5.conf, with two realms in it: one for the AD and one for the joined realm. For the coordinator we set http.authentication.krb5.config=/etc/krb5.conf in our config.properties, but not for the workers. We don't set java.security.krb5.conf on the workers either, but from what I can tell the JVM should still be able to locate the file. The configuration hasn't changed in the upgrade, but I'll share a masked version of the krb5.conf below so it can be ruled out.

[libdefaults]
  renew_lifetime = 7d
  forwardable = true
  default_realm = NATIVE.REALM
  ticket_lifetime = 24h
  dns_lookup_realm = false
  dns_lookup_kdc = false
  default_ccache_name = /tmp/krb5cc_%{uid}

[domain_realm]
  native.realm = NATIVE.REALM
  .native.realm = NATIVE.REALM
  .joined.realm = JOINED.REALM
  joined.realm = JOINED.REALM

[logging]
  default = FILE:/var/log/krb5kdc.log
  admin_server = FILE:/var/log/kadmind.log
  kdc = FILE:/var/log/krb5kdc.log

[realms]
  NATIVE.REALM = {
    admin_server = hostname
    kdc = hostname-0
    kdc = hostname-1
    kdc = hostname-2
  }

  JOINED.REALM = {
    kdc = hostname:88
    admin_server = hostname:749
    default_domain = joined.realm
  }
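
If it would help rule out the file lookup, I could also try pointing the worker JVMs at the file explicitly via etc/jvm.config (assuming the standard launcher picks JVM flags up from there):

-Djava.security.krb5.conf=/etc/krb5.conf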

Let me know if there's any other info you need.

jasper-eb commented 4 years ago

One thing I forgot to mention is that the machines in the 330 environment run Ubuntu 18.04, versus 16.04 for the 0.205 environment. That brings some upgrades to the krb5 packages, though I don't see how that would change the principal the application uses.

findepi commented 4 years ago

@jasper-eb Is the Java version the same on both clusters?

Can you experiment with running the 0.205 version on your newer cluster (i.e. Ubuntu 18.04) to rule out the JDK/OS/krb5.conf being the cause?

jasper-eb commented 4 years ago

It is not; the new cluster runs on Java 11 (OpenJDK distribution), the old one on Java 8 (Oracle distribution).

Sadly that's not possible. I tried moving that setup to OpenJDK before, but back then there was code limiting the Java distribution to certain vendors; if it didn't detect one of the valid vendors, it wouldn't run.

I could flip it around, try to get 330 working on the old setup, and see what principal it authenticates as, if that's helpful.