hautreux / auks

Kerberos credential support for batch environments

Auksd & SLURM el7 #39

Closed jp-hudson closed 2 years ago

jp-hudson commented 4 years ago

Hey there,

Hope this is the right place for this; if not, please tell me.

I am trying to bring up auks with our newly installed Slurm setup, but I am having a few problems getting the initial services started. I was hoping you could assist or point me in the right direction.

Setting it up on: mgmt node, login node, compute node

Installing first on mgmt node:

Installed auks via RPMs:

auks-0.4.0-1.x86_64.rpm
auks-debuginfo-0.4.0-1.x86_64.rpm
auks-devel-0.4.0-1.x86_64.rpm
auks-slurm-0.4.0-1.x86_64.rpm

And enabled the auks plugin by adding this to plugstack.conf:

optional /usr/lib64/slurm/auks.so default=enabled spankstackcred=yes minimum_uid=1024

In the auks.conf file I have configured PrimaryHost and PrimaryPrincipal. No secondary server is configured.

In the auks.acl file (I am a bit confused here) I have the admin line set up, and it currently points to myself. I know that this is not correct; should it be the slurm user? I am also not sure what to set for the guest and user roles, or whether they need to be defined at all.

When I try to start the auksd service, it hangs in the activating state and eventually fails. The auks.log shows a failure at the krb5_recvauth step:

Wed May 20 10:32:08 2020 [INFO4] [euid=0,pid=31256] auks_krb5_stream: connection authentication context initialisation succeed
Wed May 20 10:32:08 2020 [INFO4] [euid=0,pid=31256] auks_krb5_stream: authentication context addrs set up succeed
Wed May 20 10:32:08 2020 [INFO4] [euid=0,pid=31256] auks_krb5_stream: default kstream initialisation succeed
Wed May 20 10:32:08 2020 [INFO4] [euid=0,pid=31256] auks_krb5_stream: kstream basic initialisation succeed
Wed May 20 10:32:08 2020 [INFO4] [euid=0,pid=31256] auks_krb5_stream: keytab initialisation succeed
Wed May 20 10:32:08 2020 [INFO4] [euid=0,pid=31256] auks_krb5_stream: server kstream initialisation succeed
Wed May 20 10:32:08 2020 [INFO3] [euid=0,pid=31256] worker[6] : krb5 stream successfully initialized for socket 4
Wed May 20 10:32:08 2020 [INFO4] [euid=0,pid=31256] auks_krb5_stream: authentication failed : Software caused connection abort
Wed May 20 10:32:08 2020 [INFO2] [euid=0,pid=31256] worker[6] : authentication failed on socket 4 (10.232.128.65) : krb5 stream : recvauth stage failed (server side)
Wed May 20 10:32:08 2020 [INFO3] [euid=0,pid=31256] worker[6] : incoming socket 4 processing failed
Wed May 20 10:32:11 2020 [INFO3] [euid=0,pid=31256] dispatcher: incoming connection (3) successfully added to pending queue
Wed May 20 10:32:11 2020 [INFO3] [euid=0,pid=31256] worker[8] : incoming socket 3 successfully dequeued
Wed May 20 10:32:11 2020 [INFO4] [euid=0,pid=31256] auks_krb5_stream: local endpoint stream 3 informations request succeed
Wed May 20 10:32:11 2020 [INFO4] [euid=0,pid=31256] auks_krb5_stream: remote endpoint stream 3 informations request succeed
Wed May 20 10:32:11 2020 [INFO4] [euid=0,pid=31256] auks_krb5_stream: remote host is 10.232.128.65
Wed May 20 10:32:11 2020 [INFO4] [euid=0,pid=31256] auks_krb5_stream: context initialization succeed
Wed May 20 10:32:11 2020 [INFO4] [euid=0,pid=31256] auks_krb5_stream: connection authentication context initialisation succeed
Wed May 20 10:32:11 2020 [INFO4] [euid=0,pid=31256] auks_krb5_stream: authentication context addrs set up succeed
Wed May 20 10:32:11 2020 [INFO4] [euid=0,pid=31256] auks_krb5_stream: default kstream initialisation succeed
Wed May 20 10:32:11 2020 [INFO4] [euid=0,pid=31256] auks_krb5_stream: kstream basic initialisation succeed
Wed May 20 10:32:11 2020 [INFO4] [euid=0,pid=31256] auks_krb5_stream: keytab initialisation succeed
Wed May 20 10:32:11 2020 [INFO4] [euid=0,pid=31256] auks_krb5_stream: server kstream initialisation succeed
Wed May 20 10:32:11 2020 [INFO3] [euid=0,pid=31256] worker[8] : krb5 stream successfully initialized for socket 3
Wed May 20 10:32:11 2020 [INFO4] [euid=0,pid=31256] auks_krb5_stream: authentication failed : Software caused connection abort
Wed May 20 10:32:11 2020 [INFO2] [euid=0,pid=31256] worker[8] : authentication failed on socket 3 (10.232.128.65) : krb5 stream : recvauth stage failed (server side)
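
To see at a glance which clients are being rejected, the "authentication failed on socket" lines in a log like the one above can be tallied per remote address. This is only a minimal sketch: the function name `failed_auth_by_host` and the log path in the usage comment are hypothetical, and it assumes the log line format shown above.

```shell
#!/bin/sh
# Sketch: count failed auksd authentications per remote address.
# Reads an auks log on stdin; assumes lines of the form
#   "... authentication failed on socket N (ADDRESS) : ..."
failed_auth_by_host() {
    # Keep only the address inside the parentheses, then count duplicates.
    sed -n 's/.*authentication failed on socket [0-9]* (\([^)]*\)).*/\1/p' \
        | sort | uniq -c
}

# Usage (hypothetical log location, adjust to your auks.conf):
#   failed_auth_by_host < /path/to/auks.log
```

In the excerpt above every failure comes from the same address, 10.232.128.65, so that is the client whose Kerberos setup is worth checking first.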

aukspriv does not seem happy either: it complains that it is unable to get the ccache for the host using the keytab file. I believe the keytab file is good, but my Kerberos knowledge is limited.

unable to get ccache for host ____ using ktfile /etc/krb5.keytab : kinit: Client not found in Kerberos database while getting initial credentials.

Any suggestions or pointers on where to look to resolve this would be very helpful.

Best,

John Hudson

DriesVerachtert commented 4 years ago

John, I'm definitely no expert, but maybe first check whether Kerberos itself is working correctly? Does klist -kt indicate that you have a valid keytab? Have a look at this presentation: https://slurm.schedmd.com/slurm_ug_2012/auks-tutorial.pdf It contains a number of useful commands to validate the various steps of the configuration. Kind regards, Dries
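
A keytab check along those lines could look like the sketch below. It is only an illustration: the helper names `has_principal` and `check_keytab`, the realm `EXAMPLE.COM`, and the `host/<fqdn>` principal are assumptions, not taken from this setup. `klist -kt` and `kinit -k -t` are the standard MIT Kerberos client commands. "Client not found in Kerberos database" from kinit generally means the KDC has no entry for the principal being requested, so it is worth comparing the principal names stored in the keytab with what the KDC actually knows.

```shell
#!/bin/sh
# Sketch only: helper names, realm and principal below are assumptions.

# Succeeds if the principal given as $1 is listed in `klist -kt` output
# read from stdin (the principal is the last field of each entry line).
has_principal() {
    awk -v p="$1" '$NF == p { found = 1 } END { exit !found }'
}

# Checks a real keytab; needs the MIT Kerberos client tools and read
# access to the keytab (usually root).
check_keytab() {
    keytab="$1"
    principal="$2"
    klist -kt "$keytab" | has_principal "$principal" || {
        echo "principal $principal is not in $keytab" >&2
        return 1
    }
    # Ask the KDC for a ticket using the keytab entry. Note that this
    # writes to the caller's default credential cache.
    kinit -k -t "$keytab" "$principal"
}

# Usage (hypothetical realm):
#   check_keytab /etc/krb5.keytab "host/$(hostname -f)@EXAMPLE.COM"
```

If the first step fails, the keytab does not contain the expected principal; if only the kinit step fails, the keytab and the KDC disagree about that principal.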

hautreux commented 4 years ago

I agree. Please look at the slides; they should help you understand how to make things work and how to do the intermediate checks. Without the content of your configuration files and more of your Kerberos-related configuration, it will be difficult to help with that.

bcchrisupp commented 4 years ago

Hey bloodbuzz,

Just wondering if you were ever able to figure this out. I'm running into something similar with my setup on CentOS 8: https://github.com/hautreux/auks/issues/45

hautreux commented 2 years ago

Closing this; reopen if necessary. It seemed to be related to a krb5 configuration/setup issue.