hautreux / auks

Kerberos credential support for batch environments

Auks, Slurm and OpenAFS #15

Open dietrichliko opened 8 years ago

dietrichliko commented 8 years ago

Hi!

Thanks for your effort. This is less an issue than a question. I am setting up SLURM and AUKS on an OpenAFS system at a small site in Vienna. Auks evidently keeps the Kerberos token alive, but with AFS one also has to convert the Kerberos token into an AFS token with aklog on a regular basis.

I succeeded in setting up the initial AFS token using the pam_afs_session module, which also solves this problem for logins over SSH. But now the AFS token has to be renewed on a regular basis, which requires detaching another process.

I see that you have a solution for that problem in the auks spank plugin, so I could write my own plugin to do the same. But I have the impression that I am trying to reinvent the wheel. Is there a standard solution for this issue with AFS?

Cheers, Dietrich

hautreux commented 8 years ago

A generic way to deal with credential renewal is pam_afs_session, as you suggested. pam_afs_session implements the pam_setcred logic and can thus provide automatic renewal when combined with a calling program that supports it (like sshd with GSSAPIStoreCredentialsOnRekey=yes, when available).

Concerning auks/slurm, the auks spank plugin forks a helper process ("auks -R loop") that is responsible for renewing the credential during job execution. It only deals with Kerberos credential renewal and has no support for PAM or anything similar. That could be added, but it has not been done so far, as I have never had to deal with AFS (I am using NFSv4.x + Kerberos).

You have multiple options if you want that:

The second approach could be done easily. I can give you some hints if you want to go this way and propose a patch.

dietrichliko commented 8 years ago

Hi!

I guess I will follow the second approach, as you suggest.

Using the pam_afs_session approach might be conceptually better, but then it is a one-time hack for myself. Also, the longer-term future of AFS at my site is not clear ...

Thanks for any further suggestion.

Cheers, Dietrich

dietrichliko commented 8 years ago

Hi!

I did not manage to set the OpenAFS token from the context of the auks daemon, which renews the Kerberos token. I have to call the "aklog" command from the SlurmStepCtld. The AFS token also has to be available to this daemon, as it handles the output file.

This evidently requires patching Slurm, but it is very straightforward. Basically, I create an additional thread that calls aklog every 20 minutes ...

The logic to start and stop the thread is very close to Slurm's handling of PAM. It would be conceptually better to use PAM and pam_afs_session to renew the token, but due to my lack of PAM knowledge I did not manage to renew the tokens with pam_setcred and the renew option ...
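Editor's illustration: the actual patch is C code inside Slurm and is not reproduced in this thread, so the following is only a minimal Python sketch of the general pattern described above (a background thread that runs a renewal command at a fixed interval and can be started and stopped cleanly). The class name `TokenRenewer` and its parameters are hypothetical; the only details taken from the thread are the `aklog` command and the 20-minute interval.

```python
import subprocess
import threading


class TokenRenewer:
    """Run a renewal command at a fixed interval in a background thread.

    Hypothetical helper for illustration only; it is not part of Slurm,
    auks, or OpenAFS.
    """

    def __init__(self, command=("aklog",), interval_seconds=20 * 60):
        self.command = list(command)
        self.interval_seconds = interval_seconds
        self.successes = 0
        self.failures = 0
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _renew_once(self):
        # A missing binary or a non-zero exit status both count as a failure.
        try:
            result = subprocess.run(self.command, check=False)
            ok = result.returncode == 0
        except OSError:
            ok = False
        if ok:
            self.successes += 1
        else:
            self.failures += 1

    def _run(self):
        # Renew immediately, then again after every interval until stopped.
        while True:
            self._renew_once()
            # wait() returns True as soon as stop() sets the event.
            if self._stop.wait(self.interval_seconds):
                break

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()


# Typical use around a long-running piece of work:
#     renewer = TokenRenewer()
#     renewer.start()
#     try:
#         ...  # run the job
#     finally:
#         renewer.stop()
```

Waiting on a `threading.Event` rather than sleeping lets `stop()` interrupt the wait immediately, so shutting down does not have to sit out the remainder of a 20-minute interval.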

In any case, I have a satisfactory solution from my point of view. It is more a hack than a proper patch ...

Cheers, Dietrich

hautreux commented 8 years ago

Great. If/when I find time to add a helper task to the auks spank plugin, I will let you know.

Regards

feltstykket commented 6 years ago

Hi Dietrich,

I just finished setting up auks on a cluster. I can get into my home directory if I aklog from sbatch, but I can't submit jobs. I see in your comments that you mention having to put something in SlurmStepCtld but I am not finding reference to that option in slurm.conf. Could you share a bit more of how you got it working?

Thanks!

dietrichliko commented 6 years ago

Hi Richard!

Auks is concerned with the Kerberos token, but for OpenAFS you also need to run aklog at a suitable place. The auks plugin uses an environment variable to transfer information about the token to the batch process, but this does not work for the OpenAFS token. A person with more knowledge of OpenAFS token handling might succeed in doing that.
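Editor's illustration: a minimal Python sketch of the distinction made above. The Kerberos credential cache can be located through an environment variable, whereas the AFS token has to be obtained by actually running aklog in the context that needs it. The thread does not name the variable; `KRB5CCNAME`, the standard Kerberos credential cache variable, is assumed here. The function `obtain_afs_token` and its parameters are hypothetical.

```python
import os
import subprocess


def obtain_afs_token(env=None, aklog_cmd=("aklog",)):
    """Try to turn the Kerberos credentials found via KRB5CCNAME into an AFS token.

    Returns True if the command exited with status 0, False otherwise.
    Hypothetical helper; KRB5CCNAME is an assumption, the thread only
    says "an environment variable".
    """
    env = dict(os.environ if env is None else env)
    if not env.get("KRB5CCNAME"):
        # No credential cache was advertised, so there is nothing to convert.
        return False
    try:
        result = subprocess.run(list(aklog_cmd), env=env, check=False)
    except OSError:
        # aklog is not installed or not executable on this node.
        return False
    return result.returncode == 0
```

A job script could call such a helper once at startup. It does not address renewal, and it does not give a token to the Slurm process that writes the job's output file, which is the gap the patch discussed in this thread fills.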

For me, the solution was to patch slurmctld and call aklog from the process that is the parent of the batch job. This requires patching the Slurm source code, which is quite straightforward. It is a hack, but it works very well for me.

If it helps, I can dig up the patch and send it to you.

Cheers, Dietrich

feltstykket commented 6 years ago

Hi Dietrich,

Thanks! That patch would be much appreciated!

belfhi commented 6 years ago

@dietrichliko @feltstykket did you manage to get AFS working in a SLURM job? I am trying to get this set up but am not being successful at the moment.

dietrichliko commented 6 years ago

Hi Johannes!

Yes. It is not a nice solution, as it requires patching Slurm.

AFS tokens are somehow stored in the kernel and are not accessible to mere mortals, and the OpenAFS source code is very hard to read. The easiest solution is to call aklog in a thread at the level of the main process, which is a Slurm process. Evidently someone could figure out how to do that with a plug-in, but to my knowledge it has not been done ...

Anyway it works for me. To my knowledge they use the same patch at DESY/Zeuthen.

If you need them, I can send you the modifications.

Cheers, Dietrich

feltstykket commented 5 years ago

Hi Dietrich,

I am going to upgrade to Slurm 18.08 tomorrow, could you send me the patch? I would like to try it.

Thanks! Richard

dietrichliko commented 5 years ago

Hi Richard!

Here is my patch ... good luck

Dietrich

afstoken.patch.zip