payu-org / payu

A workflow management tool for numerical models on the NCI computing systems
Apache License 2.0
18 stars 25 forks source link

Support for signing git commits #449

Open jo-basevi opened 6 days ago

jo-basevi commented 6 days ago

As brought up in this forum post - https://forum.access-hive.org.au/t/payu-runlog-and-git-commit-signing/2178, there can be errors when global sign commits is enabled on NCI, as payu run PBS job uses system git with an older openssh version than the library installed in hh5 and vk83 payu environments.

This error can be worked around by using passphrase-less signing keys with a public/private key pair on NCI, and configuring git config to use an ssh-keygen executable with version >= 8.2p1. For example, configuring git config --global gpg.ssh.program /g/data/hh5/public/apps/miniconda3/envs/analysis3/bin/ssh-keygen when using the hh5's conda environment, or git config --global gpg.ssh.program /g/data/vk83/apps/payu/1.1.3/bin/ssh-keygen when using vk83's payu environments.

To support commit signing, so it can work with signing keys that have been forwarded from the host machine, and keys that have a passphrase, it would be ideal if payu could commit runlog on a login node. This would require experiment setup to run before the pbs job submission to generate the manifests. Currently, the commits are added only after the model run command has been executed- this is good for ensuring the model run command actually ran and there weren't any errors prior to it.

Other cons to running experiment setup on the login nodes, is that in the time elapsed between running setup and run job running, something in the configuration or work directory could change. Also, running experiment on the pbs job sets up checks that the compute job is able to access the various projects needed to run.. There's probably other reasons to keeping the experiment setup as part of the run job on the compute nodes that I am not aware of.

Another option could if be if signing a commit errors out on a pbs job, then revert to just committing an unsigned commit?

atteggiani commented 6 days ago

I don't know the specifics of the payu run PBS job, but I think the solution would be simply to run:

git config --local gpg.ssh.program /g/data/vk83/apps/payu/1.1.3/bin/ssh-keygen

within the PBS job, from inside the repo where payu commits to. The use of --local is better to prevent overriding the git global settings for the user (but if payu needs to commit to multiple repositories within the same PBS job, then I would use --global instead of running the line above in all repo folders). This will make sure that, if there is any signing commit required, the ssh-keygen executable used will always be one valid for signing commits (and also seen by the PBS job since I believe gdata/vk83 would be added to the storage of the payu PBS job).

atteggiani commented 6 days ago

Actually, an even better way to prevent overriding user's git options, would be to add the git gpg.ssh.program /g/data/vk83/apps/payu/1.1.3/bin/ssh-keygen config option only in the commit command using the -c option (check this answer for details).

This means adding the commit through:

git -c "gpg.ssh.program=/g/data/vk83/apps/payu/1.1.3/bin/ssh-keygen" commit ...
jo-basevi commented 5 days ago

Yeah that is really neat. An issue is that it's hardcoding a deployed payu environment path into the payu code. So could run into issue if /g/data/vk83/apps/payu/1.1.3/bin/ssh-keygen no longer exists. Alternatively could pass in an environment variable with ssh-keygen path to the payu job submission, and then use that path for the gpg.ssh.program. That's still assuming that an environment with an updated openssh is loaded, which is the case for hh5 and vk83 but could be different for other installs of payu.

It still has an issue that the signing key needs to be passphrase-less and have a public/private key pair on NCI (unless there's a way to forward ssh keys to compute jobs?). So the signing key really should be separate from the github authentication key.

aidanheerdegen commented 5 days ago

I don't think hard-coding paths to tools like this in payu is a good idea. As @jo-basevi pointed out, it makes it fragile and very specific.

At this stage I think very clear and explicit instructions for how to implement this is sufficient. If the instructions are clear that the ssh-keygen executable should be in the same conda environment as payu or the same project, or in a users home directory, then it should be accessible in a PBS job.

atteggiani commented 5 days ago

Yeah it makes sense not to hardcode any specific path into payu.

If the instructions are clear that the ssh-keygen executable should be in the same conda environment as payu or the same project, or in a users home directory, then it should be accessible in a PBS job.

I agree, but it is still prone to errors in case people forget or simply don't know.

What happens from a payu workflow standpoint if you get a commit signing error (because of ssh-keygen)? Does the workflow stop? Can that affect the resubmission/completion of running jobs?

If that is the case, another solution might be to catch any potential signed-commit error and, instead of failing, throw a warning and issue the commit without signing it.

jo-basevi commented 5 days ago

When there's git commit error, it doesn't stop the workflow as any exceptions raised by the commit subprocess call are caught. There will just be some error messages in the logs. If it does error out, we could attempt to an unsigned commit e.g. git -c "commit.gpgsign=false" commit ... The warning could also suggest if they want the runlog commits signed to run git commit --amend -S on the login node.