Open jo-basevi opened 6 days ago
I don't know the specifics of the payu run
PBS job, but I think the solution would be simply to run:
git config --local gpg.ssh.program /g/data/vk83/apps/payu/1.1.3/bin/ssh-keygen
within the PBS job, from inside the repo where payu
commits to.
The use of --local
is better to prevent overriding the git global settings for the user (but if payu
needs to commit to multiple repositories within the same PBS job, then I would use --global
instead of running the line above in all repo folders).
This will make sure that, if there is any signing commit required, the ssh-keygen
executable used will always be one valid for signing commits (and also seen by the PBS job since I believe gdata/vk83
would be added to the storage of the payu
PBS job).
Actually, an even better way to prevent overriding user's git options, would be to add the git gpg.ssh.program /g/data/vk83/apps/payu/1.1.3/bin/ssh-keygen
config option only in the commit command using the -c
option (check this answer for details).
This means adding the commit through:
git -c "gpg.ssh.program=/g/data/vk83/apps/payu/1.1.3/bin/ssh-keygen" commit ...
Yeah that is really neat. An issue is that it's hardcoding a deployed payu environment path into the payu code. So could run into issue if /g/data/vk83/apps/payu/1.1.3/bin/ssh-keygen
no longer exists. Alternatively could pass in an environment variable with ssh-keygen
path to the payu job submission, and then use that path for the gpg.ssh.program
. That's still assuming that an environment with an updated openssh
is loaded, which is the case for hh5
and vk83
but could be different for other installs of payu.
It still has an issue that the signing key needs to be passphrase-less and have a public/private key pair on NCI (unless there's a way to forward ssh keys to compute jobs?). So the signing key really should be separate from the github authentication key.
I don't think hard-coding paths to tools like this in payu
is a good idea. As @jo-basevi pointed out, it makes it fragile and very specific.
At this stage I think very clear and explicit instructions for how to implement this is sufficient. If the instructions are clear that the ssh-keygen
executable should be in the same conda environment as payu
or the same project, or in a users home directory, then it should be accessible in a PBS job.
Yeah it makes sense not to hardcode any specific path into payu
.
If the instructions are clear that the
ssh-keygen
executable should be in the same conda environment aspayu
or the same project, or in a users home directory, then it should be accessible in a PBS job.
I agree, but it is still prone to errors in case people forget or simply don't know.
What happens from a payu
workflow standpoint if you get a commit signing error (because of ssh-keygen)? Does the workflow stop? Can that affect the resubmission/completion of running jobs?
If that is the case, another solution might be to catch any potential signed-commit error and, instead of failing, throw a warning and issue the commit without signing it.
When there's git commit error, it doesn't stop the workflow as any exceptions raised by the commit subprocess call are caught. There will just be some error messages in the logs. If it does error out, we could attempt to an unsigned commit e.g.
git -c "commit.gpgsign=false" commit ...
The warning could also suggest if they want the runlog commits signed to run git commit --amend -S
on the login node.
As brought up in this forum post - https://forum.access-hive.org.au/t/payu-runlog-and-git-commit-signing/2178, there can be errors when global sign commits is enabled on NCI, as
payu run
PBS job uses system git with an older openssh version than the library installed inhh5
andvk83
payu environments.This error can be worked around by using passphrase-less signing keys with a public/private key pair on NCI, and configuring git config to use an
ssh-keygen
executable with version >= 8.2p1. For example, configuringgit config --global gpg.ssh.program /g/data/hh5/public/apps/miniconda3/envs/analysis3/bin/ssh-keygen
when using thehh5
's conda environment, orgit config --global gpg.ssh.program /g/data/vk83/apps/payu/1.1.3/bin/ssh-keygen
when usingvk83
's payu environments.To support commit signing, so it can work with signing keys that have been forwarded from the host machine, and keys that have a passphrase, it would be ideal if payu could commit runlog on a login node. This would require experiment setup to run before the pbs job submission to generate the manifests. Currently, the commits are added only after the model run command has been executed- this is good for ensuring the model run command actually ran and there weren't any errors prior to it.
Other cons to running experiment setup on the login nodes, is that in the time elapsed between running setup and run job running, something in the configuration or work directory could change. Also, running experiment on the pbs job sets up checks that the compute job is able to access the various projects needed to run.. There's probably other reasons to keeping the experiment setup as part of the run job on the compute nodes that I am not aware of.
Another option could if be if signing a commit errors out on a pbs job, then revert to just committing an unsigned commit?