zhaofengli / colmena

A simple, stateless NixOS deployment tool
https://colmena.cli.rs
MIT License
1.23k stars 66 forks

Using colmena with ed25519-sk keys does not work with multiple hosts #216

Open pelme opened 4 months ago

pelme commented 4 months ago

Using ed25519-sk keys makes it possible to authenticate with one host, but all the others fail without giving me the chance to touch the key:

$ colmena exec --on polly,stanley uptime
[INFO ] Using flake: git+file:///XXX
[INFO ] Enumerating nodes...
[INFO ] Selected 2 out of 8 hosts.
        ✅ 5s All done!
  polly ❌ 1s Failed: Child process exited with error code: 255
stanley ✅ 5s Succeeded
[ERROR] Failed to complete job on polly - Last 5 lines of logs:
[ERROR]  created)
[ERROR]    state) Running
[ERROR]   stderr) sign_and_send_pubkey: signing failed for ED25519-SK "XXX/.ssh/yubikey5c": device not found
[ERROR]   stderr) root@polly: Permission denied (publickey,password,keyboard-interactive).
[ERROR]  failure) Child process exited with error code: 255

Running with --parallel 1 works.

Workaround for applying a configuration to multiple hosts with parallelism: enable ControlMaster in the ssh config and establish a connection to each host before running colmena. Running colmena exec --parallel 1 true does the trick. After that, colmena apply can be used with parallelism.
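
A minimal sketch of that workaround as a shell function. It assumes connection sharing is already enabled in ~/.ssh/config. The ControlPath and ControlPersist values in the comment are illustrative, not taken from this report, and warm_then_apply is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Assumed ~/.ssh/config (illustrative values, adjust to taste):
#   Host *
#     ControlMaster auto
#     ControlPath ~/.ssh/control-%C
#     ControlPersist 10m

# warm_then_apply is a hypothetical helper, not part of colmena.
warm_then_apply() {
    local hosts="$1"   # comma-separated node names, e.g. "polly,stanley"

    # Step 1: connect to one host at a time so each key touch is
    # requested in turn; "true" is a no-op command on the remote side.
    colmena exec --parallel 1 --on "$hosts" true || return $?

    # Step 2: deploy with the default parallelism; ssh should reuse the
    # master connections opened in step 1 while they persist.
    colmena apply --on "$hosts"
}
```

With the function loaded in the current shell, warm_then_apply polly,stanley would ask for one touch per host and then deploy to both. If the master connections expire before the apply step starts, the original failure can reappear, so the persistence window needs to outlast the warm-up.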

Using a security key with too many hosts may not be practical, since each host requires a touch, but would it be possible to handle this more gracefully even with parallelism enabled? I am not sure what exactly the ideal solution would be, but it would be nice if the behavior were less surprising. Would it be possible to touch the key sequentially for each host?

pelme commented 2 months ago

I ran this on macOS; I did not check whether the behavior is the same on Linux.

matthew-salerno commented 1 month ago

As an alternative option: one can use a regular ed25519 key that is age-encrypted with age-plugin-yubikey. The trick is to use ssh-agent to hold the decrypted key in memory without ever letting it touch the disk:

rage --decrypt ~/.ssh/id_system_deployer.age -i ./yubikey-ident.pub | ssh-add -

This is obviously less secure than an ed25519-sk key - you also lose portability - but for me it makes for a nice compromise. Here's a full script I made that wraps colmena:

#! /usr/bin/env bash

cleanup () {
    ssh-add -d ~/.ssh/id_system_deployer.pub
    # for my own sanity, so I know the key was removed from ssh-agent
    if [ $? -ne 0 ] ; then
        >&2 echo "Error cleaning up key!"
    else
        echo "Cleaned up key"
    fi
}

interrupted () {
    # unload the key when interrupted
    cleanup
    trap - SIGINT SIGHUP SIGABRT SIGTERM SIGQUIT
    exit 1
}
# setup interrupts that will unload the key
trap interrupted SIGINT SIGHUP SIGABRT SIGTERM SIGQUIT
# 1 hour timeout just in case an interrupt isn't caught (e.g. SIGKILL)
rage --decrypt ~/.ssh/id_system_deployer.age -i ./yubikey-ident.pub | ssh-add -t 1h - 
SSH_ADD_RET=$?
if [ $SSH_ADD_RET -ne 0 ] ; then
    >&2 echo "Failed to load key!"
    exit $SSH_ADD_RET
else 
    echo "Loaded key"
fi
colmena "$@"
COLMENA_RET=$?
# unload the key when done
cleanup
# reset signal interrupts
trap - SIGINT SIGHUP SIGABRT SIGTERM SIGQUIT
exit $COLMENA_RET

In your programs.ssh.extraConfig, point IdentityFile at the public key so that ssh asks ssh-agent for the matching private key:

Match user system_deployer
  IdentityFile ~/.ssh/id_system_deployer.pub
Match all

Make sure services.ssh-agent.enable = true; is set and that no other program is acting as an ssh-agent (check with echo $SSH_AUTH_SOCK).
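
A small sketch of that check as a shell function. check_agent is a hypothetical helper name. It relies on the exit status of ssh-add -l, which is 0 when keys are listed, 1 when the agent holds no identities, and 2 when the agent cannot be contacted:

```shell
#!/usr/bin/env bash
# check_agent is a hypothetical helper: it reports which agent socket is
# in use and whether that agent can be reached.
check_agent() {
    if [ -z "${SSH_AUTH_SOCK:-}" ]; then
        echo "SSH_AUTH_SOCK is not set; no agent is configured" >&2
        return 1
    fi
    echo "agent socket: $SSH_AUTH_SOCK"

    # Capture the exit status of ssh-add -l without aborting under "set -e".
    local rc=0
    ssh-add -l >/dev/null 2>&1 || rc=$?
    case "$rc" in
        0) echo "agent reachable, keys loaded" ;;
        1) echo "agent reachable, but no keys loaded" ;;
        *) echo "cannot contact the agent behind this socket" >&2
           return 1 ;;
    esac
}
```

Running check_agent before the wrapper script should report "no keys loaded". If the printed socket path belongs to some other program, that program is answering agent requests in place of the expected ssh-agent.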