shell-pool / shpool

Think tmux, then aim... lower
Apache License 2.0

shpool process hanging on ssh credentials timeout #155

Open apeyser opened 3 days ago

apeyser commented 3 days ago

With both mosh and ssh + shpool, I've been finding the shpool process hanging. If the shpool process is killed directly, then detach and list work fine, but if I try an shpool list while the process is live, the daemon hangs and a systemctl --user restart shpool is needed, which means everything gets killed.

It appears to be connected with ssh credentials timing out, although refreshing the credentials doesn't seem to solve the problem.

What I find in the logs is:

2024-09-25T08:27:57.783362Z ERROR ThreadId(274) handling new connection: writing version header
Caused by:
    0: serializing data
    1: invalid value write: error while writing multi-byte MessagePack value
    2: error while writing multi-byte MessagePack value
    3: Broken pipe (os error 32)

shpool version 0.7.0

ethanpailes commented 3 days ago

That error message isn't necessarily a sign of something going wrong, since the shpool attach process will probe the control socket to see if someone is listening in order to decide if it needs to autodaemonize. The probe hangs up immediately while the daemon tries to initiate the handshake, which causes this error to show up in the daemon logs, but it doesn't actually indicate a problem.
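To illustrate why a probe like this leaves a "Broken pipe" in the listener's logs, here is a minimal sketch in Python. It is not shpool's code (shpool is written in Rust and serializes its header with MessagePack). The socket path, function names, and payload are made up for the example, and it only assumes a Unix-like system with stream sockets.

```python
import os
import socket
import tempfile


def probe_and_hang_up(sock_path: str) -> None:
    """Client side: connect to check that someone is listening, then close at once."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.connect(sock_path)
    # Leaving the with-block closes the socket without reading anything.


def listener_write_after_probe() -> bool:
    """Return True if the listener's first write to the probe hits a broken pipe."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical socket path, used only for this demonstration.
        sock_path = os.path.join(tmp, "control.sock")
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as listener:
            listener.bind(sock_path)
            listener.listen(1)

            # The connect succeeds against the listen backlog before accept runs.
            probe_and_hang_up(sock_path)

            conn, _ = listener.accept()
            with conn:
                try:
                    # Stand-in for the daemon writing its version header.
                    conn.sendall(b"version-header")
                except BrokenPipeError:
                    return True
    return False
```

The listener accepts a connection whose peer is already gone, so its first write fails even though nothing is wrong on either side.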

Can you post some step-by-step instructions for how to reproduce the issue? I've had ssh credential timeouts without seeing issues with shpool, so I'm not quite sure how to try to reproduce this.

apeyser commented 3 days ago

pkill ssh-agent is enough to trigger it for me (once). It isn't necessary, though, since the usual condition doesn't involve restarting the ssh-agent, but simply allowing the creds to go stale and/or (unclear) letting the ssh control master time out. It seems to be extremely flaky, and reproducing the failure is hard.

ethanpailes commented 2 days ago

shpool doesn't really know anything about ssh, so the problem probably isn't directly related to ssh. There is probably some way to reproduce the issue purely with shpool commands, though it might be hard to find.
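When hunting for a flaky hang like this, it can help to run the suspect command under a timeout so a stuck invocation is detected instead of blocking the terminal. The following is a rough sketch, not part of shpool. The helper name, the timeout values, and the `SLOW_CMD` stand-in are assumptions for illustration. The only shpool command referenced is `shpool list`, taken from the report above.

```python
import subprocess
import sys


def finishes_within(cmd: list[str], timeout_s: float) -> bool:
    """Run cmd and report whether it exited before timeout_s seconds elapsed."""
    try:
        subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is left behind.
        return False
    return True


# Stand-in for a hung command, so the helper can be tried without shpool installed.
SLOW_CMD = [sys.executable, "-c", "import time; time.sleep(5)"]

# Example use against the daemon (assumes shpool is on PATH):
#   if not finishes_within(["shpool", "list"], 5.0):
#       print("shpool list did not return; the daemon may be hung")
```

Calling the helper in a loop between candidate reproduction steps would show which step first makes `shpool list` stop returning.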