Support non-ephemeral, self-hosted runners

mkrakowitzer commented 4 years ago

Subsequent builds fail with:

Run webfactory/ssh-agent@v0.2.0 (0s)
##[error]Node run failed with exit code 1
Run webfactory/ssh-agent@v0.2.0
Adding GitHub.com keys to undefined/.ssh/known_hosts
Starting ssh-agent
bind: Address already in use
unix_listener: cannot bind to path: /tmp/ssh-auth.sock
##[error]Command failed: ssh-agent -a /tmp/ssh-auth.sock
bind: Address already in use
unix_listener: cannot bind to path: /tmp/ssh-auth.sock

##[error]Node run failed with exit code 1

This is on local (self-hosted) GitHub runners, which are not ephemeral. Manually removing /tmp/ssh-auth.sock resolves the problem temporarily.
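To tell a stale socket file from an agent that is still listening, the leftover path can be probed before starting a new agent. This is a minimal sketch: it relies on `ssh-add` exiting with status 2 when it cannot contact an agent, and the helper name `agent_socket_state` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: report whether an agent socket path is absent,
# stale (socket file exists but nobody answers), or live.
agent_socket_state() {
  sock="$1"
  # No socket at that path: nothing is left over from a previous run.
  if [ ! -S "$sock" ]; then
    echo absent
    return 0
  fi
  # ssh-add -l exits 0 (keys listed), 1 (no identities), or 2 (no agent reachable).
  SSH_AUTH_SOCK="$sock" ssh-add -l >/dev/null 2>&1
  case "$?" in
    2)   echo stale ;;
    0|1) echo live ;;
    *)   echo unknown ;;
  esac
}
```

For example, `agent_socket_state /tmp/ssh-auth.sock` printing `live` would mean an agent from an earlier run is still bound to the path, so deleting the file alone leaves that process behind.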

mpdude commented 4 years ago

If they are not ephemeral, how can you know what still remains from previous runs? It seems the ssh-agent is still running in the background?

Is that really a feature to have that sort of persistence?

mkrakowitzer commented 4 years ago

Sorry, I am confused; maybe I chose my words poorly. My understanding of ephemeral is short-lived, so "not ephemeral" means long-lived. Local runners are persistent; there is no other way to deploy them.

mpdude commented 4 years ago

Since we don't (currently) have a use case for this ourselves, I cannot promise that we will work on that soon.

But in case anyone wants to take a stab: We'd probably need to perform a "post-run" step like the actions/cache action does here:

https://github.com/actions/cache/blob/e43776276fc1bf0f5f1b462661f341691905b2df/action.yml#L20-L21

In that step, under all circumstances, kill the ssh-agent process. The PID is probably emitted when the ssh-agent is started, so we'd need to parse it from there, possibly export it to the environment and use it during clean-up to terminate the right process.
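The PID parsing described above could look roughly like this in shell. It is only a sketch, not how the action is implemented: it assumes the Bourne-style output of `ssh-agent -s` (a line of the form `SSH_AGENT_PID=<pid>; export SSH_AGENT_PID;`), and `parse_agent_pid` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: read ssh-agent's startup output on stdin and print
# only the PID found on the SSH_AGENT_PID=...; line.
parse_agent_pid() {
  sed -n 's/^SSH_AGENT_PID=\([0-9][0-9]*\);.*/\1/p'
}

# Usage sketch inside a workflow step (assumes the runner provides GITHUB_ENV):
#   pid=$(ssh-agent -s -a "$sock" | parse_agent_pid)
#   echo "SSH_AGENT_PID=$pid" >> "$GITHUB_ENV"   # visible to later steps
# and in the clean-up step:
#   kill "$SSH_AGENT_PID"
```

Exporting the PID rather than searching the process list later means the clean-up terminates exactly the agent this run started.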

Additionally, setting up known_hosts keys should be made idempotent.
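One way to make the known_hosts setup idempotent is to append an entry only when that exact line is not already in the file. The sketch below assumes nothing about how the action writes the file; `add_known_host` is a hypothetical helper, and the key in the usage note is a placeholder, not a real host key.

```shell
#!/bin/sh
# Hypothetical helper: append a known_hosts entry only if the identical
# line is not present yet, so repeated runs do not grow the file.
add_known_host() {
  file="$1"
  entry="$2"
  # Make sure the directory and file exist before searching them.
  mkdir -p "$(dirname "$file")"
  touch "$file"
  # -x: match whole lines, -F: treat the entry as a fixed string, -q: quiet.
  if ! grep -qxF -- "$entry" "$file"; then
    printf '%s\n' "$entry" >> "$file"
  fi
}
```

Calling `add_known_host "$HOME/.ssh/known_hosts" "example.invalid ssh-ed25519 AAAAplaceholder"` twice leaves a single copy of the line in the file.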

kenhuang commented 4 years ago

Yeah, facing the same issue with a self-hosted runner on a Mac.

mkrakowitzer commented 4 years ago

I have worked around it for now with:

- uses: webfactory/ssh-agent@v0.2.0
  with:
    ssh-private-key: ${{ secrets.SSH_PRIVATE_KEY }}
    ssh-auth-sock: /tmp/ssh-auth.sock.${{ github.run_id }}

It's not perfect; hopefully I can make some time to address it properly.

platonicsocrates commented 4 years ago

@mkrakowitzer Yes, this workaround is alright, but will probably end up creating lots of unnecessary ssh-agents.

> It's not perfect; hopefully I can make some time to address it properly.

Did you ever address it properly? I'm happy to collaborate on this.

mkrakowitzer commented 4 years ago

No, I have not addressed it properly, just workarounds, I am afraid. I added the following as a cleanup step:

    - name: 'cleanup'
      if: always()
      run: |
         rm -f /tmp/ssh-auth.sock.${{ github.run_id }}.${{ github.run_number }}

However, the ssh-agent processes are still running, so they also need to be killed as part of the cleanup.

 Main PID: 701145 (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/actions.runner.aem-dispatcher.aem-dispatcher-runner-01-16768d0.service
           ├─785000 ssh-agent -a /tmp/ssh-auth.sock.72521063
           ├─785591 ssh-agent -a /tmp/ssh-auth.sock.72521063
           ├─787517 ssh-agent -a /tmp/ssh-auth.sock.72543562
           ├─788107 ssh-agent -a /tmp/ssh-auth.sock.72553213
           ├─796939 ssh-agent -a /tmp/ssh-auth.sock.72614220
           ├─797491 ssh-agent -a /tmp/ssh-auth.sock.72618315
           ├─798489 ssh-agent -a /tmp/ssh-auth.sock.72630309
           ├─799057 ssh-agent -a /tmp/ssh-auth.sock.72633548
           ├─801617 ssh-agent -a /tmp/ssh-auth.sock.72633548
           ├─803297 ssh-agent -a /tmp/ssh-auth.sock.72663836
           ├─804815 ssh-agent -a /tmp/ssh-auth.sock.72663836
           ├─805527 ssh-agent -a /tmp/ssh-auth.sock.72681060.16
           ├─808625 ssh-agent -a /tmp/ssh-auth.sock.72713404.18
           ├─808958 ssh-agent -a /tmp/ssh-auth.sock.72715979.19
           ├─809336 ssh-agent -a /tmp/ssh-auth.sock.72717144.20
           ├─809571 ssh-agent -a /tmp/ssh-auth.sock.72718146.21
           ├─810305 ssh-agent -a /tmp/ssh-auth.sock.72718146.21
           ├─810529 ssh-agent -a /tmp/ssh-auth.sock.72718146.21
           ├─811279 ssh-agent -a /tmp/ssh-auth.sock.72723208.22
           ├─812454 ssh-agent -a /tmp/ssh-auth.sock.72730887.23
           ├─814024 ssh-agent -a /tmp/ssh-auth.sock.72742938.24
           ├─815957 ssh-agent -a /tmp/ssh-auth.sock.72758744.25
           ├─817366 ssh-agent -a /tmp/ssh-auth.sock.72775715.26
           └─833851 ssh-agent -a /tmp/ssh-auth.sock.72805698.27
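Leftover agents like the ones listed above could be matched by their socket path and then killed. The filter below is a sketch under the assumption that each agent was started exactly as `ssh-agent -a <socket>`, as in the listing; `agent_pids_for_socket` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: read "PID COMMAND ARGS..." lines on stdin and print
# the PIDs of processes started exactly as `ssh-agent -a <sock>`.
agent_pids_for_socket() {
  awk -v sock="$1" '$2 == "ssh-agent" && $3 == "-a" && $4 == sock { print $1 }'
}

# Usage sketch for a cleanup step (RUN_ID stands for the value used in the
# socket name; the empty headers in -o suppress the ps header line):
#   ps -e -o pid= -o args= \
#     | agent_pids_for_socket "/tmp/ssh-auth.sock.$RUN_ID" \
#     | while read -r pid; do kill "$pid"; done
```

Comparing the socket argument for exact equality avoids killing an agent whose socket name merely starts with the same run ID.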

platonicsocrates commented 4 years ago

OK, thanks for sharing. Did you try implementing mpdude's suggestion (exporting to the environment and then cleaning up)?

> In that step, under all circumstances, kill the ssh-agent process. The PID is probably emitted when the ssh-agent is started, so we'd need to parse it from there, possibly export it to the environment and use it during clean-up to terminate the right process.

thommyhh commented 4 years ago

@mpdude I ran into the same problem, made the necessary changes and created a PR. Please take a look.

@mkrakowitzer Until this is merged, you can try to use my fork: https://github.com/webcoast-dk/ssh-agent. You can, and should, use $SSH_AGENT_PID to kill the SSH agent as the last step. See the updated README.md.
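A final step that uses $SSH_AGENT_PID might run something like the following. This is a sketch only: it assumes the variable is present in the step's environment, and `stop_agent` is a hypothetical name, not something the fork provides.

```shell
#!/bin/sh
# Hypothetical last-step helper: terminate the agent recorded in
# SSH_AGENT_PID, and do nothing if the variable is unset or the process
# has already exited.
stop_agent() {
  # kill -0 sends no signal; it only checks that the process exists.
  if [ -n "${SSH_AGENT_PID:-}" ] && kill -0 "$SSH_AGENT_PID" 2>/dev/null; then
    kill "$SSH_AGENT_PID"
  fi
}
```

In a workflow, the step calling `stop_agent` would typically be guarded with `if: always()` so that it also runs when an earlier step fails.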

webfactory / ssh-agent

Support non-ephemeral, self-hosted runners #16