mxschmitt / action-tmate

Debug your GitHub Actions via SSH by using tmate to get access to the runner system itself.
https://mxschmitt.github.io/action-tmate/
MIT License
2.86k stars 285 forks source link

feat: establish CircleCI like tmate debugging session #156

Closed galargh closed 1 year ago

galargh commented 1 year ago

This PR adds new inputs with which the action can be used to imitate CircleCI's SSH debugging sessions. It also adds prettier config to the repo but doesn't apply it yet to make the diff easier to review.

CircleCI like tmate session

If you want to set up a tmate session similarily to how CircleCI does it, you can use the following configuration.

You can insert it at the beggining of your pipeline. It will remain dormant until you rerun all jobs with the debug logging enabled.

Then, it will print out the tmate connection string when a session is established but it will not wait for you to connect. Instead, it will proceed with the run and it will wait for 10 minutes after it is complete. If you do not connect within that time, the session will remain open until you disconnect. The disconnection checks will be performed every 10 minutes.

You will be able to connect to the session only if you use registered public SSH key(s).

name: CI
on: [push]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
    - name: Setup tmate session
      uses: mxschmitt/action-tmate@v3
      if: runner.debug == '1'
      with:
        check-num-clients: true
        limit-access-to-actor: true # requires registered public SSH key(s)
        wait: false
        wait-in-post: true
        wait-interval: '600000' # 10 minutes
    - uses: actions/checkout@v2

Testing

galargh commented 1 year ago

Happy to split refactor/feature into separate PRs but before continuing to work on this contribution, I'd like to learn if the feature is something the maintainers see as valuable to this project.

dscho commented 1 year ago

I'd like to learn if the feature is something the maintainers see as valuable to this project.

I use a similar feature in CirrusCI whenever debugging Git problems on FreeBSD.

So I think with some careful design, this can be a highly valuable feature. One thing I would love to see improved is the user interface: I would like there to be one new input with a highly descriptive name (and yes, I agree, naming is hard).

Another thing I would love to see is a change in design when it comes to the post-action: instead of waiting for a fixed time-out and then exiting, it would be much better to wait for the tmate session to be ended by the user (e.g. via Ctrl+D in the connected shell).

Speaking of the post-action: we need to define better how we intend to handle job states other than success: the job could have failed, and it could have been canceled.

If you really like the CircleCI feature where the terminal times out and force-disconnects, that is a feature that could be implemented independently of the feature to send the session into the background.

galargh commented 1 year ago

Another thing I would love to see is a change in design when it comes to the post-action: instead of waiting for a fixed time-out and then exiting, it would be much better to wait for the tmate session to be ended by the user (e.g. via Ctrl+D in the connected shell).

This assumption is not correct. See: https://github.com/mxschmitt/action-tmate/blob/880f83b4efe525840debc4ee2b65dbab16d5e53a/src/helpers.js#L115-L144 The session doesn't quit if there are any connected clients. This behaviour is in line with how the action works now.

dscho commented 1 year ago

The session doesn't quit if there are any connected clients.

Indeed. And that might be because the run finished so fast that the user did not even have a chance to connect yet.

Don't get me wrong: I can see a potential point in timing out when users don't even bother to connect. But I see an even bigger point in being nice to users who did connect and do not want their session getting terminated while they're in the middle of debugging something.

galargh commented 1 year ago

Don't get me wrong: I can see a potential point in timing out when users don't even bother to connect. But I see an even bigger point in being nice to users who did connect and do not want their session getting terminated while they're in the middle of debugging something.

That's exactly what this option is for - https://github.com/mxschmitt/action-tmate/blob/880f83b4efe525840debc4ee2b65dbab16d5e53a/action.yml#L39-L40

dscho commented 1 year ago

Don't get me wrong: I can see a potential point in timing out when users don't even bother to connect. But I see an even bigger point in being nice to users who did connect and do not want their session getting terminated while they're in the middle of debugging something.

That's exactly what this option is for -

https://github.com/mxschmitt/action-tmate/blob/880f83b4efe525840debc4ee2b65dbab16d5e53a/action.yml#L39-L40

I understand that. But the design of this PR makes the Action complicated to use: too many configuration options, too unintuitive defaults.

Ideally, what would happen is that there was an automated time-out waiting for the user to connect where the Action would stop with failure when that time-out is reached. The time-out would be automatically canceled as soon as the user connects. This would be the default, no need to specify any configuration. The time-out would have a sensible default, again, no need to specify anything.

galargh commented 1 year ago

Ideally, what would happen is that there was an automated time-out waiting for the user to connect where the Action would stop with failure when that time-out is reached. The time-out would be automatically canceled as soon as the user connects. This would be the default, no need to specify any configuration. The time-out would have a sensible default, again, no need to specify anything.

I certainly wouldn't want the tmate action to fail my workflow on connection time-out 😨

Either way, following the discussion, I feel that the disconnect between the original proposal and the suggested changes is pretty substantial. It would be quite an undertaking to reach an "approvable" state which, sadly, I cannot afford at the moment. Hence, I'm going to close the PR for now.

Thank you for your engagement 🙇

dscho commented 1 year ago

Ideally, what would happen is that there was an automated time-out waiting for the user to connect where the Action would stop with failure when that time-out is reached. The time-out would be automatically canceled as soon as the user connects. This would be the default, no need to specify any configuration. The time-out would have a sensible default, again, no need to specify anything.

I certainly wouldn't want the tmate action to fail my workflow on connection time-out 😨

@galargh You would want the action-tmate "post" action to block up to six hours after the workflow completes, waiting for a user who does not connect?

galargh commented 1 year ago

Ideally, what would happen is that there was an automated time-out waiting for the user to connect where the Action would stop with failure when that time-out is reached. The time-out would be automatically canceled as soon as the user connects. This would be the default, no need to specify any configuration. The time-out would have a sensible default, again, no need to specify anything.

I certainly wouldn't want the tmate action to fail my workflow on connection time-out 😨

@galargh You would want the action-tmate "post" action to block up to six hours after the workflow completes, waiting for a user who does not connect?

In most cases, I would want the action to exit gracefully when it reaches the configured timeout without failing my workflow.

dscho commented 1 year ago

Oh right, that would be nicer.

dscho commented 1 year ago

Gotta say, I am finding myself liking this "detached" mode. I implemented it "manually" yesterday, and it came in pretty handy.

dscho commented 1 year ago

FWIW I offer my proposed solution over here: https://github.com/mxschmitt/action-tmate/pull/162