chore(accept_server): optionally print stack trace on signal

kostasrim commented 11 months ago

Allow to optionally print stack traces on received signals

codecov-commenter commented 11 months ago

Codecov Report

Attention: 6 lines in your changes are missing coverage. Please review.

Comparison is base (e0987ee) 77.34% compared to head (6e66704) 77.26%.

:exclamation: Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files

```diff @@ Coverage Diff @@ ## master #156 +/- ## ========================================== - Coverage 77.34% 77.26% -0.08% ========================================== Files 98 98 Lines 7173 7192 +19 ========================================== + Hits 5548 5557 +9 - Misses 1625 1635 +10 ``` | [Files](https://app.codecov.io/gh/romange/helio/pull/156?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Roman+Gershman) | Coverage Δ | | |---|---|---| | [util/fibers/accept\_server.cc](https://app.codecov.io/gh/romange/helio/pull/156?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Roman+Gershman#diff-dXRpbC9maWJlcnMvYWNjZXB0X3NlcnZlci5jYw==) | `78.40% <40.00%> (-3.81%)` | :arrow_down: | ... and [6 files with indirect coverage changes](https://app.codecov.io/gh/romange/helio/pull/156/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Roman+Gershman)

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

romange commented 11 months ago

There are may be several non-intrusive options.

Why not send "DEBUG STACKTRACE" before shutting down the process?
There is also ListenerInterface::PreShutdown hook which may help with what you need.

kostasrim commented 11 months ago

Isn't it the case that we want to print the stacktraces only when instance.stop() fails in pytests? There we use SIGKILL which brutally kills the process, so I don't think any of the two above get a chance to run? That's why I wanted to use a signal, such that before we proc.kill() I send a proc.send_signal(signal.SIGINT) to allow DF print the stacktraces before it gets brutally killed. (SIGKILL can't be handled or blocked)

Is there something I am missing?

romange commented 11 months ago

Ok, if you want to print only after the regular shutdown sequence finishes then yes, signal is needed. But you can register for another signal in dragonfly via a proactor. Does not have to be sigint.

Another thing, accessing server logs is the prerequisite for this pr, because I am not sure the system is stable enough at this stage to actually execute the stacktrace printing logic.

On Fri, Oct 13, 2023, 7:09 PM Kostas Kyrimis @.***> wrote:

Isn't it the case that we want to print the stacktraces only when instance.stop() fails in pytests? There we use SIGKILL which brutally kills the process, so I don't think any of the two above get a chance to run? That's why I wanted to a signal, such that before we proc.kill() I send a proc.send_signal(signal.SIGINT) to allow DF print the stacktraces before it gets brutally killed. (SIGKILL can't be handled or blocked)

Is there something I am missing?

— Reply to this email directly, view it on GitHub https://github.com/romange/helio/pull/156#issuecomment-1761763858, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA4BFCEDDGIRGDQCTSJJQ3DX7FRT7AVCNFSM6AAAAAA57FYA4SVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTONRRG43DGOBVHA . You are receiving this because you commented.Message ID: @.***>

kostasrim commented 11 months ago

But you can register for another signal in dragonfly via a proactor. Does not have to be sigint.

Yep that's true and now I am thinking about this doesn't need to be done in helio.

kostasrim commented 11 months ago

Another thing, accessing server logs is the prerequisite for this pr, because I am not sure the system is stable enough at this stage to actually execute the stacktrace printing logic.

I agree, this might never happen and it's kinda a longshot. But then what's the value of printing stacktraces then? We want to print them when things go bad such that we can use them to figure out what happened (for cases that the python code must brutally kill DF via proc.kill() because it doesn't respond).

Also regular shutdown is impossible, if we reached proc.kill then we don't really have many options. The only room is to issue a signal before we kill and hope that it actually gets processed.

Another thing that would add value here is the core dump...

All in all, given that the system is unstable and that the signal might not actually print the stacktraces, is there really added value by this idea?

What's your opinion?

romange / helio

chore(accept_server): optionally print stack trace on signal #156

Codecov Report