There are currently two tests: one running `generate_messages` in a subprocess, another doing the same with `Runner`. In the latter case I get `zmq.error.ZMQError: Invalid argument (addr='localhost:40000')` even though the `connection_parameters` are equal to the former. This causes the exit code to be 1 instead of the expected 0.
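For reference, pyzmq raises that exact error when an endpoint lacks a transport prefix, so one possible (unconfirmed) suspect is that the address is handed to zmq without `tcp://`:

```python
import zmq

ctx = zmq.Context()
sock = ctx.socket(zmq.SUB)
# sock.connect("localhost:40000")      # raises zmq.error.ZMQError: Invalid argument
sock.connect("tcp://localhost:40000")  # zmq endpoints need a transport prefix
```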
Another problem is that I'd expect the process to stay up for 5 seconds, because the message iteration loop waits 5 seconds for new messages before timing out. Instead, the process terminates immediately.
If I test with this script:

```python
#!/usr/bin/env python

import os
import signal
import time


def main():
    """Loop until SIGTERM arrives, then report how long termination took."""
    print(os.getpid())
    keep_looping = True
    tic = None

    def _signal_handler(signum, frame):
        nonlocal keep_looping
        nonlocal tic
        print("signal received")
        keep_looping = False
        tic = time.time()

    signal.signal(signal.SIGTERM, _signal_handler)
    while keep_looping:
        time.sleep(10)
    print("terminated with", signal.SIGTERM)
    print("time after termination", time.time() - tic)


if __name__ == "__main__":
    main()
```
and send `kill` to the printed PID from another terminal, I see that the termination time correlates with the `time.sleep()` duration. I thought these should be pretty equivalent.
Also, using

```python
while keep_looping:
    try:
        _ = in_queue.get(True, 10)
    except queue.Empty:
        pass
```

works as expected and the 10 second timeout is respected.
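For context, a minimal self-contained sketch of that pattern (with a hypothetical `in_queue`, not the actual trollflow2 code), combining the SIGTERM flag with the queue-based wait:

```python
import queue
import signal

keep_looping = True
in_queue = queue.Queue()  # hypothetical stand-in for the listener's output queue

def _signal_handler(signum, frame):
    global keep_looping
    keep_looping = False

signal.signal(signal.SIGTERM, _signal_handler)

while keep_looping:
    try:
        # Wait up to 10 seconds for a new message before re-checking the flag.
        _ = in_queue.get(True, 10)
    except queue.Empty:
        pass
```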
I'm guessing the premature termination happens because `posttroll.listener.ListenerContainer` runs in a separate thread that is killed, so the queue also exits.
I'll see if I can adapt `test_sigterm_runner` so that a proper "workload" is running and it runs until completion.
With https://github.com/pytroll/trollflow2/pull/217/commits/11e5dff668a3ecf6d66fb12c466d4b94f3fe8156 the processing is simulated, showing that we wait for it to finish before exiting.

I'll add a note about the SIGTERM signaling to the documentation.
Attention: Patch coverage is 96.00000% with 2 lines in your changes missing coverage. Please review.

Project coverage is 95.93%. Comparing base (6b6e6b1) to head (4aebf81). Report is 16 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| trollflow2/launcher.py | 77.77% | 2 Missing :warning: |
It is interesting that Codecov complains that `_signal_handler` is not covered; running all the tests locally with `pytest -s` prints out the signaling log message. I did try with `caplog` to catch those messages (the fixture seems to still be there in `test_sigterm_runner()`...), but I guess it didn't work because the logs are emitted in a separate thread/process and the messages are not propagated in my test case. I'll have another go.
Hmm, no, I can't capture the log messages even though I get them to print on the terminal. I also tried wrapping everything from `Process()` up to `proc.join()` in the `trollflow2.logging.logging_on` context manager, but I still only see the messages on the terminal and `caplog.text` stays empty.
And this is what is printed to the terminal when using the `with logging_on()` context:

```
[DEBUG: 2024-10-21 15:48:53 : trollflow2.launcher] Launching trollflow2 with subprocesses
[INFO: 2024-10-21 15:48:53 : trollflow2.launcher] New message received.
[DEBUG: 2024-10-21 15:48:53 : trollflow2.launcher] pytroll://my/topic file lahtinep@lahtinep21lk 2024-10-21T12:48:53.117334 v1.01 application/json {"filename": "foo"}
[INFO: 2024-10-21 15:48:54 : trollflow2.launcher] Caught signal to stop processing new messages and to terminate Trollflow2.
[INFO: 2024-10-21 15:48:59 : trollflow2.launcher] All 0 files produced nominally in 0:00:06.460422
```
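For context, log records created in a child process are only visible to the parent (and thus to `caplog`-style capture) if they are explicitly shipped across, e.g. with a `QueueHandler`/`QueueListener` pair. A minimal generic sketch of that mechanism (not trollflow2's actual implementation):

```python
import logging
import logging.handlers
import multiprocessing as mp

def worker(log_queue):
    # In the child: route records into the shared queue instead of local handlers.
    logger = logging.getLogger("demo")
    logger.addHandler(logging.handlers.QueueHandler(log_queue))
    logger.setLevel(logging.INFO)
    logger.info("hello from the child process")

if __name__ == "__main__":
    log_queue = mp.Queue()
    # In the parent: drain the queue into handlers living in this process,
    # which is the only place test fixtures such as caplog can see them.
    listener = logging.handlers.QueueListener(log_queue, logging.StreamHandler())
    listener.start()
    proc = mp.Process(target=worker, args=(log_queue,))
    proc.start()
    proc.join()
    listener.stop()
```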
Yes, `caplog` and the queue logger sometimes ignore each other. The question is in which thread/process the signal handling runs? That maybe influences where the log goes.
> Question is in which thread/process does the signal handling run? That maybe influences where the log goes?
In `test_sigterm_generate_messages()` the signal handling is in the main thread; there's only the one process.
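A quick way to see this: in CPython, Python-level signal handlers always run in the main thread, no matter where the signal came from. A minimal POSIX-only demo:

```python
import os
import signal
import threading
import time

def handler(signum, frame):
    # CPython executes Python signal handlers in the main thread.
    print("handler ran in:", threading.current_thread().name)  # -> MainThread

signal.signal(signal.SIGTERM, handler)
# Send SIGTERM to ourselves from a helper thread.
threading.Thread(target=os.kill, args=(os.getpid(), signal.SIGTERM)).start()
time.sleep(1)  # give the signal time to arrive
```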
It gets more complicated in `test_sigterm_runner()`. `generate_messages()` is called in `launcher.Runner._get_message_iterator()` and the iterator is passed on to `_run_threaded()` or `_run_subprocess()`. The `queue_logged_process` is given a message from the message-receiving iterator. The message-receiving part should thus be within the main process of the `Runner` that is started in the test case.
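A rough sketch of that flow (heavily simplified, with hypothetical names, not the actual `Runner` code): the iterator yields messages in the main process, and each message is handed to a subprocess whose completion can be awaited on SIGTERM:

```python
import multiprocessing as mp

def process_message(msg):
    print("processing", msg)

def run(message_iterator):
    for msg in message_iterator:
        proc = mp.Process(target=process_message, args=(msg,))
        proc.start()
        # Joining here is what lets a SIGTERM handler merely stop the loop
        # while the current workload still runs to completion.
        proc.join()

if __name__ == "__main__":
    run(iter(["msg1", "msg2"]))
```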
Looks like the messages do not end up in `listener.output_queue`. If I revert https://github.com/pytroll/trollflow2/pull/217/commits/aefd2a33fd131f24065dcbafd123d968680c2406, the `_fake_queue_logged_process` is called again.
> Looks like the messages do not end up in `listener.output_queue`. If I revert aefd2a3, the `_fake_queue_logged_process` is called again.
Ok, then I suppose the patched subscriber doesn't work with the `ListenerContainer`. That is another issue that we need to fix in another PR, so feel free to put the mocking back.
Ok, I reverted from `patched_subscriber_recv` to mocking `ListenerContainer`. I also shortened the output queue wait time from 5 to 1 second to make testing a bit more streamlined, and removed the unused `caplog` from `test_sigterm_runner()`.
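For illustration, mocking a listener container in a test usually looks something like this (patch target and names assumed, not the exact trollflow2 test code):

```python
import queue
from unittest import mock

def test_with_mocked_listener():
    with mock.patch("trollflow2.launcher.ListenerContainer") as lc_class:
        # The code under test reads messages from listener.output_queue,
        # so give the mocked instance a real queue with a canned message.
        lc_class.return_value.output_queue = queue.Queue()
        lc_class.return_value.output_queue.put("fake message")
        # ... run the code under test here ...
```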
This PR adds handling for the SIGTERM signal, so that Trollflow2 can be terminated gracefully in, for example, a container environment.
- Passes `flake8 trollflow2`
- Add your name to `AUTHORS.md` if not there already