tezos-reward-distributor-organization / tezos-reward-distributor

Tezos Reward Distributor (TRD): A reward distribution software for tezos bakers.
https://tezos-reward-distributor-organization.github.io/tezos-reward-distributor/
GNU General Public License v3.0
87 stars 51 forks source link

TRD exiting with status code 1, even when it succeeds #678

Closed jmo-staked closed 5 months ago

jmo-staked commented 1 year ago

Describe the bug New exit code handling added in this pr appear to make TRD always exit with 1, ie: failure.

Environment

Scenario 1: payments have already been made

023-08-01 16:10:24,456 - MainThread - INFO - [Plugins] No plugins enabled
2023-08-01 16:10:24,457 - MainThread - INFO - Initial cycle set to -1
2023-08-01 16:10:24,459 - MainThread - INFO - Application is READY!
2023-08-01 16:10:24,459 - MainThread - INFO - --------------------------------------------
2023-08-01 16:10:38,594 - producer  - INFO - Checking for pending payments: checking 635 <= 636 - 1 - 0
2023-08-01 16:10:38,594 - producer  - INFO - Payment cycle is 635
2023-08-01 16:10:38,594 - producer  - WARNING - Payment report for cycle 635 is present. No payment will be run for the cycle. Check '/home/tezos/pymnt/reports/.../payments/done/635.csv'
2023-08-01 16:10:38,595 - producer  - INFO - Run mode ONETIME satisfied. Terminating...
2023-08-01 16:10:38,595 - producer  - INFO - Sending KeyboardInterrupt signal.
2023-08-01 16:10:38,595 - consumer0 - WARNING - Exit signal received. Terminating...
2023-08-01 16:10:38,595 - producer  - INFO - Error at payment producer. Please consult the verbose logs! Exit code: 0
2023-08-01 16:10:44,476 - MainThread - INFO - Interrupted.
2023-08-01 16:10:44,477 - MainThread - INFO - TRD is shutting down...
2023-08-01 16:10:44,477 - MainThread - INFO - --------------------------------------------------------
2023-08-01 16:10:44,477 - MainThread - INFO - Sensitive operations are in progress!
2023-08-01 16:10:44,477 - MainThread - INFO - Please wait while the application is being shut down!
2023-08-01 16:10:44,477 - MainThread - INFO - --------------------------------------------------------
2023-08-01 16:10:44,478 - MainThread - INFO - Lock file removed!
2023-08-01 16:10:44,478 - MainThread - INFO - Shutdown due to error! Exit code: 1

notice the producer thread has the exit code 0 (success), and no errors are logged in either the primary or verbose logs. But it appears this bit of code always sends a KeyboardInterrupt signal, and that signal always causes the app to exit with an error even when no error is present:

https://github.com/tezos-reward-distributor-organization/tezos-reward-distributor/blob/277cbb2c3f49e9b6d9d7951c97a138793279e613/src/pay/payment_producer.py#L141-L157

https://github.com/tezos-reward-distributor-organization/tezos-reward-distributor/blob/dd3378b40833b6a7bf66e926fea45cb714f88cec/src/util/process_life_cycle.py#L229-L231

Scenario 2: Payments haven't yet been made

2023-08-01 16:01:03,244 - consumer0 - INFO - Payment of batch 1 started
2023-08-01 16:01:06,709 - consumer0 - INFO - minimal required fee is 108645, current used fee is 163900
2023-08-01 16:01:06,971 - producer  - INFO - Reward creation is done for cycle 635, created 3269 rewards.
2023-08-01 16:01:06,971 - producer  - INFO - Run mode ONETIME satisfied. Terminating...
2023-08-01 16:01:06,971 - producer  - INFO - Sending KeyboardInterrupt signal.
2023-08-01 16:01:06,971 - producer  - INFO - Error at payment producer. Please consult the verbose logs! Exit code: 0
2023-08-01 16:01:09,825 - consumer0 - INFO - Operation hash is ...
2023-08-01 16:01:09,826 - consumer0 - INFO - Waiting for operation ... to be included... Please do not interrupt the process! (Timeout is around 150 minutes)
2023-08-01 16:01:10,203 - MainThread - INFO - Interrupted.
2023-08-01 16:01:10,203 - MainThread - INFO - TRD is shutting down...
2023-08-01 16:01:10,203 - MainThread - INFO - --------------------------------------------------------
2023-08-01 16:01:10,203 - MainThread - INFO - Sensitive operations are in progress!
2023-08-01 16:01:10,203 - MainThread - INFO - Please wait while the application is being shut down!
2023-08-01 16:01:10,204 - MainThread - INFO - --------------------------------------------------------
2023-08-01 16:01:10,204 - MainThread - INFO - Lock file removed!
2023-08-01 16:01:10,204 - MainThread - INFO - Shutdown due to error! Exit code: 1
...
2023-08-01 16:05:59,367 - consumer0 - INFO - Payment of batch 11 succeeded in 1 attempt(s)
2023-08-01 16:05:59,369 - consumer0 - INFO - Processing completed for 2538 payment items.
2023-08-01 16:05:59,386 - consumer0 - INFO - Payment report is created at '/home/tezos/pymnt/reports/.../payments/done/635.csv'
2023-08-01 16:06:00,065 - consumer0 - INFO - Calculation report is created at '/home/tezos/pymnt/reports/.../calculations/635.csv'
2023-08-01 16:06:00,065 - consumer0 - INFO - Simulated transaction_fees added to calculations file.
2023-08-01 16:06:00,068 - consumer0 - INFO - [Plugins] Not sending notification; no plugins enabled
2023-08-01 16:06:00,068 - consumer0 - INFO - [Plugins] Not sending notification; no plugins enabled
2023-08-01 16:06:00,071 - consumer0 - INFO - Sending anonymous statistics; See docs/statistics.rst for more information.
2023-08-01 16:06:01,438 - consumer0 - WARNING - Exit signal received. Terminating...

Again we see the producer thread exits with a 0 (success) signal, but TRD still says "error". No errors appear in the regular or verbose logs. All payments go out as expected. TRD exits with a 1 (failure), so k8s tries to restart it to run again which is not necessary since it actually succeeded.

2023-08-01 16:01:06,971 - producer - INFO - Error at payment producer. Please consult the verbose logs! Exit code: 0

This log line specifically is super confusing: it's an INFO level log, but says there is an Error, and the exit code is 0 (success)

jmo-staked commented 11 months ago

Haven't found a way around this yet, I'm reluctant to just force the exit code to be 0 since then we'd start missing any real errors...

2023-08-20 16:03:29,577 - producer  - INFO - Sending KeyboardInterrupt signal.
2023-08-20 16:03:29,577 - consumer0 - WARNING - Exit signal received. Terminating...
2023-08-20 16:03:29,577 - producer  - INFO - Error at payment producer. Please consult the verbose logs! Exit code: 0
2023-08-20 16:03:31,357 - MainThread - INFO - Interrupted.
2023-08-20 16:03:31,357 - MainThread - INFO - TRD is shutting down...
2023-08-20 16:03:31,357 - MainThread - INFO - --------------------------------------------------------
2023-08-20 16:03:31,358 - MainThread - INFO - Sensitive operations are in progress!
2023-08-20 16:03:31,358 - MainThread - INFO - Please wait while the application is being shut down!
2023-08-20 16:03:31,358 - MainThread - INFO - --------------------------------------------------------
2023-08-20 16:03:31,358 - MainThread - INFO - Lock file removed!
2023-08-20 16:03:31,359 - MainThread - INFO - Shutdown due to error! Exit code: 1

Would using this option possibly get me past that KeyboardInterrupt signal firing from within the code?

-s --background_service
Marker to indicate that TRD is running in daemon mode. When not given it indicates that TRD is in interactive mode.
nicolasochem commented 11 months ago

hey @jmo-staked thanks for opening the ticket and looking into this! I believe #679 fixes it while still catching actual errors, it would be super helpful if you could try the branch before we merge & release.

In kubernetes you can use the docker tag below to test this PR: https://github.com/tezos-reward-distributor-organization/tezos-reward-distributor/pkgs/container/tezos-reward-distributor/121042222?tag=graceful_exit_onetime