Closed nicolasochem closed 5 months ago
Hmm there might still be an error in the logic, we tested this with a payout address that had insufficient funds (so we'd expect it to fail), and it did fail and logged that the exit code should be 6 (INSUFFICIENT_FUNDS = 6
), but the overall exit code was SUCCESS = 0
. Maybe that 6
just isn't bubbling up properly?
2023-08-23 20:11:49,438 - consumer0 - INFO - Total estimated amount to pay out is 2,730,732,539 mutez.
2023-08-23 20:11:49,439 - consumer0 - INFO - 2580 payments will be done in 11 batches
2023-08-23 20:11:49,439 - consumer0 - INFO - Current balance in payout address is 0 mutez.
2023-08-23 20:11:49,440 - consumer0 - ERROR - Payment attempt failed because of insufficient funds in the payout address. The current balance of 0 mutez is insufficient to pay for cycle rewards of 2,730,732,539 mutez.
2023-08-23 20:11:49,440 - consumer0 - INFO - [Plugins] Not sending notification; no plugins enabled
2023-08-23 20:11:49,442 - consumer0 - INFO - Processing completed for 2585 payment items, 2580 failed.
2023-08-23 20:11:49,444 - consumer0 - INFO - Payment report is created at '/home/tezos/pymnt/simulations/.../payments/done/643.csv'
2023-08-23 20:11:49,452 - consumer0 - INFO - Payment report is created at '/home/tezos/pymnt/simulations/.../payments/failed/643.csv'
2023-08-23 20:11:49,611 - consumer0 - INFO - Anonymous statistics disabled, (Dry run)
2023-08-23 20:11:49,611 - consumer0 - INFO - Unknown Error at payment consumer. Please consult the verbose logs! Exit code: 6
2023-08-23 20:11:53,046 - producer - INFO - Reward creation is done for cycle 643, created 3328 rewards.
2023-08-23 20:11:53,047 - producer - INFO - Run mode ONETIME satisfied. Terminating...
2023-08-23 20:11:53,047 - producer - INFO - TRD Exit triggered by producer Exit code: 0
2023-08-23 20:11:53,047 - MainThread - INFO - Application stop handler called by producer: 10
2023-08-23 20:11:53,048 - MainThread - INFO - TRD is shutting down...
2023-08-23 20:11:53,048 - MainThread - INFO - --------------------------------------------------------
2023-08-23 20:11:53,048 - MainThread - INFO - Sensitive operations are in progress!
2023-08-23 20:11:53,048 - MainThread - INFO - Please wait while the application is being shut down!
2023-08-23 20:11:53,048 - MainThread - INFO - --------------------------------------------------------
2023-08-23 20:11:53,048 - MainThread - INFO - Shutdown. Exit code: 0
@jmo-staked good catch! I just pushed a fix, can you please try again?
That's looking better, the consumer
fails with a 6
, the producer
exits with a 0
, and the app itself exits with a 1
. But all that is really critical is the Zero-or-Notzero nature of shell exit codes so this patch looks like it'll work fine for us, thanks again!
2023-08-24 00:48:25,910 - consumer0 - INFO - Anonymous statistics disabled, (Dry run)
2023-08-24 00:48:25,910 - consumer0 - INFO - Unknown Error at payment consumer. Please consult the verbose logs! Exit code: 6
2023-08-24 00:48:29,458 - producer - INFO - Reward creation is done for cycle 643, created 3328 rewards.
2023-08-24 00:48:29,459 - producer - INFO - Run mode ONETIME satisfied. Terminating...
2023-08-24 00:48:29,459 - MainThread - INFO - Application stop handler called: 12
2023-08-24 00:48:29,459 - producer - INFO - TRD Exit triggered by producer Exit code: 0
2023-08-24 00:48:29,459 - MainThread - INFO - TRD is shutting down...
2023-08-24 00:48:29,460 - MainThread - INFO - --------------------------------------------------------
2023-08-24 00:48:29,460 - MainThread - INFO - Sensitive operations are in progress!
2023-08-24 00:48:29,460 - MainThread - INFO - Please wait while the application is being shut down!
2023-08-24 00:48:29,460 - MainThread - INFO - --------------------------------------------------------
2023-08-24 00:48:29,460 - MainThread - INFO - Shutdown due to error! Exit code: 1
$ kubectl describe po ...
State: Terminated
Reason: Error
Exit Code: 1
yes I'm using two signals, SIGUSR1 indicates success and SIGUSR2 is failure... it's fine.
Does it return 0 when payout succeeds?
We won't have a payout to test that against until cycle 644 completes
Two more tests passed swimmingly, both with exit code 0
:
0
with no payout since it had already been completedMight have hit one minor other exit error, in this case the MainThread
exit code was logged incorrectly:
2023-08-29 16:00:53,401 - MainThread - INFO - [Plugins] No plugins enabled
2023-08-29 16:00:53,402 - MainThread - INFO - Initial cycle set to -1
2023-08-29 16:00:53,411 - MainThread - INFO - Application is READY!
2023-08-29 16:00:53,411 - MainThread - INFO - --------------------------------------------
2023-08-29 16:00:54,489 - producer - ERROR - Unable to fetch current cycle from provider tzkt, Not synced. Exiting.
2023-08-29 16:00:54,490 - consumer0 - WARNING - Exit signal received. Terminating...
2023-08-29 16:00:54,490 - producer - INFO - TRD Exit triggered by producer Exit code: 8
2023-08-29 16:00:54,490 - MainThread - INFO - Application stop handler called by producer: 10
2023-08-29 16:00:54,491 - MainThread - INFO - TRD is shutting down...
2023-08-29 16:00:54,492 - MainThread - INFO - --------------------------------------------------------
2023-08-29 16:00:54,492 - MainThread - INFO - Sensitive operations are in progress!
2023-08-29 16:00:54,492 - MainThread - INFO - Please wait while the application is being shut down!
2023-08-29 16:00:54,492 - MainThread - INFO - --------------------------------------------------------
2023-08-29 16:00:54,492 - MainThread - INFO - Lock file removed!
2023-08-29 16:00:54,493 - MainThread - INFO - Shutdown. Exit code: 0
$ kubectl describe po ...
State: Terminated
Reason: Error
Exit Code: 1
MainThread
reports it's exiting with code 0
, even though the app is exiting due to error, but the actual exit signal k8s sees is 1
which is good, so the job correctly restarted.
Might have hit one minor other exit error, in this case the
MainThread
exit code was logged incorrectly:2023-08-29 16:00:53,401 - MainThread - INFO - [Plugins] No plugins enabled 2023-08-29 16:00:53,402 - MainThread - INFO - Initial cycle set to -1 2023-08-29 16:00:53,411 - MainThread - INFO - Application is READY! 2023-08-29 16:00:53,411 - MainThread - INFO - -------------------------------------------- 2023-08-29 16:00:54,489 - producer - ERROR - Unable to fetch current cycle from provider tzkt, Not synced. Exiting. 2023-08-29 16:00:54,490 - consumer0 - WARNING - Exit signal received. Terminating... 2023-08-29 16:00:54,490 - producer - INFO - TRD Exit triggered by producer Exit code: 8 2023-08-29 16:00:54,490 - MainThread - INFO - Application stop handler called by producer: 10 2023-08-29 16:00:54,491 - MainThread - INFO - TRD is shutting down... 2023-08-29 16:00:54,492 - MainThread - INFO - -------------------------------------------------------- 2023-08-29 16:00:54,492 - MainThread - INFO - Sensitive operations are in progress! 2023-08-29 16:00:54,492 - MainThread - INFO - Please wait while the application is being shut down! 2023-08-29 16:00:54,492 - MainThread - INFO - -------------------------------------------------------- 2023-08-29 16:00:54,492 - MainThread - INFO - Lock file removed! 2023-08-29 16:00:54,493 - MainThread - INFO - Shutdown. Exit code: 0
$ kubectl describe po ... State: Terminated Reason: Error Exit Code: 1
MainThread
reports it's exiting with code0
, even though the app is exiting due to error, but the actual exit signal k8s sees is1
which is good, so the job correctly restarted.
I don't think this is true. I think that actually, in case of tzkt unresponsive, the app exited succesfully, which is not what should happen. I have now modified the code to propagate producer errors (unresponsive tzkt, disk full etc)... to the main thread. @jmo-staked @jdsika could you please check urgently?
@jdsika pointed out that these signals are not available on windows, more work is needed
We added code in #653 to have TRD send a non-zero exit status when anything wrong happens - including payout failures, or (relevant here) keyboard interrupts.
However, it turns out that when using TRD in runmode -2, -3, -4, the regular course of operations is to have the producer thread send a KeyboardInterrupt when done... which is then treated as error!
This is done with
_thread.interrupt_main()
. It's bad: humans send interrupts, programs should not. Instead, I'm suggesting that the producer useSIGUSR1
to trigger a normal interruption.This triggers the same code path as today, just with an exit status of SUCCESS. So I expect that the main thread will still wait for the consumer, and it won't actually exit before payouts are done, if there are any.
This allows kubernetes bakers (for example) to use a k8s cronjob and get alerted automatically in case of payment failures.
For the record, I still believe this entire producer/consumer logic should be thrown away and we should simply do things in order (calculate, then pay). I've ranted about this before (#491).
Work effort: 3 hours