tezos-reward-distributor-organization / tezos-reward-distributor

Tezos Reward Distributor (TRD): A reward distribution software for tezos bakers.
GNU General Public License v3.0

Feat/better exit messages #657

Closed: rvermootenct closed 1 year ago

rvermootenct commented 1 year ago

IMPORTANT NOTICE: I read and understood the guidelines for contributions to the TRD. The contribution may qualify for being compensated by the TRD grant if approved by the maintainers.

This PR resolves issue #653. The following steps were performed:

Work effort:

jdsika commented 1 year ago

@nicolasochem does that solve your issue?

jdsika commented 1 year ago

@rvermootenct the work effort is missing. I propose to merge today and make further improvements in a separate PR.

nicolasochem commented 1 year ago

@rvermootenct @jdsika thanks!

It does sound useful to have a utility, and I think having 3 possible exit codes is fine.

After a cursory look, it looks like you are only catching USER_ABORT here? And never GENERAL_ERROR?

Can you also add the case where the program ends in error:

These are the scenarios for which I would like my infra to alert me.
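
As an illustration of the request above, a top-level wrapper that reports both a user abort and a generic error could look like the sketch below. `USER_ABORT` and `GENERAL_ERROR` are the names used in this thread; the `SUCCESS` member, the numeric values, and the `run_with_exit_code` helper are assumptions made for this example and are not taken from the PR.

```python
import enum
import logging

logger = logging.getLogger("exit_code_sketch")


class ExitCode(enum.IntEnum):
    # Numeric values are assumptions for this sketch, not TRD's actual codes.
    SUCCESS = 0
    USER_ABORT = 1
    GENERAL_ERROR = 2


def run_with_exit_code(main):
    """Run main() and map its outcome to an ExitCode (hypothetical helper)."""
    try:
        main()
    except KeyboardInterrupt:
        # The operator pressed Ctrl+C.
        logger.info("Aborted by user")
        return ExitCode.USER_ABORT
    except Exception as e:
        # Any other failure: the case that monitoring should alert on.
        logger.error("Unhandled error: %s", e)
        return ExitCode.GENERAL_ERROR
    return ExitCode.SUCCESS
```

An entry point could then end with `sys.exit(int(run_with_exit_code(main)))`, so that the infrastructure only needs to watch for a non-zero exit status.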

jdsika commented 1 year ago

@rvermootenct I would like to include this in a release that I want to start preparing. I think we need to put a date on this one.

jdsika commented 1 year ago

When will you finish this PR? We will soon have a new protocol version

nicolasochem commented 1 year ago

It looks like the last commit adds multiple error types, but it still does not catch the generic error as I asked for. Is this correct, @rvermootenct?

jdsika commented 1 year ago

make sure to integrate #611 here "clear lockfile when properly shut down"

"Properly" means in this case that there are no threads running in the background anymore which could potentially trigger a payment.
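
One way to read this requirement is sketched below: the lock file is removed only after every background thread has actually stopped. The function name, its parameters, and the use of a `threading.Event` as stop signal are assumptions made for illustration; this is not how TRD's life cycle code is written.

```python
import os
import threading


def shut_down(threads, stop_event, lock_path, join_timeout=5.0):
    """Remove the lock file only if every worker thread has stopped.

    All names here are hypothetical. Returns True if the lock was cleared.
    """
    stop_event.set()  # ask the workers to finish
    for t in threads:
        t.join(timeout=join_timeout)
    if any(t.is_alive() for t in threads):
        # A worker could still trigger a payment, so keep the lock in place.
        return False
    if os.path.exists(lock_path):
        os.remove(lock_path)
    return True
```

Keeping the lock when a thread is still alive errs on the side of blocking the next start rather than risking a second payment run.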

rvermootenct commented 1 year ago

> make sure to integrate #611 here "clear lockfile when properly shut down"
>
> "Properly" means in this case that there are no threads running in the background anymore which could potentially trigger a payment.

This PR only gives better exit messages and exit codes. The state machine in this thing is very confusing to me. I've tried to make sense of it, but I think someone else will be better equipped to figure out how to ensure the lockfiles are removed at the correct times. As I'm moving away from this project/ecosystem, I don't think it's worth anyone's while for me to spend time struggling through this. I'm willing to spend some time today trying to figure this out, but if I can't crack it, I'd prefer not to add it to this PR.

I'd like this work to be thoroughly checked too, because I don't want this PR to exit the program incorrectly.

@nicolasochem I hope I have now caught the generic errors. If not, can we please have a chat sometime this week so you can explain the situation to me?

nicolasochem commented 1 year ago

I tried the code while my signer was not running and got nested exceptions. TypeError: exit_program() missing 1 required positional argument: 'exit_message'. It does look like one parameter is missing to exit_program in some cases.

2023-03-27 23:04:22,725 - MainThread - INFO - --------------------------------------------------------
2023-03-27 23:04:22,725 - MainThread - INFO - Sensitive operations are in progress!
2023-03-27 23:04:22,725 - MainThread - INFO - Please wait while the application is being shut down!
2023-03-27 23:04:22,725 - MainThread - INFO - --------------------------------------------------------
Traceback (most recent call last):
  File "/home/nochem/workspace/tezos-reward-distributor/venv/lib/python3.11/site-packages/urllib3/connection.py", line 174, in _new_conn
    conn = connection.create_connection(
  File "/home/nochem/workspace/tezos-reward-distributor/venv/lib/python3.11/site-packages/urllib3/util/connection.py", line 95, in create_connection
    raise err
  File "/home/nochem/workspace/tezos-reward-distributor/venv/lib/python3.11/site-packages/urllib3/util/connection.py", line 85, in create_connection
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/nochem/workspace/tezos-reward-distributor/venv/lib/python3.11/site-packages/urllib3/connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
  File "/home/nochem/workspace/tezos-reward-distributor/venv/lib/python3.11/site-packages/urllib3/connectionpool.py", line 398, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/home/nochem/workspace/tezos-reward-distributor/venv/lib/python3.11/site-packages/urllib3/connection.py", line 244, in request
    super(HTTPConnection, self).request(method, url, body=body, headers=headers)
  File "/usr/lib64/python3.11/http/client.py", line 1282, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib64/python3.11/http/client.py", line 1328, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib64/python3.11/http/client.py", line 1277, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib64/python3.11/http/client.py", line 1037, in _send_output
  File "/usr/lib64/python3.11/http/client.py", line 975, in send
  File "/home/nochem/workspace/tezos-reward-distributor/venv/lib/python3.11/site-packages/urllib3/connection.py", line 205, in connect
    conn = self._new_conn()
  File "/home/nochem/workspace/tezos-reward-distributor/venv/lib/python3.11/site-packages/urllib3/connection.py", line 186, in _new_conn
    raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7fcf28824590>: Failed to establish a new connection: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/nochem/workspace/tezos-reward-distributor/venv/lib/python3.11/site-packages/requests/adapters.py", line 489, in send
    resp = conn.urlopen(
  File "/home/nochem/workspace/tezos-reward-distributor/venv/lib/python3.11/site-packages/urllib3/connectionpool.py", line 787, in urlopen
    retries = retries.increment(
  File "/home/nochem/workspace/tezos-reward-distributor/venv/lib/python3.11/site-packages/urllib3/util/retry.py", line 592, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='', port=6732): Max retries exceeded with url: /keys/tz1ejA7UWkdVk9wYkLGnReq2qrmyi5Po86FK (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fcf28824590>: Failed to establish a new connection: [Errno 111] Connection refused'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/nochem/workspace/tezos-reward-distributor/src/cli/client_manager.py", line 210, in _do_request
    response = requests.request(
  File "/home/nochem/workspace/tezos-reward-distributor/venv/lib/python3.11/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
  File "/home/nochem/workspace/tezos-reward-distributor/venv/lib/python3.11/site-packages/requests/sessions.py", line 587, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/nochem/workspace/tezos-reward-distributor/venv/lib/python3.11/site-packages/requests/sessions.py", line 701, in send
    r = adapter.send(request, **kwargs)
  File "/home/nochem/workspace/tezos-reward-distributor/venv/lib/python3.11/site-packages/requests/adapters.py", line 565, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='', port=6732): Max retries exceeded with url: /keys/tz1ejA7UWkdVk9wYkLGnReq2qrmyi5Po86FK (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fcf28824590>: Failed to establish a new connection: [Errno 111] Connection refused'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/nochem/workspace/tezos-reward-distributor/src/cli/client_manager.py", line 135, in check_pkh_known_by_signer
    response = self._do_request(method="GET", url=url, timeout=timeout)
  File "/home/nochem/workspace/tezos-reward-distributor/src/cli/client_manager.py", line 224, in _do_request
    exit_program(ExitCode.SIGNER_ERROR, e)
  File "/home/nochem/workspace/tezos-reward-distributor/src/util/exit_program.py", line 25, in exit_program
    if exit_message(exit_code):
TypeError: 'ConnectionError' object is not callable

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/nochem/workspace/tezos-reward-distributor/src/util/process_life_cycle.py", line 216, in start
  File "/home/nochem/workspace/tezos-reward-distributor/src/fsm/TransitionsFsmModel.py", line 26, in trigger_event
    self.trigger(event, *args, **kwargs)
  File "/home/nochem/workspace/tezos-reward-distributor/venv/lib/python3.11/site-packages/transitions/core.py", line 922, in _get_trigger
    return event.trigger(model, *args, **kwargs)
  File "/home/nochem/workspace/tezos-reward-distributor/venv/lib/python3.11/site-packages/transitions/core.py", line 402, in trigger
    return self.machine._process(func)
  File "/home/nochem/workspace/tezos-reward-distributor/venv/lib/python3.11/site-packages/transitions/core.py", line 1211, in _process
    return trigger()
  File "/home/nochem/workspace/tezos-reward-distributor/venv/lib/python3.11/site-packages/transitions/core.py", line 416, in _trigger
  File "/home/nochem/workspace/tezos-reward-distributor/venv/lib/python3.11/site-packages/transitions/core.py", line 439, in _process
    if trans.execute(event_data):
  File "/home/nochem/workspace/tezos-reward-distributor/venv/lib/python3.11/site-packages/transitions/core.py", line 277, in execute
  File "/home/nochem/workspace/tezos-reward-distributor/venv/lib/python3.11/site-packages/transitions/core.py", line 287, in _change_state
  File "/home/nochem/workspace/tezos-reward-distributor/venv/lib/python3.11/site-packages/transitions/core.py", line 129, in enter
    event_data.machine.callbacks(self.on_enter, event_data)
  File "/home/nochem/workspace/tezos-reward-distributor/venv/lib/python3.11/site-packages/transitions/core.py", line 1146, in callbacks
    self.callback(func, event_data)
  File "/home/nochem/workspace/tezos-reward-distributor/venv/lib/python3.11/site-packages/transitions/core.py", line 1165, in callback
  File "/home/nochem/workspace/tezos-reward-distributor/src/util/process_life_cycle.py", line 294, in do_load_config
  File "/home/nochem/workspace/tezos-reward-distributor/src/util/config_life_cycle.py", line 86, in start
  File "/home/nochem/workspace/tezos-reward-distributor/src/fsm/TransitionsFsmModel.py", line 26, in trigger_event
    self.trigger(event, *args, **kwargs)
  File "/home/nochem/workspace/tezos-reward-distributor/venv/lib/python3.11/site-packages/transitions/core.py", line 922, in _get_trigger
    return event.trigger(model, *args, **kwargs)
  File "/home/nochem/workspace/tezos-reward-distributor/venv/lib/python3.11/site-packages/transitions/core.py", line 402, in trigger
    return self.machine._process(func)
  File "/home/nochem/workspace/tezos-reward-distributor/venv/lib/python3.11/site-packages/transitions/core.py", line 1211, in _process
    return trigger()
  File "/home/nochem/workspace/tezos-reward-distributor/venv/lib/python3.11/site-packages/transitions/core.py", line 416, in _trigger
  File "/home/nochem/workspace/tezos-reward-distributor/venv/lib/python3.11/site-packages/transitions/core.py", line 439, in _process
    if trans.execute(event_data):
  File "/home/nochem/workspace/tezos-reward-distributor/venv/lib/python3.11/site-packages/transitions/core.py", line 277, in execute
  File "/home/nochem/workspace/tezos-reward-distributor/venv/lib/python3.11/site-packages/transitions/core.py", line 287, in _change_state
  File "/home/nochem/workspace/tezos-reward-distributor/venv/lib/python3.11/site-packages/transitions/core.py", line 129, in enter
    event_data.machine.callbacks(self.on_enter, event_data)
  File "/home/nochem/workspace/tezos-reward-distributor/venv/lib/python3.11/site-packages/transitions/core.py", line 1146, in callbacks
    self.callback(func, event_data)
  File "/home/nochem/workspace/tezos-reward-distributor/venv/lib/python3.11/site-packages/transitions/core.py", line 1165, in callback
  File "/home/nochem/workspace/tezos-reward-distributor/src/util/config_life_cycle.py", line 126, in do_validate_cfg
  File "/home/nochem/workspace/tezos-reward-distributor/src/config/yaml_baking_conf_parser.py", line 74, in validate
  File "/home/nochem/workspace/tezos-reward-distributor/src/config/yaml_baking_conf_parser.py", line 238, in validate_payment_address
  File "/home/nochem/workspace/tezos-reward-distributor/src/cli/client_manager.py", line 137, in check_pkh_known_by_signer
    exit_program(ExitCode.SIGNER_ERROR, f"{e}\n{signer_exception}")
  File "/home/nochem/workspace/tezos-reward-distributor/src/util/exit_program.py", line 25, in exit_program
    if exit_message(exit_code):
TypeError: 'str' object is not callable

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/nochem/workspace/tezos-reward-distributor/src/main.py", line 146, in <module>
  File "/home/nochem/workspace/tezos-reward-distributor/src/main.py", line 127, in start_application
  File "/home/nochem/workspace/tezos-reward-distributor/src/util/process_life_cycle.py", line 245, in start
  File "/home/nochem/workspace/tezos-reward-distributor/src/util/process_life_cycle.py", line 386, in shut_down_on_error
TypeError: exit_program() missing 1 required positional argument: 'exit_message'

nicolasochem commented 1 year ago

It's still not working. When the signer is off, I'm still getting:

TypeError: 'str' object is not callable

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/nochem/workspace/tezos-reward-distributor/src/main.py", line 146, in <module>
  File "/home/nochem/workspace/tezos-reward-distributor/src/main.py", line 127, in start_application
  File "/home/nochem/workspace/tezos-reward-distributor/src/util/process_life_cycle.py", line 245, in start
  File "/home/nochem/workspace/tezos-reward-distributor/src/util/process_life_cycle.py", line 386, in shut_down_on_error
TypeError: exit_program() missing 1 required positional argument: 'exit_message'

An error message is needed here: https://github.com/tezos-reward-distributor-organization/tezos-reward-distributor/pull/657/files#diff-acda51ab96991dc20d4760e24db3206e89b406b0ee525a93ef6ff7ed290dbd44R386
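
Judging only from the tracebacks above, `exit_program` appears to call its `exit_message` argument as if it were a function, and one call site omits that argument. A minimal sketch of a signature that would avoid both `TypeError`s follows. `USER_ABORT`, `GENERAL_ERROR` and `SIGNER_ERROR` are names from this thread; the `SUCCESS` member, the numeric values, and the default arguments are assumptions, and this is not the code from the PR.

```python
import enum
import sys


class ExitCode(enum.IntEnum):
    # Numeric values are placeholders chosen for this sketch.
    SUCCESS = 0
    USER_ABORT = 1
    GENERAL_ERROR = 2
    SIGNER_ERROR = 3


def exit_program(exit_code=ExitCode.SUCCESS, exit_message=None):
    """Print the message, if any, and exit with the given code."""
    # Treat exit_message as data: convert it to text instead of calling it,
    # so strings and exception objects are both accepted.
    text = "" if exit_message is None else str(exit_message)
    if text:
        print(text, file=sys.stderr)
    sys.exit(int(exit_code))
```

With a default for `exit_message`, a call such as `exit_program(ExitCode.GENERAL_ERROR)` no longer fails, and `exit_program(ExitCode.SIGNER_ERROR, e)` works when `e` is an exception.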

nicolasochem commented 1 year ago

Another error I got:

2023-03-29 04:46:05,602 - consumer0 - DEBUG - Consumer returning...
2023-03-29 04:46:05,602 - producer  - DEBUG - Unknown error in payment producer loop: 'str' object is not callable
Traceback (most recent call last):
  File "/app/src/pay/payment_producer.py", line 337, in run
  File "/app/src/pay/payment_producer.py", line 152, in exit
  File "/app/src/util/exit_program.py", line 25, in exit_program
    if exit_message(exit_code):
TypeError: 'str' object is not callable
2023-03-29 04:46:05,603 - producer  - ERROR - Unknown error in payment producer loop: 'str' object is not callable, will try again.
2023-03-29 04:46:05,603 - producer  - DEBUG - Producer returning...

jdsika commented 1 year ago

@vkresch I see also a topic with the logger. I think the global logger should trigger the exit of the function when an error log is thrown right?

vkresch commented 1 year ago

@jdsika please postpone this feature, as I would like to have more time to look into it. Currently the implementation does not seem to fix the initial issue.

jdsika commented 1 year ago

I would call it "temporarily disabled until time for a fix", but if you want to call it the "removal of the feature" :D

jdsika commented 1 year ago

> @vkresch I see also a topic with the logger. I think the global logger should trigger the exit of the function when an error log is thrown right?

IMO an error in the log must trigger a graceful shutdown - yes!
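
A rough sketch of that idea, using only the standard `logging` and `threading` modules: a handler that sets an event whenever a record at `ERROR` level or above is logged, leaving the actual shutdown to whichever loop watches the event. The class name, the logger name, and the event are made up for this example and do not come from the TRD code base.

```python
import logging
import threading


class ShutdownOnErrorHandler(logging.Handler):
    """Set an event when a record at ERROR level or above is logged."""

    def __init__(self, shutdown_event):
        # The handler level filters out everything below ERROR.
        super().__init__(level=logging.ERROR)
        self.shutdown_event = shutdown_event

    def emit(self, record):
        # Only signal here; the main loop performs the actual shutdown.
        self.shutdown_event.set()


shutdown_requested = threading.Event()
logger = logging.getLogger("trd_sketch")  # hypothetical logger name
logger.addHandler(ShutdownOnErrorHandler(shutdown_requested))
```

Signalling through an event rather than exiting inside the handler gives running threads a chance to finish sensitive operations before the process stops.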

vkresch commented 1 year ago

Work effort: 4h

vkresch commented 1 year ago

@nicolasochem @jdsika could you test your use case again?

nicolasochem commented 1 year ago

@vkresch it works now, I tried 3 things:

So it looks like you fixed the issue :+1:

vkresch commented 1 year ago

@nicolasochem gonna try to fix the tests today here and then we can merge

vkresch commented 1 year ago

@jdsika @nicolasochem mergeable if needed