penumbra-zone / tower-abci

Tower-based ABCI interface.
MIT License

tower-abci: handle errors more gracefully #29

Open erwanor opened 1 year ago

erwanor commented 1 year ago

For each individual connection, we spawn a tokio task that is responsible for driving the state and handling I/O. A variety of failures can occur in this context, from codec errors to connection failures. Right now, if such a failure occurs, the task simply crashes: the error is not propagated anywhere, and the only context in the logs is a Rust backtrace.
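To make the logging half of this concrete, here is a minimal sketch of a worker that returns a contextualized error instead of crashing. It is not tower-abci code: `ConnectionError`, `decode_frame`, `run_connection`, and the one-byte length-prefix "codec" are all hypothetical stand-ins, and the sketch uses only the standard library rather than tokio.

```rust
use std::fmt;
use std::io;

/// Hypothetical error type for one connection worker (not part of tower-abci).
#[derive(Debug)]
enum ConnectionError {
    Codec(String),
    Io(io::Error),
}

impl fmt::Display for ConnectionError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ConnectionError::Codec(msg) => write!(f, "codec error: {msg}"),
            ConnectionError::Io(err) => write!(f, "I/O error: {err}"),
        }
    }
}

impl std::error::Error for ConnectionError {}

impl From<io::Error> for ConnectionError {
    fn from(err: io::Error) -> Self {
        ConnectionError::Io(err)
    }
}

/// Hypothetical stand-in for the real codec: the first byte must equal the
/// number of payload bytes that follow it. An empty frame is treated as a
/// closed connection.
fn decode_frame(bytes: &[u8]) -> Result<u8, ConnectionError> {
    match bytes.first() {
        Some(&len) if usize::from(len) == bytes.len() - 1 => Ok(len),
        Some(&len) => Err(ConnectionError::Codec(format!(
            "length prefix {len} does not match payload of {} bytes",
            bytes.len() - 1
        ))),
        None => Err(io::Error::new(io::ErrorKind::UnexpectedEof, "connection closed").into()),
    }
}

/// Processes frames for one connection. On failure, returns a message that
/// names the connection and how far it got, instead of panicking.
fn run_connection(conn_id: u64, frames: &[&[u8]]) -> Result<usize, String> {
    let mut handled = 0;
    for frame in frames {
        decode_frame(frame).map_err(|err| {
            format!("connection {conn_id} failed after {handled} frames: {err}")
        })?;
        handled += 1;
    }
    Ok(handled)
}
```

The caller that spawned the worker can then log the returned message at error level before deciding what to do next, so the failure is attributed to a specific connection rather than surfacing only as a backtrace.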

erwanor commented 1 year ago

Related to https://github.com/penumbra-zone/penumbra/issues/689

xla commented 10 months ago

@erwanor I'm interested in tackling this issue. First, I want to clarify that there is no way to propagate non-application errors back to Comet. In fact, for any such error the ABCI app is expected to exit, and both processes are meant to restart to initiate crash recovery. Unless Comet has changed within the last couple of months, that is the expected behaviour.

erwanor commented 10 months ago

@xla great point! I have amended the issue. Do you have something specific in mind to address it? If you have the appetite for it, we could track the connection handles and propagate the error to the application when a worker fails. Otherwise, logging would already be a good first step.
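A rough sketch of the "track the connection handles and propagate the error" idea, under loud assumptions: it uses `std::thread` and `std::sync::mpsc` as stand-ins for tokio tasks and channels, each worker's outcome is simulated by a precomputed `Result`, and `WorkerFailure` and `supervise` are hypothetical names that do not exist in tower-abci.

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;

/// Hypothetical report sent to the application when a worker fails.
#[derive(Debug, PartialEq)]
struct WorkerFailure {
    conn_id: u64,
    reason: String,
}

/// Spawns one worker per connection, keeps its handle keyed by connection id,
/// and returns every failure reported by a worker. Each worker's outcome is
/// simulated by the `Result` paired with its id.
fn supervise(jobs: Vec<(u64, Result<(), String>)>) -> Vec<WorkerFailure> {
    let (tx, rx) = mpsc::channel();
    let mut handles = HashMap::new();

    for (conn_id, outcome) in jobs {
        let tx = tx.clone();
        let handle = thread::spawn(move || {
            // A real worker would drive the connection here; on error it
            // reports to the supervisor instead of crashing silently.
            if let Err(reason) = outcome {
                let _ = tx.send(WorkerFailure { conn_id, reason });
            }
        });
        handles.insert(conn_id, handle);
    }

    // Drop the supervisor's own sender so the receiver ends once every
    // worker has finished and dropped its clone.
    drop(tx);
    let mut failures: Vec<WorkerFailure> = rx.iter().collect();

    // The tracked handles also let us notice workers that panicked.
    for (conn_id, handle) in handles {
        if handle.join().is_err() {
            failures.push(WorkerFailure {
                conn_id,
                reason: "worker panicked".to_string(),
            });
        }
    }

    failures.sort_by_key(|failure| failure.conn_id);
    failures
}
```

With something along these lines, the application sees which connection failed and why, and can log it and then exit so that both processes restart and go through crash recovery, as described above.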