Open arnetheduck opened 5 years ago
But will it bail out instead of reconnecting and recovering? Find out by running a dozen or so iterations of this:
N=0; while true; do echo $N; rm -rf ~/.cache/nimbus/db; time ./build/nimbus; sleep 1 || break; N=$[$N + 1]; done
as discussed - we cannot allow random exceptions to be swallowed as their effect on the overall state is unknown (similar to a signal handler in unix, these exceptions can and will happen at any point in a nim program, including spots where they weren't anticipated by the author) - this counts for overflows, asserts and other similar errors - they indicate bugs in the application - like this particular instance - the root cause is a race condition or some other serious issue, and it needs urgent attention.
There is a new exception type in Nim called CatchableError
which is supposed to be the base type of all recoverable errors. Putting a blanket handler with this exception type may be reasonable sometimes, but I agree that in this particular situation it's probably better to stick to the list of specific expected failures such as PeerDisconnected
and RlpError
. Non-recoverable errors should always fail fast.
I agree here, as long as the error and thus cause/effect is not known, error should be raised. And in debug it probably should also assert (so it is clearly noticed).
I actually had a similar issue before in the rlpxAccept
code, where the exception would only be logged and it could be missed if not paying attention to all the logs. I noticed because I was specifically debugging that problem (AlreadyConnected
peer).
How is anyone supposed to work on this, if Nimbus keeps throwing random exceptions instead of logging them as recoverable errors?
@stefantalpalaru there are a few critical things to take heed of here:
thus, long story short, the way to work and make actual progress on the application is to focus on the root cause for the issue and make sure it's addressed properly - and make deliberate decisions about expected and unexpected behavior, falling back on something that's guaranteed to not cause further damage once an unaccounted for situation is encountered - in this case, show a diagnostic and shut down.
this hides bugs, like this one: