Open pepyakin opened 2 years ago
I have not measured performance yet, but I recall that the performance hit was about 10% longer block import time when panic = "unwind"
was turned on.
As for the binary size, substrate
shrinks from 60Mb to 46Mb simply by switching to "abort" in current master.
Also, since we had this requirement for so long, there is a chance that we relied on this property somewhere either here or in polkadot. Before jumping the gun we should check whether there are any such places.
This should be as simple as searching for any custom panic handlers, right?
This should be as simple as searching for any custom panic handlers, right?
Yes, likely, but I won't bet on that.
Hmm.... switching the whole binary to panic="abort"
might be somewhat risky and might open us up to a nasty denial of service attack. It'd just take one out-of-bounds array access or a stray unwrap()
to crash the whole node, and we need to remember that we do have, like, a thousand dependencies and our nodes are exposed to the network.
So I don't think this is necessarily a good idea to do globally due to the risks involved. That said, if we could maybe split a part of the executable into a separate process (e.g. run the WASM executor in a separate process?) which would be well isolated (especially from the network) and whose potential failure we could handle from the outside (e.g. quickly restart it in case of an abort) then I think it'd make sense to consider using panic="abort"
over there.
Current panic handling looks like a bit of a mess currently. Main thread sets a custom panic handler here which aborts. Spawned tasks are split into "essential" that abort on panic, and "non-essentials" somewhat arbitrary. E.g. transaction pool is considered "essential" but libp2p networking and block proposer are not 🤔
Any panic outside of the runtime was already always an abort?
At the moment of writing, we have
panic = "unwind"
requirement for all substrate binaries.This is a historical artifact of us having a native runtime. After https://github.com/paritytech/polkadot-sdk/issues/62 and https://github.com/paritytech/substrate/pull/10872 we migth remove the native runtime.
In that case we might be able to lift the
panic = "unwind"
requirement. Binaries withpanic = "abort"
should be smaller and more performant, although it's not clear on how much.Also, since we had this requirement for so long, there is a chance that we relied on this property somewhere either here or in polkadot. Before jumping the gun we should check whether there are any such places.