paritytech / substrate

Substrate: The platform for blockchain innovators
Apache License 2.0
8.39k stars 2.65k forks source link

Consider switching to `panic="abort"` #10874

Open pepyakin opened 2 years ago

pepyakin commented 2 years ago

At the moment of writing, we have panic = "unwind" requirement for all substrate binaries.

This is a historical artifact of us having a native runtime. After https://github.com/paritytech/polkadot-sdk/issues/62 and https://github.com/paritytech/substrate/pull/10872 we migth remove the native runtime.

In that case we might be able to lift the panic = "unwind" requirement. Binaries with panic = "abort" should be smaller and more performant, although it's not clear on how much.

Also, since we had this requirement for so long, there is a chance that we relied on this property somewhere either here or in polkadot. Before jumping the gun we should check whether there are any such places.

arkpar commented 2 years ago

I have not measured performance yet, but I recall that the performance hit was about 10% longer block import time when panic = "unwind" was turned on. As for the binary size, substrate shrinks from 60Mb to 46Mb simply by switching to "abort" in current master.

Also, since we had this requirement for so long, there is a chance that we relied on this property somewhere either here or in polkadot. Before jumping the gun we should check whether there are any such places.

This should be as simple as searching for any custom panic handlers, right?

pepyakin commented 2 years ago

This should be as simple as searching for any custom panic handlers, right?

Yes, likely, but I won't bet on that.

koute commented 2 years ago

Hmm.... switching the whole binary to panic="abort" might be somewhat risky and might open us up to a nasty denial of service attack. It'd just take one out-of-bounds array access or a stray unwrap() to crash the whole node, and we need to remember that we do have, like, a thousand dependencies and our nodes are exposed to the network.

So I don't think this is necessarily a good idea to do globally due to the risks involved. That said, if we could maybe split a part of the executable into a separate process (e.g. run the WASM executor in a separate process?) which would be well isolated (especially from the network) and whose potential failure we could handle from the outside (e.g. quickly restart it in case of an abort) then I think it'd make sense to consider using panic="abort" over there.

arkpar commented 2 years ago

Current panic handling looks like a bit of a mess currently. Main thread sets a custom panic handler here which aborts. Spawned tasks are split into "essential" that abort on panic, and "non-essentials" somewhat arbitrary. E.g. transaction pool is considered "essential" but libp2p networking and block proposer are not 🤔

bkchr commented 2 years ago

Any panic outside of the runtime was already always an abort?

arkpar commented 2 years ago

Any panic outside of the runtime was already always an abort?

I assumed so, but looking at the code this does not seem to be the case. Only "essential" tasks abort everything.