evm: modernize upgrade path

gator-boi commented 5 months ago

This PR modernizes the upgrade logic in both the EVM TokenRouter and MatchingEngine implementations.

nonergodic commented 5 months ago

Implementation.sol is what @kcsongor wrote for NTT, no? Still no code sharing?

Also, from what I can tell, the modernization added a ton of code: TokenRouter size went from 13.846 to 16.012 bytes

I've done a similar simplification in the swap-layer repo that reduced bytecode size from 13.015 to 12.172 (this is based on a commit from ~mid January - I think this one).

I know that Csongor was trying to dot all i-s and cross all t-s with his implementation, but if these numbers are representative, I think that my much more light-weight implementation is highly preferable (no OZ dependency, only a single external function with a very high selector, ...). Spending 10 % of the available contract size purely on the upgrade mechanism strikes me as excessive.

gator-boi commented 5 months ago

@nonergodic This is the implementation that @kcsongor wrote for NTT, with one minor difference (allowing initData in initialize). @djb15 and I chatted about whether we should use your lightweight implementation instead, but ultimately decided against it since the NTT version is well tested and has already been audited twice. Although I'm open to discussing this with a wider group given the decrease is byte code. @djb15 and @a5-pickle thoughts?

djb15 commented 5 months ago

I actually think there's a world where we can take the good parts of both implementations whilst still benefiting from the audits and testing we've had on the NTT variant. I agree @nonergodic that the NTT one is pretty heavy on the bytecode, I think that's the big downside there...I think we can keep the interface pretty similar but reduce some of the dependencies. I was going to have another look at that today, will come back with a suggestion

djb15 commented 5 months ago

From my analysis last week, the NTT proxy base is ~1kB and the slimmed down proxy base is ~0.37kB. Most of the increase in bytecode in this PR comes from checking the immutables since each immutable you want to check needs a public getter.

Out of interest I added the immutable checking logic to the slimmed down ProxyBase today and that increased the bytecode to ~0.52kB. In my opinion the immutable checking is a nice feature that reduces the risk of misconfigurations on an upgrade so I'd like to see that used if possible. Do other folks think it's a waste of space?

I had a look at how we could reduce the bytecode of the NTT proxy base and from some probably poor deduction I think around 0.4kB of the bytecode of the NTT proxy base comes from the OZ Initializable.sol contract. This code is used by so many projects writing upgradeable contracts that I'm hesitant to throw that out the window to optimise for bytecode. So I actually take back what I said in the previous comment about removing dependencies.

(no OZ dependency, only a single external function with a very high selector, ...)

@nonergodic why are both of these important? In my mind having battle-tested OZ code is a positive and having a high selector isn't important for infrequently called methods like contract upgrades?

I think it's pretty awesome how small the slimmed down ProxyBase is, but when we've got 8kB of space left in the bank on the TokenRouter I'm struggling to see why we'd want to optimise for bytecode by switching to an un-audited proxy base?

Happy to change my mind here, just erring on the side of caution.

nonergodic commented 5 months ago

Awesome, thank you for looking at the numbers in more detail @djb15!

I personally think the Initializable contract, like a whole bunch of things in OZ, is an abomination. I think it introduces way too much complexity and people's eyes glaze over when they are forced to look at it, and so they generally don't. This gives rise to a cargo-cult style of programming where you do things the way everyone else does them without having much of a grasp on the rationale, giving rise to e.g. double "Upgraded" events like snuck into the code here (see one of the latest commits) along with the reason why this PR exists in the first place.

I do understand that in a security critical field there's a much higher utility to being conservative and doing things in a proven way, even if it seems antiquated or suboptimal. Being afraid of unknown unknowns is very much legitimate and reasonable people can disagree on what constitutes being properly conservative vs. being paranoid bordering on superstition. I guess I personally don't like the "Nobody has ever been fired for using OZ." mentality and would prefer to lean more on simplicity and my own understanding.

Besides the meta/temperamental/philosophical considerations, my main concern regarding size is not about the liquidity layer itself at all (as you rightfully point out, there's more than enough space left, so why not use it?). Rather it's about the swap layer and setting precedents more generally.

The swap layer currently stands at 18 kb (down from 22 after ripping out the timelocks etc. as we discussed a few months back) and is not yet feature complete. I've already been rather meticulously optimizing contract size but I expect I'll get pretty close to max size before everything is said and done, hence I can't really afford taking on excess weight like this (especially if I want to leave some breathing room to add another AMM or two later on), which raises the question: If our default proxy implementation is heavier than it needs to be, what do we do in cases where the trade-off is relevant (like the swap layer)?

I think this is my main complaint about OZ in general: It's just unnecessarily heavy and complex and thus reduces both the readability and the amount of usable space with every additional feature/base class you introduce. As a C++ dinosaur who loves the zero overhead principle (you only pay for what you use), I recoil from this sort of thing in horror. If we "seed" our own solidity sdk (because this is really where this stuff should live) with a similar approach, we'll either start heading down the same path, or we continue on our old ways where everyone uses whatever the hell they feel like and we keep reinventing the wheel over and over with minor differences... ("ProxyBase is used by the swap layer but Implementation is used by NTT and a slightly modified form by liquidity layer. Of course the legacy contracts use something else altogether...")

Final point in this long rant:

having a high selector isn't important for infrequently called methods like contract upgrades

That's precisely the point (see my dispergtation on slack from last November)! Infrequently called functions should have high selectors because they are called infrequently. Every selector that is low but not called frequently introduces an overhead 22 gas each! Implementation introduces 3 external functions: 439fab91 - initialize(bytes) 689f90c3 - getMigratesImmutables() 8fd3ab80 - migrate() so in expectation, the average function call will become about 39 gas more expensive when this base class is used (only 26 % of functions will not incur any extra cost, 40-26=14 % will incur 22 gas extra, 56-40=16 % will incur 44 gas, and the remaining 44 % will incur an additional cost of 66 gas).

For the curios, this is the complete external interface of the liquidity layer sorted by selector:

016ca178 cancelOwnershipTransferRequest()
038c0b66 confirmOwnershipTransferRequest()
05d20528 getInitialAuctionFee()
070043bf getOwnerAssistant()
07ff8a2e getRouter(uint16)
0a14f7c1 matchingEngineChain()
0fef8c5f updateFastTransferParameters((bool,uint64,uint64,uint64))
15e812ad getBaseFee()
1c6336d4 matchingEngineDomain()
256e4b42 getRouterEndpoint(uint16)
2a2d7e21 placeFastMarketOrder(uint64,uint16,bytes32,bytes,uint64,uint32)
3ce9f518 redeemFill((bytes,bytes,bytes))
439fab91 initialize(bytes)
5005c175 matchingEngineAddress()
52139372 placeFastMarketOrder(uint64,uint64,uint16,bytes32,bytes,address,uint64,uint32)
58de7fb4 getDomain(uint16)
5cf34bcf getMinFee()
6380b620 updateOwnerAssistant(address)
6606cbf1 getFastTransferParameters()
689f90c3 getMigratesImmutables()
69a85fcc addRouterEndpoint(uint16,(bytes32,bytes32),uint32)
6f621428 getMintRecipient(uint16)
722a21d4 updateRouterEndpoint(uint16,(bytes32,bytes32),uint32)
72630531 getDeployer()
84acd1bb wormhole()
85d18917 orderToken()
893d20e8 getOwner()
89485ac6 matchingEngineMintRecipient()
8d976a76 disableRouterEndpoint(uint16)
8fd3ab80 migrate()
93596c7b getPendingOwner()
a020df85 submitOwnershipTransferRequest(address)
afff4c13 getMaxFastTransferAmount()
b187bd26 isPaused()
b77b58d3 getMinFastTransferAmount()
bda392b2 setCctpAllowance(uint256)
bedb86fb setPause(bool)
c0054ae5 placeMarketOrder(uint64,uint64,uint16,bytes32,bytes,address)
d1bbb5a6 placeMarketOrder(uint64,uint16,bytes32,bytes)
d32e91ad enableFastTransfers(bool)
d8d5470d fastTransfersEnabled()
eb2c0223 upgradeContract(address)

i.e. for the unfortunately placed placeMarketOrder functions, the gas overhead on every call is 814 and 836 gas respectively.

For comparison, this is the totality of all public functions on the swap layer (saving both size and gas).

I'll leave it up to everyone else to judge whether this is "supreme autistic pedantry" or "the eye for detail that makes the difference in whether something becomes a great product or another mediocre, inefficient quagmire".

djb15 commented 5 months ago

Nice, thanks for the detailed reply!! Will have a think about this some more tomorrow.

Ah by "high selector" I was thinking "high up on the list" which is why I was confused...but this makes more sense now!

What's your opinion on the immutable checking during upgrades btw?

nonergodic commented 5 months ago

I don't think it should be done on-chain, rather they should be fetched and checked in e.g. an upgrade script if you want to ensure that you are using the same values as you are upgrading the various peer contracts on the various chains. What if a contract upgrade is supposed to remove or change one of the immutables? Right now, since the old implementation dictates what immutables must exist on the new contract and that they must equal their current values, this sort of change is now not possible in a single upgrade (unless you introduce a bunch of garbage code that's purely there to safisfy this requirement).

Another point for "OZ intializable makes everyone's eyes glaze over": We now have migrate() and reinitialize both serving the same purpose, no? And has anyone ever even seen a contract upgrade with migration code? In my experience this is all optimizing for stuff that's exceedingly rare, just like protecting against problems that can only arise if the contract contains a self-destruct opcode in the first place.

As another more philosophical/ranty point, I have high trust in myself and I don't like it when things try to dictate what I can or can't do ("Computer says no."). E.g. I carry appendix with one in the chamber and no manual safety and I feel perfectly comfortable doing so. Or put another way, as a C++ dinosaur, I'm used to wielding powerful tools that will blow up when misused (see the old joke: "If you want to shoot yourself in the foot, use C. If you want to take the whole leg off, use C++."). So my creed is "We don't need better guardrails, we need more virtuous devs."

When it comes to contract upgrades, the only thing I really worry about is accidental bricking of the contract (one-way door errors). Deploying with the wrong immutables might cause some amount of pain/damage but can be fixed by just doing another upgrade (two-way door). Ofc, if you use OZ's reinitalizer you'll have to bump the version and suddenly you have a different version number on the chain that had the deployment mishap...

If I had to personify OZ, I'd say it's the sort of guy who read the CIA handbook on how to hobble an otherwise productive organization by demanding that everything is done "by the book" and via committee, making sure that he'll never run out of work while also creating so many bureaucratic routines that nobody will ever be able to replace him and the ever expanding Rube-goldbergesque machinery he's put in place.

nonergodic commented 5 months ago

Another point for "OZ intializable makes everyone's eyes glaze over": We now have migrate() and reinitialize both serving the same purpose, no?

Just realized that was a mistake on my part. No reinitialize function, only a reinitializer modifier. So nevermind that, my eyes were glazing over.

Though just programmatically increasing the version by 1 (as we are doing) seems like a "I'm doing this because I have to" sort of move. I don't see it serving any actual purpose. I originally hallucinated a reinitialize function that takes a version number and ensures that you execute upgrades correctly in case you skip versions, but now I realize it's even sillier than that, especially since OZ documentations states that:

When version is 1, this modifier is similar to initializer, except that functions marked with reinitializer cannot be nested. If one is invoked in the context of another, execution will revert.

and

Note that versions can jump in increments greater than 1; this implies that if multiple reinitializers coexist in a contract, executing them in the right order is up to the developer or operator.

And even better:

WARNING: Setting the version to 2**64 - 1 will prevent any future reinitialization.

So if you somehow manage to overwrite storage and set that thing to its max value, you brick the contract for any future updates. Extremely unlikely, but still...

kcsongor commented 5 months ago

Though just programmatically increasing the version by 1 (as we are doing) seems like a "I'm doing this because I have to" sort of move. I don't see it serving any actual purpose. I originally hallucinated a reinitialize function that takes a version number and ensures that you execute upgrades correctly in case you skip versions, but now I realize it's even sillier than that

This is spot on, working with the OZ Initializable logic was definitely an uphill battle.

And has anyone ever even seen a contract upgrade with migration code? In my experience this is all optimizing for stuff that's exceedingly rare, just like protecting against problems that can only arise if the contract contains a self-destruct opcode in the first place.

I have seen (and written) many of those. It's always scary, and because it's relatively rare (to the point where two consecutive upgrades might be implemented/deployed by entirely different sets of people, with fresh opportunities to make mistakes), guardrails are important IMO.

"We don't need better guardrails, we need more virtuous devs."

We don't need airbags, we need more careful drivers. Imagine the fuel savings with all that excess weight removed!

FWIW I think it should be possible to cut out a lot of OZ crap without compromising on the desired guarantees of the upgrade logic (since OZ doesn't really buy us that much)

djb15 commented 5 months ago

FWIW I think it should be possible to cut out a lot of OZ crap without compromising on the desired guarantees of the upgrade logic (since OZ doesn't really buy us that much)

I've had a play with this again and it's not a trivial change to just cut out OZ stuff without rewriting portions of the implementation. Just copy/pasting the bits we need out of the OZ contracts has a trivial impact on the total bytecode size (~0.1kB). Not saying it's not doable, but it's not a freebie like I was hoping.

Also another data point on the immutable checking...the 5 immutable checks in the TokenRouter add 0.75kB to the bytecode so it's not a cheap guardrail by any means. But If we have the bytecode available I do still think it's a nice safety check given upgrades are probably the most risky operation that an admin ever needs to do.

I think we all agree OZ isn't exactly lightweight (that's putting it lightly) so there is absolutely a better variant that exists of what we have here. Priority for this codebase right now is security imo so I'm going to review this as is since we get the extra security checks.

If it can be prioritised we should definitely continue this discussion elsewhere though. We should agree on a proxy base variant that we can use everywhere that makes tradeoffs that the majority are comfortable with. This one definitely isn't perfect by any means but I do like it.

nonergodic commented 5 months ago

We don't need airbags, we need more careful drivers. Imagine the fuel savings with all that excess weight removed!

Absolutely, though the excesss weight comes from all the careless self-selecting themselves off the street (the driving analogy is ofc a lot less compelling for all sorts of real world reasons). As the saying goes "I tried to make it idiot-proof but they keep making better idiots!" Or alternatively: "It's time to take the warning labels off and let the problem of pervasive stupidity solve itself!"

But snark aside, the weight is important and also helpful because it's an objective factor, but my main point remains one about safety through simplicity, readability, and ease of use. If something is simple, elegant, and easy to use, you'll be happy to look at it and understand it. If it's ugly, disgusting, and a PITA to use you'll spend as little time on it as humanly possible, giving rise to all sorts of undesirable second order effects. As the author of your code you should make it compelling for the reader, not repelling.

a5-pickle commented 5 months ago

I added #95 so that the discussion can take place there. If this conversation does continue, please move it there.

The purpose of this PR was to update the existing upgrade logic taken from older EVM examples to something that a more recent example (NTT) has used.

wormhole-foundation / example-liquidity-layer

evm: modernize upgrade path #79