Keep the vyper compiler version signature in the runtime code

mds1 commented 1 year ago

Simple Summary

Per @banteg on twitter I learned that vyper 0.3.10 plans to move the version metdata from the initcode to creation code (ref https://github.com/vyperlang/vyper/pull/3471)

I suggest keeping this in the initcode.

Motivation

Finding initcode is much harder than finding runtime code. Given an address, it's trivial to query for it's runteime code using eth_getCode and parse the vyper version out of it. It's much harder to binary search across all blocks to find when a contract was deployed, then trace each transaction in that block to find creation code (You could of course rely on etherscan or other data providers, to provide this, but IMO minimizing reliance on other services and ensuring data is easy to find is a better approach)
My understanding is that the rationale was that moving it to the creation code was either (1) cheaper, and/or (2) more robust since there's no initcode size limit.
1. On (1), the cost difference here is negligible. In both cases the total bytecode being deployed is the same, and the only difference is a few less bytes being returned from the constructor. Compared to the cost of a contract deployment, this will be an unnoticeable savings
2. On (2), EIP-3680 is now live which limits initcode size to 2x the runtime size, so if bytecode size was a limit moving it to creation code does help since 2x is a lot bigger. I'd argue the few bytes savings is likely to not be the deciding factor in many cases (there are of course exceptions)

My opinion is that the gas savings from (2) don't outweigh the benefits of having the version in the runtime code, as easy access to contract version numbers can be critical and time-sensitive during exploits. Additionally often times you might fetch code from an RPC and use tools like heimdall-rs or whatsabi to analyze it (there are many tools that operate on runtime code), so having more info in runtime code avoids introducing reliace on e.g. etherscan

From talking with @big-tech-sux, some arguments on the other side are that currently the version is at the end of the runtime code, but followed by immutables. So you still need the source, immutables layout, or a regex to parse the version. One idea is to move the version to the end after immutables, but it makes the implementation a bit more complicated and complexity compounds

Ultimately I just wanted to raise the other side of the decision here since I didn't see it much discussed in the PR, and IME often times you have runtime code from a node + local tools but don't want to introduce indexers to provide e.g. creation code.

Specification

I think just revert https://github.com/vyperlang/vyper/pull/3471? Basically, ensure the CBOR encoded vyper version (and other other metadata) is present in deployed code, not creation code.

Backwards Compatibility

My understanding is that 0.3.9 has the behavior being requested here, so there are no backwards compatibility issues unless 0.3.10 is released with the behavior from https://github.com/vyperlang/vyper/pull/3471

Dependencies

N/A

References

Twitter convos (nothing new here, all the same info as above, just linking the threads for completeness)

Copyright

Copyright and related rights waived via CC0

banteg commented 1 year ago

i can appreciate that vyper optimizes for runtime over off-chain tooling, so just sharing some perspective.

i don't think the version being at a random position before immutables vs at the end poses a big problem, we can just regex.

but it not being in runtime code is more serious because you need to spent significantly more effort to find it. without using an explorer like etherscan to just look it up, you need to perform several steps:

binary search a block where the contract code appeared
then trace all txs from that block
finally find the initcode matching on the created address.

this would require an archive node with tracing capabilities which is not practical or even available on some networks. this would probably turn people to centralized services or indexers which is not optimal for decentralization.

quickly finding all contracts of a certain version could be crucial in certain situations like one that has just unfolded.

charles-cooper commented 1 year ago

Another issue which might be relevant (although it can be considered orthogonally as an entirely separate issue) is whether it could be worth it to include length of the data + immutables sections in the runtime and/or initcode. This would make disassembling contracts easier, and for the issue noted above it would make it not matter whether the signature or immutable section comes first.

charles-cooper commented 1 year ago

by the way, (per @banteg) otterscan ots_getContractCreator (which erigon supports and reth is adding support for) seems to be a fairly easy way to get contract creation trace. so it seems to me it increases overhead for offchain indexers and analysis, but not by a whole lot.

charles-cooper commented 1 year ago

i was able to modify @pcaversaccio 's dune query (https://dune.com/pcaversaccio/vyper-deployment-statistics) to output all vyper contracts >= 0.3.4 based on creation trace, and the query took 60 seconds. so yes, this is a bit of a centralization risk, but also keeping track of all relevant chains is by its very nature somewhat of a centralized task, so i don't see it as too big of an issue.

vyperlang / vyper