vyperlang / vyper

Pythonic Smart Contract Language for the EVM
https://vyperlang.org

ABI v3 #2542

Open charles-cooper opened 2 years ago

charles-cooper commented 2 years ago

Stub issue: some ideas for reducing calldata size, since EIP 4488 introduces a calldata limit. Also, with rollups, transactions are getting more calldata-heavy anyway.

References

EIP 4488

Copyright

Copyright and related rights waived via CC0

charles-cooper commented 2 years ago

Related to reducing calldata cost: https://github.com/ethereum-optimism/optimistic-specs/issues/10

gnidan commented 2 years ago

This ought to be an EIP itself. So much would have to change to support this, and it seems very risky for Vyper to adopt this without larger community buy-in.

charles-cooper commented 2 years ago

> This ought to be an EIP itself. So much would have to change to support this, and it seems very risky for Vyper to adopt this without larger community buy-in.

That's a great point. I mean this is definitely in the idea phase and we would not want to undertake this without larger community buy-in. Would you be interested in helping us to draft and shepherd through the EIP process?

gnidan commented 2 years ago

> Would you be interested in helping us to draft and shepherd through the EIP process?

Sure, absolutely! Also happy to reach out to Solidity folks. Would be nice if we could coordinate so languages don't end up having competing interface standards.

fubuloubu commented 2 years ago

> > Would you be interested in helping us to draft and shepherd through the EIP process?
>
> Also happy to reach out to Solidity folks.

Did you happen to connect with them on this?

ekpyron commented 2 years ago

So what are your current plans around this (since we had at least a very brief discussion about this at devconnect)? In general, the ABI should really be specified as a proper cross-language standard; the worst thing that could happen would be to fragment over this. So we should probably think about how best to organize this. In case you want to go ahead with this soon, we should probably try to schedule a call about it in the near future.

I'm also tagging some people from Fe: @g-r-a-n-t, @cburgdorf. But we should also probably think about whom else to reach out to.

charles-cooper commented 2 years ago

I'm not in a particular rush! But @gnidan and I were talking about putting together an EIP for this.

esaulpaugh commented 2 years ago

Does this have an advantage over using RLP? ABI arguments and return values can be represented as RLP right now.

Tuples and Arrays are represented by RLPList and everything else as RLPString: https://github.com/esaulpaugh/headlong-cli#decode

For example:

java -jar headlong-cli-1.1-SNAPSHOT.jar -me "(function[2][][],bytes24,string[1][1],address[],uint72,(uint8),(int16)[2][][1],(int32)[],uint40,(int48)[],(uint),bool,string,bool[2],int24[],uint40[1])" "f4f3f298191c766e29a65787b7155dd05f41292438467db93420cade98191c766e29a65787b7155dd05f41292438467db93420cade98191c766e29a65787b7155dd05f41292438467db93420cadec2c17ad594ff00ee01dd02cc03cafebabe990688077708660989fdfffffffffffffe04c107c8c7c6c109c382fff5c8c111c584ffffffed85fca527923bcac17ec786ffffffffff82c10a01866661726f7574c20101c6031483fffffac584fffffffe"

See also https://github.com/esaulpaugh/headlong/blob/master/src/main/java/com/esaulpaugh/headlong/util/SuperSerial.java

Handling negative numbers (something traditional RLP can't address because it doesn't have a schema like ABI does) has proved tricky, but I believe I have it all working correctly and fairly well tested. I know RLP is no longer the most fashionable thing, but it achieves its original design goal of space efficiency respectably, and if it ain't broke...

In RLP, the original example (string, string) ("abcd", "efg") would be: 0x846162636483656667, nine bytes.
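As a rough check of the nine-byte figure, here is a minimal Python sketch of RLP string encoding for short strings. The helper names `rlp_encode_string` and `encode_args` are hypothetical, not part of headlong or any library, and the sketch assumes the arguments are simply concatenated without an enclosing list prefix, as in the hex string above.

```python
def rlp_encode_string(data: bytes) -> bytes:
    """RLP-encode a byte string shorter than 56 bytes (hypothetical helper)."""
    # A single byte below 0x80 is its own encoding.
    if len(data) == 1 and data[0] < 0x80:
        return data
    if len(data) > 55:
        # Longer strings need a length-of-length prefix, omitted in this sketch.
        raise ValueError("strings longer than 55 bytes are not handled here")
    # Short string: one prefix byte (0x80 + length) followed by the payload.
    return bytes([0x80 + len(data)]) + data


def encode_args(*args: str) -> bytes:
    """Concatenate the RLP encodings of each string argument (no list prefix)."""
    return b"".join(rlp_encode_string(a.encode("utf-8")) for a in args)


encoded = encode_args("abcd", "efg")
print(encoded.hex())   # 846162636483656667
print(len(encoded))    # 9
```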

esaulpaugh commented 1 year ago

@charles-cooper I have a proof-of-concept https://github.com/esaulpaugh/abiv3

We should collaborate.

esaulpaugh commented 1 year ago

The admittedly limited feedback I've received from experienced Solidity contract developers indicates that gas usage is their first, last, and only concern with respect to calldata, and that manual bit level hacking will always be cheaper. There also appears to be a strong bias towards compatibility with existing contracts as opposed to compatibility among future contracts.

esaulpaugh commented 1 year ago

I have no idea what I'm doing python-wise but an attempt was made: https://github.com/esaulpaugh/abiv3/tree/master/python

Work in progress.

esaulpaugh commented 1 year ago

@gnidan @charles-cooper

I've got Java and Python prototypes set up to use an unsigned integer as the function selector instead of a hash, to save space. Integer arrays can also encode elements fixed-width or variable-width as desired, and both are equally valid.

Fixed-width enables constant-time random access to array elements which is useful for very large arrays. And it can be more space-efficient in some cases too because values are padded only to the width of the widest element and not to the width of the datatype. And they don't require an RLP prefix per-element.
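The trade-off can be sketched in Python. The layout below is an assumption made for illustration only (a single width byte followed by elements padded to that width) and is not the actual wire format of the abiv3 prototypes; all function names are hypothetical.

```python
def min_bytes(value: int) -> bytes:
    """Minimal big-endian bytes of a non-negative integer (empty for 0)."""
    return value.to_bytes((value.bit_length() + 7) // 8, "big")


def encode_variable(values) -> bytes:
    """Variable-width: each element carries its own RLP-style prefix."""
    out = bytearray()
    for v in values:
        b = min_bytes(v)
        if len(b) == 1 and b[0] < 0x80:
            out += b                      # small values encode as themselves
        else:
            out.append(0x80 + len(b))     # per-element length prefix
            out += b
    return bytes(out)


def encode_fixed(values) -> bytes:
    """Fixed-width (assumed layout): one width byte, then padded elements."""
    width = max((len(min_bytes(v)) for v in values), default=0)
    out = bytearray([width])
    for v in values:
        out += v.to_bytes(width, "big")   # pad only to the widest element
    return bytes(out)


def fixed_get(encoded: bytes, index: int) -> int:
    """Constant-time random access into the fixed-width encoding."""
    width = encoded[0]
    start = 1 + index * width
    return int.from_bytes(encoded[start:start + width], "big")


uniform = [1000, 2000, 3000]
print(len(encode_variable(uniform)), len(encode_fixed(uniform)))  # 9 7
skewed = [1, 2, 70000]
print(len(encode_variable(skewed)), len(encode_fixed(skewed)))    # 6 10
print(fixed_get(encode_fixed(uniform), 2))                        # 3000
```

With similarly sized elements the fixed-width form is smaller because it drops the per-element prefix; with one outlier the variable-width form wins.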

I'd be interested to know what y'all think and whether anyone wants to jump on a call about an EIP. I'm @esaulpaugh on telegram.

esaulpaugh commented 1 year ago

I plan on submitting an ABIv3 draft EIP soon, so if anyone wants to help author it, let me know. I'm also working on some reference code in Yul to demonstrate decoding in the EVM.

@gnidan @charles-cooper

charles-cooper commented 1 year ago

one interesting thing i realized while chatting with @esaulpaugh is that "calldata" is comparatively cheaper for inter-contract calls. so it might be worth having a more compressed encoding for calldata from eoa-initiated txns and an unpacked encoding (which is easier to decode) for inter-contract calls.
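To put rough numbers on this, the Python sketch below compares the transaction calldata gas of the standard ABI encoding of (string, string) ("abcd", "efg") with the nine-byte compact form mentioned earlier in the thread. It assumes EIP-2028 pricing (16 gas per non-zero byte, 4 gas per zero byte), which applies to calldata of EOA-initiated transactions, and it ignores the function selector. The helper names are hypothetical.

```python
def word(n: int) -> bytes:
    """A 32-byte big-endian ABI word."""
    return n.to_bytes(32, "big")


def pad32(b: bytes) -> bytes:
    """Right-pad bytes with zeros to a multiple of 32."""
    return b + b"\x00" * (-len(b) % 32)


def abi_encode_two_strings(a: bytes, b: bytes) -> bytes:
    """Standard ABI encoding of a (string, string) tuple, selector omitted."""
    tail_a = word(len(a)) + pad32(a)
    tail_b = word(len(b)) + pad32(b)
    # Head: offsets of each dynamic argument, relative to the start of the tuple.
    head = word(64) + word(64 + len(tail_a))
    return head + tail_a + tail_b


def calldata_gas(data: bytes) -> int:
    """Intrinsic calldata gas under EIP-2028 (assumed pricing)."""
    zeros = data.count(0)
    return zeros * 4 + (len(data) - zeros) * 16


abi = abi_encode_two_strings(b"abcd", b"efg")
compact = bytes.fromhex("846162636483656667")
print(len(abi), calldata_gas(abi))          # 192 900
print(len(compact), calldata_gas(compact))  # 9 144
```

For an inter-contract call there is no per-byte intrinsic charge of this kind, which is why a simpler, unpacked layout may be the better trade-off there.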

charles-cooper commented 1 year ago

> Does this have an advantage over using RLP? ABI arguments and return values can be represented as RLP right now.

Using RLP is pretty much a non-starter here as it is very inefficient from an encoding/decoding perspective. It is also not simple, which is important when we consider implementation correctness. For instance, to decode an RLP int, the pseudocode looks something like:

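int_byte = shr(248, calldataload(ptr))  # first byte: the value itself or a length prefix
int = int_byte
len = 1
if int_byte >= 0x80:
    n = sub(int_byte, 0x80)  # payload length in bytes
    len = add(len, n)
    # keep only the top n bytes of the word that follows the prefix
    int = shr(mul(8, sub(32, n)), calldataload(add(1, ptr)))
ptr = add(ptr, len)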

This is already a couple dozen instructions (roughly 100 gas), compared to the proposed alternatives of small/packed ints (shift and mask after calldataload) or varints with a length byte (two shifts and masks after calldataload). In other words, efficiency and simplicity of the encoder/decoder need to be considered as well.
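The difference in decoder shape can be illustrated with a small Python model. This is only a sketch: the function names are hypothetical, the length-prefixed layout (one length byte followed by big-endian payload) is an assumed reading of "varints with a length byte", and Python byte slicing stands in for the EVM's calldataload and shifts.

```python
def decode_rlp_int(data: bytes, ptr: int):
    """Decode an RLP-encoded unsigned int; returns (value, new_ptr)."""
    first = data[ptr]
    if first < 0x80:
        # Branch 1: the byte is the value itself.
        return first, ptr + 1
    n = first - 0x80                      # Branch 2: prefix gives payload length
    if n > 32:
        raise ValueError("payload too long for a 256-bit integer")
    value = int.from_bytes(data[ptr + 1:ptr + 1 + n], "big")
    return value, ptr + 1 + n


def decode_length_prefixed_int(data: bytes, ptr: int):
    """Assumed varint layout: one length byte, then n big-endian bytes."""
    n = data[ptr]
    value = int.from_bytes(data[ptr + 1:ptr + 1 + n], "big")
    return value, ptr + 1 + n             # no branching on the first byte


def decode_packed_uint(data: bytes, ptr: int, width: int):
    """Packed int whose width is known from the type (e.g. 2 for uint16)."""
    return int.from_bytes(data[ptr:ptr + width], "big"), ptr + width


print(decode_rlp_int(bytes.fromhex("8203e8"), 0))              # (1000, 3)
print(decode_length_prefixed_int(bytes.fromhex("0203e8"), 0))  # (1000, 3)
print(decode_packed_uint(bytes.fromhex("03e8"), 0, 2))         # (1000, 2)
```

The RLP path needs a data-dependent branch on the first byte, while the other two are straight-line reads, which is the simplicity argument being made here.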