vyperlang / vyper

Pythonic Smart Contract Language for the EVM
https://vyperlang.org

VIP: Make dynamic calldata arrays lazy #4223

Open Philogy opened 2 months ago

Philogy commented 2 months ago

Simple Summary

Avoid redundant copies and memory allocations when interacting with dynamic arrays in calldata (Bytes[N], String[N] or DynArray[T, N]) by not copying them into memory by default.

Motivation

This matters for gas efficiency: copying calldata arrays into memory by default can significantly increase gas cost, making Vyper infeasible for use cases that must process variable-length arrays while accommodating potentially large lengths.

At Sorella, for example, we're building a Uniswap hook that aims to interact with pools, initiating swaps as well as settling large batches of user intents. The wasted cost of always allocating the max size, together with the inability to lazily process elements one by one, is a huge deterrent when considering Vyper as a smart contract language for our use case.
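To put rough numbers on this, here is a back-of-the-envelope sketch in Python. It uses the standard EVM memory expansion formula (3 gas per word plus words² / 512) and the CALLDATACOPY cost (3 gas base plus 3 per word). The 64-byte element size, the helper names, and the assumption that memory starts out empty are illustrative choices, not measurements of actual Vyper output.

```python
# Rough gas model for copying a calldata array into memory (illustrative only).
# Assumptions: memory is empty beforehand, elements are 64 bytes each
# (e.g. a struct of two uint256 fields), and nothing else is counted.

WORD = 32


def words(num_bytes: int) -> int:
    """Number of 32-byte words needed to hold num_bytes (rounded up)."""
    return (num_bytes + WORD - 1) // WORD


def memory_cost(num_words: int) -> int:
    """Total gas charged for memory expanded to num_words words."""
    return 3 * num_words + num_words * num_words // 512


def eager_copy_gas(num_elements: int, element_size: int = 64) -> int:
    """Gas to CALLDATACOPY the length word plus every element into fresh memory."""
    n = words(WORD + num_elements * element_size)
    return 3 + 3 * n + memory_cost(n)  # base + per-word copy + expansion


if __name__ == "__main__":
    for count in (100, 10_000):
        print(count, eager_copy_gas(count))
```

Under these assumptions the quadratic expansion term dominates for large arrays: about 1.3k gas for 100 elements versus roughly 900k gas for 10,000. If space for the full bound is reserved and memory above it is touched, a cost on the order of `memory_cost(words(32 + 10_000 * 64))` would be paid regardless of how many elements were actually passed.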

Specification

This may not require any semantic or syntactic changes at all, as the difference between calldata and memory could be abstracted away from the user, with the entire object only ever copied into memory if it's passed as an argument to an internal function.

Alternatively, for more flexibility (such as use in internal functions), storage location annotations could be added, e.g.:

struct Order:
    amount: uint256
    price: uint256

def process_orders(orders: Calldata[DynArray[Order, 10_000]]):
    ...

Semantically, dynamic types with location Calldata[T] would behave identically from the smart contract dev's perspective. At the codegen level, however, the Calldata[T] type would tell the compiler to generate different code for member access, length access, and slicing, loading only the sections that are directly accessed at any time rather than loading the whole array by default.
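As a rough illustration of what "only loading sections that are directly accessed" could mean, here is a toy Python model of the ABI layout for a single dynamic array of Order arguments. `encode_orders`, `LazyOrders`, and `words_loaded` are hypothetical names invented for this sketch. It mimics the calldata layout only and says nothing about how the Vyper compiler would actually implement this.

```python
# Toy model of lazy access to an ABI-encoded dynamic array of
# Order(amount: uint256, price: uint256) held in calldata.
# All names here are hypothetical; the 4-byte selector is omitted.

WORD = 32
ORDER_SIZE = 2 * WORD  # two statically sized uint256 fields


def encode_orders(orders: list[tuple[int, int]]) -> bytes:
    """ABI-encode a single dynamic-array argument: offset, length, elements."""
    head = WORD.to_bytes(WORD, "big")  # offset from args start to the length word
    length = len(orders).to_bytes(WORD, "big")
    body = b"".join(
        amount.to_bytes(WORD, "big") + price.to_bytes(WORD, "big")
        for amount, price in orders
    )
    return head + length + body


class LazyOrders:
    """Reads the length and individual elements straight from calldata."""

    def __init__(self, calldata: bytes, arg_pos: int = 0, bound: int = 10_000):
        self.calldata = calldata
        self.words_loaded = 0  # counts simulated CALLDATALOADs
        self.base = self._load(arg_pos)  # position of the length word
        self.length = self._load(self.base)
        assert self.length <= bound, "array exceeds declared bound"

    def _load(self, pos: int) -> int:
        """Stand-in for CALLDATALOAD: read one 32-byte word."""
        self.words_loaded += 1
        return int.from_bytes(self.calldata[pos:pos + WORD], "big")

    def __len__(self) -> int:
        return self.length

    def __getitem__(self, i: int) -> tuple[int, int]:
        if not 0 <= i < self.length:
            raise IndexError(i)
        start = self.base + WORD + i * ORDER_SIZE
        return self._load(start), self._load(start + WORD)


if __name__ == "__main__":
    calldata = encode_orders([(i, 2 * i) for i in range(1_000)])
    orders = LazyOrders(calldata)
    print(len(orders), orders[7], orders.words_loaded)  # prints: 1000 (7, 14) 4
```

In this model, reading the length and one element touches four words no matter how long the array is, whereas an eager copy would touch every word up front.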

This could also serve as a predecessor to a larger "data location" system in Vyper, similar to but more expansive than Solidity's, allowing the compiler to understand and optimize interactions with different data locations (calldata, memory, returndata, self code, external code, transient storage, persistent storage).

Backwards Compatibility

This should not cause any backwards compatibility issues. The Calldata[T] annotation would be opt-in; if it is left out, the compiler would interpret the argument as an in-memory array, copying the calldata contents when the function is the entry point of an external call.

Dependencies

(Unknown)

References

Somewhat inspired by, and related to, recent criticisms of Vyper by others explaining how it doesn't handle dynamic data very well: https://x.com/jtriley_eth/status/1830325738144448804

Copyright

Copyright and related rights waived via CC0

xrchz commented 2 months ago

This is a huge issue indeed. The data location system is sorely needed for Vyper - currently the excessive copying when calling internal functions, for example, is extremely painful.