transmissions11 / solmate

Modern, opinionated, and gas optimized building blocks for smart contract development.
GNU Affero General Public License v3.0
3.93k stars 647 forks source link

Optimize merkle proof by having the arguments in calldata #242

Open hrkrshnn opened 2 years ago

hrkrshnn commented 2 years ago

https://github.com/Rari-Capital/solmate/blob/f878f20ecd6a23dd4105ee3eed7291b6311ce1fb/src/utils/MerkleProof.sol#L9

Using calldata is ideal here and would save a decent chunk of gas.

https://github.com/Rari-Capital/solmate/blob/f878f20ecd6a23dd4105ee3eed7291b6311ce1fb/src/utils/MerkleProof.sol#L20-L36

But this means that the assembly routine would be more complex. Consider the following instead:

  1. Use a regular for loop to load the value from calldata.
  2. Have a subroutine called cheapKeccak, which uses scratch space for keccak that would do the same computation.
Vectorized commented 2 years ago

Saves around 200 gas if there are 5 elements in proof.

We can make a copy of the function just for calldata. Should we? If @transmissions11 is ok with some code duplication, then let's go.

function merkleVerifyAssemblyCalldata(
    bytes32[] calldata proof,
    bytes32 root,
    bytes32 leaf
) internal pure returns (bool isValid) {
    assembly {
        let computedHash := leaf

        // Get the memory start location of the first element in the proof array.
        let data := proof.offset

        // Iterate over proof elements to compute root hash.
        for {
            // Left shift by 5 is equivalent to multiplying by 0x20.
            let end := add(data, shl(5, proof.length))
        } lt(data, end) {
            data := add(data, 0x20)
        } {
            let loadedData := calldataload(data)
            // Slot of `computedHash` in scratch space.
            // If the condition is true: 0x20, otherwise: 0x00.
            let scratch := shl(5, gt(computedHash, loadedData))

            // Store elements to hash contiguously in scratch space.
            // Scratch space is 64 bytes (0x00 - 0x3f) and both elements are 32 bytes.
            mstore(scratch, computedHash)
            mstore(xor(scratch, 0x20), loadedData)
            computedHash := keccak256(0x00, 0x40)
        }
        isValid := eq(computedHash, root)
    }
}
transmissions11 commented 2 years ago

tryna think of a case where u need the proof to be memory? im totally cool with just supporting calldata

transmissions11 commented 2 years ago

but this seems like a really good change, 200 gas is sick

hrkrshnn commented 2 years ago

@Vectorized Do you really have to use calldataload(..) in assembly? Do you have a benchmark for just proof[idx]?

Vectorized commented 2 years ago

@hrkrshnn I tried proof[idx], but the assembly cannot compile.

Here are some numbers:

[PASS] testValidProofSupplied() (gas: 2464)
[PASS] testValidProofSuppliedCalldata() (gas: 2262)
[PASS] testVerifyEmptyMerkleProofSuppliedLeafAndRootDifferent() (gas: 1733)
[PASS] testVerifyEmptyMerkleProofSuppliedLeafAndRootDifferentCalldata() (gas: 1565)
[PASS] testVerifyEmptyMerkleProofSuppliedLeafAndRootSame() (gas: 1727)
[PASS] testVerifyEmptyMerkleProofSuppliedLeafAndRootSameCalldata() (gas: 1537)
[PASS] testVerifyInvalidProofSupplied() (gas: 2460)
[PASS] testVerifyInvalidProofSuppliedCalldata() (gas: 2304)

For a fair comparison between the two functions, all tests use calls, which explains why the numbers for the old tests have increased.

Vectorized commented 2 years ago

Looks like there may be a very niche use case for memory https://github.com/fx-portal/contracts/blob/baed24d22178201bca33140c303e0925661ec0ac/contracts/tunnel/FxBaseRootTunnel.sol

But for their case, I think it's best if they have a custom implementation, which they did.