test: add gas benchmarks

smol-ninja commented 2 months ago

Closes https://github.com/sablier-labs/v2-core/issues/876 and https://github.com/sablier-labs/v2-core/issues/872.

Changelog

Added

New tests under /benchmark dir. These tests compute gas usage of user facing functions and store them inside /benchmark/results dir in tabular format.
EstimateMaxCount.sol to generate values for MAX_SEGMENT_COUNT and MAX_TRANCHE_COUNT
benchmark script in package.json
New variables in Base.s.sol to store max values mapped by chain IDs: segmentCountMap and trancheCountMap
Bash script: update-script-counts.sh to update the count maps in Base.s.sol
isBenchmarkProfile function in Utils.sol
Add benchmark profile in foundry.toml

Removed

.gas-snapshot file
gas:report, gas:snapshot script from package.json

Changed

Changed visibility of MAX_SEGMENT_COUNT and MAX_TRANCHE_COUNT to public in Precompiles.sol

update-script-counts.sh output:

smol-ninja commented 2 months ago

@andreivladbrg feel free to review it now.

smol-ninja commented 1 month ago

Thanks for the review @PaulRBerg.

base fee issue for estimating max segments and max tranches

Thank you for pointing this out. See my comment below.

Since the tests are run against REVM and not against each chain's bespoke EVM, the results are probably off. For the most accurate results, the gas estimation script should run against a mainnet fork for each chain.

I agree with your point but when we fork a chain in foundry, it is not an actual fork i.e. it does not exactly replicate the EVM op-codes and the gas usage of each op-code. It only copies some of the chain parameters such as baseFee, blockNumber etc (see here).

Now, there are two things:

1. Estimating max count for segments and tranches

~Since it affects the max count, we should include the actual gas requirement for eth transfer on each chain.~ Its dynamic on Arbitrum as well. So what if we subtract 1M from the block gas limit to account for dynamic gas usage?

And since its hard to replicate the actual EVM with the current tooling, it not possible to estimate the true execution cost on all chains.

2. Benchmark gas usage

We don’t need to do this for each chain because:

Benchmarking against Ethereum is more important due to its high gas costs.
Optimizing for Ethereum generally leads to improvements on other chains as well in terms of execution cost. This may not necessarily be entirely foolproof because the OP_CODE gas cost may vary, but we are not aiming to customize optimization for each chain, which would also result in slightly different implementations for each of them.
Foundry forks do not copy the OP_CODE states of the chain.

andreivladbrg commented 1 month ago

Before giving a full review, I want to note that the gas difference between chains is not that big. I did a test on this repo:

Arbitrum: 150,933 gas for create linear Mainnet: 152,019 gas for create linear

Providing the gas costs for each would significantly increase the complexity (you can see what I did in the test/fork-multichains branch), and running it would take way much longer.

~~What would be safer is to simply subtract by 1 or 2 the count for security reasons~~

Later edit: just saw the contract EstimateMaxCount

smol-ninja commented 1 month ago

I want to note that the gas difference between chains is not that big

Thats because Foundry does not fork the actual EVM. It only copies the basic chain values. See my comment here.

smol-ninja commented 1 month ago

I've re-calculated max counts using 1M less units from the block gas limit, values are now lowered by 10-20.

PaulRBerg commented 1 month ago

To @andreivladbrg - great observations.

To @smol-ninja:

I agree with your point but when we fork a chain in foundry, it is not an actual fork

Ooh, I see. I didn't know that.

what if we subtract 1M from the block gas limit to account for dynamic gas usage?

Sounds good

Benchmarking against Ethereum is more important due to its high gas costs.

Not sure this is valid?

We are estimating gas limits, not gas prices. Some rollups use much higher gas limits than Ethereum, e.g. Arbitrum.

smol-ninja commented 1 month ago

Thanks for the review both of you.

@andreivladbrg

I would suggest a conservative default of 500.

Agree and sounds good. Made changes in the latest commit.

The gas costs for create functions can be influenced by the Broker argument, if it is set, there would be more operations made (including one more ERC20 transfer). So I would suggest to change this in the benchmarks

What would you suggest? To have two rows - one with broker and another without broker? Almost all the streams currently has no broker.

The gas costs for withdraw function needs to have more edge cases (especially for dynamic/tranched). If there more segments/tranches it will be more expensive, as it needs to get through all of them, depending what the current time is: when the current time is > endTime, the calculation function will return here, so it would be cheaper when the current time is at the last segment/tranche it is going to be more expensive See also what I did here. Last table in README.

Agree with the point that the actual gas usage depends on several factors including Broker transfers and the number of segments/tranches. I had a look at the link you shared but I couldn't catch your suggestion. Do you suggest including gas usage for more segments and tranches in the table?

Originally I did not include it because we cannot cover all the edge cases without increasing the complexity of the table. However, the current table includes gas usage for 2, 10 and 100 counts which imo gives a flavour of approximate gas consumption of create functions.

What would you say if we move test/benchamarks solidity files to ~/benchmarks and then add a new dir called gas-results inside this for the .md files?

May I know why would you like to move benchmark test files to a separate folder? I like the current approach of having test files in the test folder and markdowns in the benchmark folder.

I am proposing to make this script similar to update-precompiles one. Instead of displaying information to just adjust the values in BaseScript. For informations only, we can call solidity function estimateSegments. I am currently working on this change change, if we agree, I can create PR for this.

Sounds good and thank you for this.

@PaulRBerg

We are estimating gas limits, not gas prices. Some rollups use much higher gas limits than Ethereum, e.g. Arbitrum

My point was that users care about the gas limit on Ethereum because of the high gas price. On Arbitrum, the gas price for 1 unit is so low that it does not matter so much (it's just my hypothesis, maybe it matters, but I'd love to know arguments supporting that). Wdyt?

smol-ninja commented 1 month ago

@andreivladbrg I have made some minor improvements to your changes. Please let me know if you have any criticism against them. Otherwise, I'd request you approve this PR so that we can merge it.

andreivladbrg commented 1 month ago

@smol-ninja Before having a last look, could you please update the PR OP?

smol-ninja commented 1 month ago

@andreivladbrg updated.

PaulRBerg commented 1 month ago

My point was that users care about the gas limit on Ethereum because of the high gas price. On Arbitrum, the gas price for 1 unit is so low that it does not matter so much (it's just my hypothesis, maybe it matters, but I'd love to know arguments supporting that). Wdyt?

I might just not understand how the gas benchmarking works in Foundry, but my point is that if we estimate the segment and the tranche count using the vanilla EVM gas limits, they will be off by a wide margin when run on Arbitrum. For example, we might believe that the gas consumption is $X$, but in practice it's $20*X$ on Arbitrum. This shouldn't be a problem if the block gas limit is scaled by a similar factor on Arbitrum, but I did not check this.

smol-ninja commented 1 month ago

if we estimate the segment and the tranche count using the vanilla EVM gas limits, they will be off by a wide margin when run on Arbitrum

You're right and I agree. Unfortunately, I have not come across a good method to calculate it precisely using Foundry. I did some research today on how foundry forks calculate gas usage. The opcodes and the associated gas are determined by the value of evm_version that we set in the foundry profile, regardless of fork rpc and fork block number. We are using paris and the Foundry only supports a handful of them, all based on Ethereum mainnet.

Thus, when we fork a chain, it only copies basic block data and contract state from that forked block number, whereas the execution gas of the transaction is determined by the evm_version used.

if the block gas limit is scaled by a similar factor on Arbitrum

Block gas limit on Arbitrum (and most chains) is similar to ethereum i.e. 32M. But it's not enough to calculate the actual values for the max count. We also need to know the real value of execution gas which is hard without Foundry supporting opcodes and associated gas for L2.

PaulRBerg commented 1 month ago

Thanks for confirming @smol-ninja.

I have some ideas on how to address this problem, but I'll share them in a separate discussion.

It appears that most chains have kept the EVM execution costs (nearly) identical with Ethereum. It's just the minimum transaction gas that differs significantly, and we need to pay close attention to.

sablier-labs / v2-core