Open jstolarek opened 9 months ago
Implement this in our project. This requires answering at least the following questions:
Should this be a benchmark suite or a test suite? Can we re-use our existing Plutip tests to avoid repetition?
IIUC, we basically want to detect any improvements or regressions in our Plutus scripts wrt script size and execution costs. In that case, as I've done in the past, I propose to use golden tests for the script size, the execution costs (ExCPU and ExMem) and the generated PIR (in case we removed something that shouldn't have been removed).
Why golden tests? Simple and effective way to verify by a PR reviewer if some changes resulted in unexpected changes in the Plutus scripts.
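As a rough illustration of the golden-test idea, here is a minimal sketch, not tied to any particular test framework. The measurement keys (`script_size`, `ex_cpu`, `ex_mem`), the JSON file layout, and both function names are assumptions made up for this example.

```python
import json
from pathlib import Path


def diff_measurements(expected: dict, actual: dict) -> list:
    """Return one human-readable line per measurement that changed."""
    diffs = []
    for key in sorted(set(expected) | set(actual)):
        if expected.get(key) != actual.get(key):
            diffs.append(f"{key}: expected {expected.get(key)}, got {actual.get(key)}")
    return diffs


def check_golden(path, actual: dict, accept: bool = False) -> list:
    """Compare measurements with a golden JSON file.

    The file is (re)written when it is missing or when accept=True,
    so the change shows up in the PR diff for the reviewer.
    """
    golden = Path(path)
    if accept or not golden.exists():
        golden.write_text(json.dumps(actual, indent=2, sort_keys=True) + "\n")
        return []
    return diff_measurements(json.loads(golden.read_text()), actual)


# Hypothetical measurements for one script (keys are assumptions).
old = {"script_size": 4000, "ex_cpu": 400_000_000, "ex_mem": 1_000_000}
new = {"script_size": 3600, "ex_cpu": 480_000_000, "ex_mem": 1_000_000}
for line in diff_measurements(old, new):
    print(line)
```

A test would fail whenever `check_golden` returns a non-empty list, and re-running with `accept=True` would update the golden file once the change is confirmed to be intended.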
How to achieve it?
The proposed solution of using fee calculation is one way. However, that basically combines multiple parameters such as script size and execution costs. I would use the other functions which provide the information described above.
To actually get this granular information, I would look into plutus-ledger-api.
Pros: lowest dependency footprint, and fastest testing times.
Cons: you don't get realistic numbers, and you have to construct the ScriptContext yourself.
However, if you actually need real values, then you can have some e2e test which gives you real values (either using Plutip, or hardcoding the protocol parameters and cost model). However, a faster feedback loop is necessary I believe.
Some exploration is necessary of course.
Maybe makes sense to split this into a separate story?
Why golden tests? Simple and effective way to verify by a PR reviewer if some changes resulted in unexpected changes in the Plutus scripts.
I'm not entirely convinced that we really need golden tests for this. At the moment we have golden tests to test ToData/FromData instances and CBOR serialization, and this feels justified because the expected output in these files is quite big. However, for script size tests we currently just put the expected numbers in the Main file when defining the tests. This feels entirely sufficient, and it is very easy to fix multiple tests because all the expected data is in the same file. Golden tests would complicate this, and to me they feel like overkill here.
I would use the other functions which provide the information described above.
No. The precise goal of this ticket is to finally start measuring the transaction fees - this is the ultimate metric that we want to optimize because this is what users of the system have to pay. Measuring proxy metrics such as script size, CPU usage or memory footprint forces us to guess when making design decisions. Example: I made a change that reduces script size by 10% but increases CPU usage by 20% - does that result in smaller fees? I don't mind having tests that also measure memory and execution units, since that could potentially be helpful in troubleshooting, but we need to have the fees.
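To make the 10% / 20% example concrete, here is a rough sketch of the arithmetic. It assumes the usual shape of the fee: a linear function of transaction size plus the rounded-up price of the execution units. It also assumes the script travels inside the transaction, and it ignores anything else that may affect the fee. `FeeParams`, `tx_fee`, the parameter values and the transaction numbers are all made up for illustration and are not taken from our code or from any particular network.

```python
from dataclasses import dataclass
from fractions import Fraction
from math import ceil


@dataclass(frozen=True)
class FeeParams:
    """Hypothetical subset of protocol parameters that drive the fee."""
    fee_per_byte: int     # lovelace per byte of the serialized transaction
    fee_fixed: int        # constant lovelace charged per transaction
    price_mem: Fraction   # lovelace per memory unit
    price_step: Fraction  # lovelace per CPU step


def tx_fee(params: FeeParams, tx_size: int, ex_mem: int, ex_steps: int) -> int:
    """Linear size fee plus the rounded-up price of the execution units."""
    script_fee = ceil(params.price_mem * ex_mem + params.price_step * ex_steps)
    return params.fee_per_byte * tx_size + params.fee_fixed + script_fee


# Illustrative values only; real tests should load the actual parameters.
EXAMPLE_PARAMS = FeeParams(44, 155381, Fraction(577, 10_000), Fraction(721, 10_000_000))

# Made-up baseline: a 5000-byte transaction carrying a 4000-byte script.
before = tx_fee(EXAMPLE_PARAMS, tx_size=5000, ex_mem=1_000_000, ex_steps=400_000_000)
# Script 10% smaller (400 bytes saved), CPU steps 20% higher.
after = tx_fee(EXAMPLE_PARAMS, tx_size=4600, ex_mem=1_000_000, ex_steps=480_000_000)
print(before, after)
```

With these particular numbers the fee drops from 461921 to 450089 lovelace, but a script with a heavier CPU share would tip the other way, which is exactly why the proxy metrics alone cannot answer the question.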
you can have some e2e test which gives you real values (either using Plutip, or hardcoding the protocol parameters and cost model)
Any solution that requires access to a locally-running (or remote) Cardano infrastructure is not acceptable. Plutip is OK, but I would consider it the last resort. The requirement is that we should be able to freely change the protocol parameters, so that if mainnet and testnet have different fees, we can define tests for both by providing different protocol params. That being said, the hope here is that we can calculate the fees using library functions. In Haskell that would be quite simple, I think, but in PureScript we seem to be struggling.
We currently measure script sizes (cf. script-size testsuite in onchain code) and treat them as a proxy measure for estimating transaction costs. What we would like to have is something like:
Where ProtocolParameters define the cost model for the network, and Transaction is a transaction that contains an example scenario we want to test (current plutip tests can be a good starting point here). In other words, we want to measure the exact transaction fees. Note that these are influenced not only by transaction size, but also by memory and CPU usage.
The goal of this task is to:
Protocol params can be seen here, but for the purposes of testing we should probably have a way to import pre-defined protocol parameters for various networks.
IOG Jira: https://input-output.atlassian.net/browse/ETCM-6448