paritytech / zombienet

A cli tool to easily spawn ephemeral Polkadot/Substrate networks and perform tests against them.
https://paritytech.github.io/zombienet/
GNU General Public License v3.0

High scale testing MVP #79

Open sandreim opened 2 years ago

sandreim commented 2 years ago

We're looking at writing an integration test suite that focuses on performance testing, more specifically on the list of key indicators covered in https://github.com/paritytech/polkadot-sdk/issues/874. The current design of Zombienet's configuration and DSL makes it easy to write tests for single-digit-sized networks, and it provides very explicit primitives for testing metrics and logs (`alice: parachain 100 block height is at least 10 within 200 seconds`). I'll focus on what I think we need to implement to make writing tests easy for scenarios at least an order of magnitude larger.
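For reference, a current-style test file repeats one assertion line per node, which is what becomes unwieldy at scale. The node names and exact keyword forms below are illustrative, not an authoritative copy of the DSL grammar:

```text
# per-node assertions: one line per validator
alice: is up
alice: parachain 100 block height is at least 10 within 200 seconds
bob: parachain 100 block height is at least 10 within 200 seconds
```

With 100 validators, each property under test would need 100 such lines.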

I'm breaking everything down into two parts: test configuration and the DSL.

Test configuration

In the context of higher scale, the goal is to enable the configuration to be defined in bulk, so that we don't need to configure individual validators (binary and args) one by one, which is cumbersome for, say, 100 validators.
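As a sketch, a bulk declaration in the network config (TOML) could look like the following. The `node_groups` table and its `name`/`count`/`args` fields are assumptions about what the eventual syntax might be, not something Zombienet supports today:

```toml
[relaychain]
default_command = "polkadot"
chain = "rococo-local"

# hypothetical: one entry expands into `count` validators
# sharing a single binary/args configuration
[[relaychain.node_groups]]
name = "group-a"
count = 100
args = ["-lparachain=debug"]
```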

Where we are at

Test scenario (DSL)

The goal is to enable writing test assertions that look at groups of validators rather than only one.
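A group-level assertion could then address a whole group by name instead of a single node. Again a sketch, with `group-a` and the group-addressing syntax being assumptions:

```text
# hypothetical: assertion evaluated against every validator in the group
group-a: parachain 100 block height is at least 10 within 200 seconds
```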

Where we are at

Issues and other improvements

I've stumbled upon some issues or missing functionality:

CI integration

It doesn't seem like a good idea to run these tests as part of the per-PR pipeline, because of their long duration and the high cost of scaling the Kubernetes cluster. My proposal is to run a subset of small-scale variants of the tests in the PR pipeline, and run the high-scale tests at release checkpoints or on an as-needed basis.

That being said, this looks like a lot of work, and at the same time we want to run these high-scale tests sooner rather than later. My proposal is to build this incrementally, starting with what I consider to be the MVP:

Link to a branch with a sample test and some comments to add more context: TBD.

drahnr commented 2 years ago

I think we need to split this up into a PR pipeline and a release pipeline. All open items should point to issues that add the additional context required for implementation. Percentage-based log assertions are a nice-to-have, since these tests should be rather deterministic, so that's a bit of a longer shot, but it goes hand in hand with scaling up the number of validators and grouping.

pepoviola commented 2 years ago

Hi @sandreim, thanks for the feedback. I think there are several things to work on in this issue, but the priority is to add support for scaling the network easily, right? The validation groups sound great. Let me start working on the syntax to support this, and we can use that as a starting point.

Thanks!

sandreim commented 2 years ago

> Hi @sandreim, thanks for the feedback. I think there are several things to work on in this issue, but the priority is to add support for scaling the network easily, right? The validation groups sound great. Let me start working on the syntax to support this, and we can use that as a starting point.
>
> Thanks!

Yes, validator groups and being able to spin up many validators (not just the limited set we have now) and parachains. Other than that, launching them in parallel would also help us iterate faster in development, and would be a good place to start.