spacemeshos / go-spacemesh

Go Implementation of the Spacemesh protocol full node. 💾⏰💪
https://spacemesh.io
MIT License
770 stars 215 forks source link

Improve systests #6494

Open jellonek opened 6 days ago

jellonek commented 6 days ago

Description

At the moment systests contain a lot of spagetti code which seems to be time to time extended with another piece glued into it. In result:

fasmat commented 6 days ago

Here are my 2 cents to the current problems with system tests:

There should be a single test that asserts all the functional requirements we have for go-spacemesh:

Non-functional tests probably require their own test each:

I am unsure if this needs its own test or can be part of TestSmeshing:

The basic idea is that the big test spawns a network at the start and then runs multiple tests (partially at the same time) on the same network with the same nodes instead of having to tear down and setup a new network multiple times.

Additionally we should consider to cover all supported modes of smeshing during the one big test: some nodes should use supervised smeshing, some should use remote smeshing, some multi-smeshing (ideally with at least one of those marrying their identities).

ivan4th commented 6 days ago

This probably should be taken into account: https://metoro.io/blog/go-production-performance-gotcha-gomaxprocs TL;DR: GOMAXPROCS may default to an unreasonable value in case of a containerized workload running on a machine together with a lot of other workloads. Not exactly sure if it is relevant, but probably worth checking

poszu commented 5 days ago

@fasmat It makes sense and it would speed up the tests significantly :+1:, but we need to be careful and avoid creating "subtests" running in parallel on a single network that depend on each other as it would be a nightmare to debug if things go south. I think it's possible to have a test structure like this:

func TestNetwork(t *testing.T) {
  // setup a cluster
  t.Run("sending transactions", ....) // send transactions and observe accounts changes
  t.Run("poet failures", ....) // kills a poet and oberves that ATXs are still produced
  .... more

The problem is, that once things stop working because the poet was killed, the other test(s) would most likely start to fail too and telling why (what's the root of the problem) might be difficult.

jellonek commented 4 days ago

@ivan4th we have that hardcoded to 4 in https://github.com/spacemeshos/go-spacemesh/blob/cc65f271f3e21b519532f2924ab227ccf352c2de/systest/cluster/nodes.go#L914

fasmat commented 4 days ago

Also regarding GOMAXPROCS: I changed it to the suggestion in the blog @ivan4th linked, but didn't have a feeling it changed anything in terms of stability - multiple runs still were throttled, so I changed it back to the hard coded 4.