status-im / nimbus-eth1

Nimbus: an Ethereum Execution Client for Resource-Restricted Devices
https://status-im.github.io/nimbus-eth1/
Apache License 2.0

Hive: Nimbus startup and shutdown delays dominate testing time with `ethereum/consensus` #591

Open jlokier opened 3 years ago

jlokier commented 3 years ago

... and the time to run ethereum/consensus Hive tests is important to us. It's currently too slow.

Hive is the new Ethereum test suite, which may either displace or complement retesteth. Hive consists of a few different "simulators" and test suites, and one of those suites is by far the largest and slowest, ethereum/consensus, with about 28,000 tests.

When run on Geth, it proceeds on both my fastest machines at about 1 test per second, taking roughly 6-7 hours (but we can parallelise a bit to speed it up). This is the command line:

./hive --sim ethereum/consensus --client go-ethereum

Geth itself accounts for only a small fraction of that ~1 second. Most of the time goes to test setup and Dockerish things. (It's worth optimising the Docker setup to improve that, e.g. Docker is slower inside an LXD container or on an old kernel. We can add some multi-core parallelism with --sim.parallelism for a decent speedup, although that uses more I/O and space too.)

When run on nimbus-eth1, it is much slower at 7-8 seconds per test. This is the command line(*):

./hive --sim ethereum/consensus --client nimbus

That's a lot slower than Geth, ballpark 15 times slower, and it means Nimbus time is the dominant factor. That's too slow. It won't even complete the test suite in 24 hours unless you have enough cores to throw at it.
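As a rough check on the 24-hour claim, here is a back-of-the-envelope sketch using only the figures quoted above (about 28,000 tests, 7-8 seconds per test). The helper names `suite_hours` and `min_workers` are made up for illustration, and the worker estimate assumes perfectly linear scaling, which real runs with --sim.parallelism are unlikely to reach.

```python
import math

SECONDS_PER_HOUR = 3600

def suite_hours(tests, seconds_per_test, workers=1):
    """Estimated wall-clock hours, assuming ideal linear scaling across workers."""
    return tests * seconds_per_test / workers / SECONDS_PER_HOUR

def min_workers(tests, seconds_per_test, budget_hours):
    """Smallest worker count that fits the time budget under the same assumption."""
    return math.ceil(tests * seconds_per_test / (budget_hours * SECONDS_PER_HOUR))

TESTS = 28_000  # approximate suite size quoted above

for rate in (7, 8):  # seconds per test observed with nimbus-eth1
    print(f"{rate} s/test: {suite_hours(TESTS, rate):.1f} h serial, "
          f"{min_workers(TESTS, rate, 24)} workers to finish within 24 h")
```

Under these assumptions a serial run takes well over two days, and at least three fully parallel workers would be needed to get under 24 hours.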

Using --docker.output we can see most of the time is spent in slow startup or shutdown of nimbus-eth1, probably waiting for networking functions, but it could be unlogged database actions. It isn't slow when importing a genesis block. Whatever makes #585 slow might have something to do with this, but startup looks slow too.

(*) After setting up Hive for nimbus-eth1 according to nimbus-eth1/hive_integration/readme.md.

kdeme commented 3 years ago

Not sure this is part of the slow startup, but assuming the general nimbus binary is being used for these Hive tests, it would be worth verifying whether it is started with the default nat setting, which is Any. That setting first tries sending UPnP/NAT-PMP requests, which causes delays (whether they time out or not).

To avoid delays due to that, start nimbus with, for example, the option --nat:extip:127.0.0.1 when testing on a local machine.

jangko commented 3 years ago

Progress in this area: 7 hours 34 minutes in my VirtualBox. fail: 892, ok: 27,353, total: 28,245
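For comparison with the per-test figures earlier in the thread, here is a small sketch that converts this run into seconds per test and a pass rate. The function name `summarize` is hypothetical, and the run was on a different machine from the original measurements, so the comparison is only indicative.

```python
def summarize(hours, minutes, ok, fail):
    """Derive total tests, average seconds per test, and pass rate for one run."""
    total = ok + fail
    elapsed_seconds = hours * 3600 + minutes * 60
    return {
        "total": total,
        "seconds_per_test": elapsed_seconds / total,
        "pass_rate": ok / total,
    }

# Figures reported for this run: 7 h 34 min, 27,353 ok, 892 failed.
stats = summarize(hours=7, minutes=34, ok=27_353, fail=892)
print(f"{stats['total']} tests, "
      f"{stats['seconds_per_test']:.2f} s/test, "
      f"{stats['pass_rate']:.1%} passing")
```

That works out to a little under one second per test on average, close to the rate originally reported for Geth.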