andrewgordstewart opened 4 years ago
My take on this is that 17f06b8 might have been an anomaly.
There appears to be significant variation in the run time of the stress test on CircleCI. When we ask CircleCI to run a benchmark, we don't control the underlying resources, and it's not clear what guarantees CircleCI makes about them. Even though we pay for 8 cores, CircleCI might not always let us fully utilize all 8. One explanation for that would be the use of spot instances, which lower prices at the cost of variable execution times.
Some circumstantial evidence to back up this theory: there is a huge variance in e.g. the "Spin up environment" step of the test job.
When I ran the stress test locally (while plugged in) before and after reverting 17f06b87e332216d71a9ca2e903f3f07f658b484, I saw a consistent time of ~30s to run the stress test.
When I ran the stress test on @lalexgap's AWS instance (when comparing different numbers of worker threads), I observed repeatable times reported by the stress test.
One thing I have been thinking about since the web3torrent flickering tests is integrating some statistical analysis into our test suite. Two examples are:
(1) is much better resolved by ensuring that we have a consistent set of resources available to run the benchmark. In that case, a benchmark that runs for ~1 minute can generally be trusted to be accurate enough that a hard cut-off can be used to detect performance regressions.
(2) is probably too narrow a problem to be worth spending time on, but I wonder if anyone uses this strategy.
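A hard cut-off like the one described in (1) could be as simple as comparing the mean of a few runs against a stored baseline with a noise tolerance. A minimal sketch, assuming dedicated hardware; the names `BASELINE_MS` and `TOLERANCE` are illustrative, not from the repo:

```typescript
// Sketch of a benchmark gate. Assumes runs happen on consistent hardware,
// so sample noise stays well under the tolerance band.
const BASELINE_MS = 1231; // last accepted stress-test time (illustrative)
const TOLERANCE = 0.15;   // allow 15% noise before flagging a regression

function isRegression(samples: number[]): boolean {
  const mean = samples.reduce((sum, s) => sum + s, 0) / samples.length;
  return mean > BASELINE_MS * (1 + TOLERANCE);
}

// e.g. three runs of the stress test
console.log(isRegression([1210, 1250, 1240])); // within tolerance → false
console.log(isRegression([2441, 2400, 2500])); // ~2x baseline → true
```

On shared CI runners the same check would flicker, which is exactly the problem with the current setup.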
Regardless, what you say seems very plausible to me. Therefore, for changes such as these, we should run the stress test (et al.) on more predictable hardware (just a developer's laptop, say).
It is probably worth requesting a budget for something like an appropriate, dedicated, on-demand EC2 instance for running benchmarking jobs.
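If we went that route, CircleCI's self-hosted runner support would let the benchmark job target that dedicated instance via its `resource_class`. A hedged sketch of what the `.circleci/config.yml` fragment might look like; the class name `our-org/benchmark-runner` and the `yarn stress-test` command are hypothetical:

```yaml
# Sketch: route the benchmark job to a dedicated self-hosted runner
# (the resource_class namespace/name below is illustrative)
jobs:
  benchmark:
    machine: true
    resource_class: our-org/benchmark-runner
    steps:
      - checkout
      - run: yarn stress-test
```

Pinning the job this way would remove the spot-instance variability discussed above, at the cost of maintaining the instance ourselves.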
Performance of post-review commits is a bit puzzling:
17f06b8 2441ms
bbecf7d 1231ms
(The numbers for 25 makePayment calls are very close to 25x the single makePayment call in both cases.)

Originally posted by @geoknee in https://github.com/statechannels/statechannels/pull/2590#issuecomment-697271681