Snabb Lab update/expansion #921

Open lukego opened 8 years ago

lukego commented 8 years ago

Last week we reached a great milestone with the lab: we started running fully automatic benchmarking campaigns via the new Hydra CI that we are testing. See https://github.com/snabbco/snabb/issues/916#issuecomment-219067208 for details.

This is wonderful! It also creates a "good problem to have" because one CI test can now soak up all available hardware resources in the lab. On the one hand this is excellent because it puts all of the hardware to productive use. On the other hand it conflicts with interactive testing in two ways: the lock command (#773) tends to block for about a minute while a test campaign is running, and manual testing that does not respect locks can cause spurious results for CI.

I propose two steps to manage this situation:

  1. Divide servers between CI and interactive use. Specifically, reserve lugano-2 and lugano-4 for CI use and reserve the identical servers lugano-1 and lugano-3 for interactive use. This way the CI can run tests 24/7 and interactive users can still do manual tests and use lock and #lab to synchronize with each other.
  2. Add more servers. I have already provisioned 10 x EX41S-SSD servers at Hetzner called murren-[1..10].snabb.co. These are very similar to lugano servers except without NICs. This means we have massive capacity for running CI tests that can be adapted to work without special hardware. (This is a supplement to our own servers: testing with real network equipment is crucial as always.)
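
As a rough illustration of the split described in the two steps above, here is a minimal Python sketch. The pool names, the `HAS_NIC` table, and the helpers `expand_hosts` and `ci_hosts` are hypothetical and not part of the lab tooling; only the host names come from this comment.

```python
import re

def expand_hosts(pattern):
    """Expand a 'name-[a..b].domain' pattern into individual host names."""
    m = re.fullmatch(r"(.*)\[(\d+)\.\.(\d+)\](.*)", pattern)
    if m is None:
        return [pattern]  # no range in the pattern: a single host
    prefix, first, last, suffix = m.groups()
    return [f"{prefix}{i}{suffix}" for i in range(int(first), int(last) + 1)]

MURREN = expand_hosts("murren-[1..10].snabb.co")

# Hypothetical pool table: which hosts serve CI and which stay interactive.
POOLS = {
    "ci": ["lugano-2", "lugano-4"] + MURREN,
    "interactive": ["lugano-1", "lugano-3"],
}

# The murren servers have no NICs, so NIC-dependent tests must avoid them.
HAS_NIC = {host: host.startswith("lugano-") for pool in POOLS.values() for host in pool}

def ci_hosts(needs_nic):
    """Return CI hosts able to run a test, filtered by its need for a NIC."""
    return [h for h in POOLS["ci"] if HAS_NIC[h] or not needs_nic]

print(ci_hosts(needs_nic=True))
print(len(ci_hosts(needs_nic=False)), "hosts for hardware-independent tests")
```

With this layout a CI test that needs real NICs is limited to lugano-2 and lugano-4, while hardware-independent tests can also spread across the murren machines.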

The initial impetus for the murren servers is to support testing a large software interoperability matrix, e.g. checking functionality and performance for many combinations of snabbnfv + dpdk + qemu + linux. These tests are not very sensitive to networking hardware, so by running them on Hetzner we can both increase testing throughput and keep our own servers available for tests that do need their NICs.

Onward! Thanks @domenkozar for this amazing scalable lab based on NixOS! :grinning:

eugeneia commented 8 years ago

Great to see us evolve past the ad-hoc benchmark framework run by SnabbBot. It feels increasingly underpowered nowadays.

domenkozar commented 8 years ago

There's one unresolved question: where to run OpenStack tests. They require different kernel parameters that other tests will trip over. Maybe I could spend some time finding out if there is a common denominator that works for both, but probably not before the end of the month.

domenkozar commented 8 years ago

For now those are run on Grindelwald.