tarantool / jepsen.tarantool

Jepsen tests for Tarantool
https://www.tarantool.io/en/

Enable multi node testing #98

Open Totktonada opened 2 years ago

Totktonada commented 2 years ago

I tried to run multi node testing (jepsen-cluster and jepsen-cluster-txm workflows in tarantool) and found that it does not work.

There are a lot of warnings of this kind:

WARN [2021-11-14 14:45:49,060] jepsen node 146.185.243.54 - jepsen.control Encountered error with conn [:control "146.185.243.54"]; reopening
java.lang.InterruptedException: sleep interrupted

The run finally ends with:

CMake Error at cmake/atomic.cmake:46 (message):
  C atomics not supported

This points me to https://github.com/tarantool/tarantool/issues/2088 and, it seems, means that those retries somehow cause the git submodule update --init --recursive command to be skipped and/or the cmake <...> commands to run incompletely.
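One way to test this hypothesis on a failed node would be to check whether the submodules were actually initialized before cmake ran. The sketch below is only an illustration: it relies on the documented behaviour that git submodule status prefixes an uninitialized submodule with "-", and the function names and sample paths are hypothetical, not part of jepsen.tarantool.

```python
import subprocess


def uninitialized_submodules(status_output: str) -> list[str]:
    """Return paths that `git submodule status` marks as not initialized.

    Each status line starts with one prefix character:
    ' ' (in sync), '-' (not initialized), '+' (commit differs), 'U' (conflict).
    """
    missing = []
    for line in status_output.splitlines():
        if line.startswith("-"):
            # Line format for this case: "-<sha1> <path>"
            parts = line[1:].split()
            if len(parts) >= 2:
                missing.append(parts[1])
    return missing


def check_source_tree(repo_dir: str) -> list[str]:
    """Run `git submodule status` in repo_dir (hypothetical helper)."""
    out = subprocess.run(
        ["git", "submodule", "status", "--recursive"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    ).stdout
    return uninitialized_submodules(out)


# Illustrative sample output; the paths and hashes are made up.
sample = (
    "-abc123 third_party/example-a\n"
    " def456 third_party/example-b (v1.0)\n"
    "+789abc third_party/example-c (heads/master)\n"
)
print(uninitialized_submodules(sample))
```

A non-empty result on a node where the build failed would support the idea that the submodule update was skipped. An empty result would point instead at the cmake invocation itself.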

The code that builds tarantool is the same for single node and multi node testing, so my guess is that it is a synchronization problem in the ssh connector implementation. There were relevant fixes in recent Jepsen versions, so we can try updating it and see whether the problem goes away. See #30.

Full logs and artifacts:

Full logs from successful (single node) testing:

Tarantool's commit on which I ran CI and got those logs.


As I see from https://github.com/tarantool/tarantool/issues/5736, multi node testing was not enabled in order to save machine resources. I think we should enable it anyway, perhaps running it only rarely. Otherwise we'll run into surprises like this one without understanding what is actually going on.