We should keep a connection open until it dies. When there are several instances (of a replicaset), I'm not sure whether to keep a connection only to the current one or to all of them; the latter seems better.
We should request leadership information using the existing connection / connection pool.
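A minimal sketch of these two points, assuming a simplified in-memory model. FakeConnection, ConnectionPool and the is_leader request are hypothetical stand-ins, not tarantool-java or jepsen.tarantool APIs. The pool opens one connection per replicaset instance and reuses it until it dies. The leader lookup goes through those same connections instead of opening new ones.

```python
class ConnectionDead(Exception):
    """Raised by a (hypothetical) connection that has been lost."""


class FakeConnection:
    """Hypothetical stand-in for a connection to one instance."""

    opened = 0  # how many connections have ever been created

    def __init__(self, host, cluster):
        FakeConnection.opened += 1
        self.host = host
        self.cluster = cluster  # shared dict simulating the replicaset state
        self.alive = True

    def is_leader(self):
        # Hypothetical request; a real server might derive this from box.info.ro.
        if not self.alive:
            raise ConnectionDead(self.host)
        return self.cluster["leader"] == self.host


class ConnectionPool:
    """Keeps one connection per replicaset instance until it dies."""

    def __init__(self, hosts, connect):
        self.connect = connect
        self.conns = {host: connect(host) for host in hosts}

    def get(self, host):
        conn = self.conns[host]
        if not conn.alive:
            # Only a dead connection is re-created; live ones are reused.
            conn = self.connect(host)
            self.conns[host] = conn
        return conn

    def find_leader(self):
        # Ask every instance over its existing connection.
        for host in self.conns:
            try:
                if self.get(host).is_leader():
                    return host
            except ConnectionDead:
                continue  # the instance failed mid-request; try the next one
        return None
```

Repeated leader lookups cost no new connections here. A new connection is opened only when pool.get finds a dead one.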
We should NOT rediscover the leader on every request; instead, do it only when the leader is gone. This can be detected in the following cases.
The leader instance becomes unavailable.
For write requests, we can catch the error about the read-only state.
For read requests that must be executed on the leader (if there are such cases), we should implement a server-side check of whether the node executing the request is the leader.
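A sketch of renewing the leader lazily, again with hypothetical names: Node, Unavailable, ReadOnlyError, NotLeaderError and LeaderClient are illustrations, not real client APIs. A node rejects writes when it is read-only. It rejects leader-only reads through a server-side check. The client keeps its cached leader and rediscovers it only after one of these errors.

```python
class Unavailable(Exception):
    """The instance cannot be reached."""


class ReadOnlyError(Exception):
    """A write was sent to a read-only (non-leader) instance."""


class NotLeaderError(Exception):
    """The server-side check rejected a leader-only read."""


class Node:
    """Hypothetical replicaset instance with a tiny key-value store."""

    def __init__(self, name, data):
        self.name = name
        self.data = data  # shared dict simulating replicated data
        self.leader = False
        self.up = True

    def write(self, key, value):
        if not self.up:
            raise Unavailable(self.name)
        if not self.leader:
            raise ReadOnlyError(self.name)
        self.data[key] = value

    def leader_read(self, key):
        if not self.up:
            raise Unavailable(self.name)
        # Server-side check: refuse the read unless this node is the leader.
        if not self.leader:
            raise NotLeaderError(self.name)
        return self.data.get(key)


class LeaderClient:
    RENEW_ON = (Unavailable, ReadOnlyError, NotLeaderError)

    def __init__(self, nodes):
        self.nodes = nodes
        self.leader = None
        self.lookups = 0  # how many times the leader was (re)discovered

    def _discover(self):
        self.lookups += 1
        self.leader = next((n for n in self.nodes if n.up and n.leader), None)
        if self.leader is None:
            raise Unavailable("no leader found")

    def call(self, op, *args, retries=1):
        if self.leader is None:
            self._discover()
        for attempt in range(retries + 1):
            try:
                return getattr(self.leader, op)(*args)
            except self.RENEW_ON:
                if attempt == retries:
                    raise
                self._discover()  # the leader is gone: look it up again
```

While the leader stays the same, the client performs one lookup regardless of how many requests it sends.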
Re-create dead connections, or configure tarantool-java to survive long periods of unavailability while still reconnecting quickly.
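One way to get both properties is a retry loop with exponential backoff capped at a small maximum delay. The client never gives up on a node that stays down for a long time, yet the wait between attempts never grows past the cap, so it reconnects soon after the node returns. This is a generic sketch: connect, the delay values and the sleep hook are hypothetical and are not tarantool-java settings.

```python
import time


def reconnect(connect, base_delay=0.05, max_delay=1.0, sleep=time.sleep, deadline=None):
    """Call connect() until it succeeds, with capped exponential backoff.

    Returns (connection, delays), where delays lists the waits that were used.
    deadline is an optional absolute time.monotonic() value; if the next wait
    would pass it, the last ConnectionError is re-raised.
    """
    delay = base_delay
    delays = []
    while True:
        try:
            return connect(), delays
        except ConnectionError:
            if deadline is not None and time.monotonic() + delay > deadline:
                raise
            delays.append(delay)
            sleep(delay)
            # Double the wait, but never beyond max_delay.
            delay = min(delay * 2, max_delay)
```

The cap bounds how long a client can lag behind a recovered node, while the doubling avoids hammering a node that is still down.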
Since our tests are time-bounded, time spent creating connections is time that could instead be used to put a denser workload on Tarantool.
I haven't investigated the code much on this question, but I see the problem in the bank-lua test:
https://github.com/tarantool/jepsen.tarantool/blob/6165e6e5752e37c04e8a0e221412ac8151a48940/src/tarantool/bank.clj#L54-L64
(Consider cl/open.) Moreover, the (db/primaries test) call also creates its own connection and performs a request:
https://github.com/tarantool/jepsen.tarantool/blob/6165e6e5752e37c04e8a0e221412ac8151a48940/src/tarantool/client.clj#L63-L68
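To make the cost concrete, here is a Python sketch comparing the two patterns. It assumes the problem is that connections are opened inside each operation. FakeConn, PerOpClient, ReusingClient and the "primaries" call are hypothetical. They only count connections and do not model jepsen.tarantool's real Clojure client.

```python
class FakeConn:
    """Hypothetical connection that only counts how many were opened."""

    opened = 0

    def __init__(self, host):
        FakeConn.opened += 1
        self.host = host

    def call(self, fn, *args):
        # Pretend to run a stored procedure on the server.
        return (self.host, fn, args)


class PerOpClient:
    """Pattern to avoid: new connections inside every operation."""

    def __init__(self, host):
        self.host = host

    def invoke(self, op):
        FakeConn(self.host).call("primaries")  # leader lookup, own connection
        return FakeConn(self.host).call(op)     # the operation, another one


class ReusingClient:
    """Opens one connection per worker and reuses it for every request."""

    def __init__(self, host):
        self.host = host
        self.conn = None
        self.primaries = None

    def open(self):
        if self.conn is None:
            self.conn = FakeConn(self.host)
            # The leader lookup goes over the same connection, once.
            self.primaries = self.conn.call("primaries")
        return self

    def invoke(self, op):
        return self.conn.call(op)
```

Over 100 operations, PerOpClient opens 200 connections while ReusingClient opens one.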
How it should work, I think: