scalar-labs / scalar-jepsen

Jepsen tests for ScalarDB and ScalarDL

DL server failed to start due to long Cassandra download time #98

Closed yito88 closed 1 year ago

yito88 commented 1 year ago

What happened

The DL server failed to start because the Cassandra download took a long time, so Cassandra was not yet running when the DL server was launched.

2023-03-23 10:39:36,254{GMT}    INFO    [jepsen node n1] jepsen.control.util: Downloading https://archive.apache.org/dist/cassandra/3.11.4/apache-cassandra-3.11.4-bin.tar.gz
...
2023-03-23 10:39:43,401{GMT}    INFO    [jepsen node n4] scalardl.core: n4 waiting for starting C* cluster
...
2023-03-23 10:42:43,402{GMT}    INFO    [jepsen node n4] scalardl.core: n4 starting DL server
...
2023-03-23 10:43:42,270{GMT}    INFO    [jepsen node n1] cassandra.core: n1 configuring Cassandra
2023-03-23 10:43:44,298{GMT}    INFO    [jepsen node n1] cassandra.core: n1 starting Cassandra
...

Cause

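DL servers currently don't check whether Cassandra is ready; they just wait for a fixed time. In the log above, n4 waited three minutes (10:39:43 to 10:42:43) and then started the DL server, but n1 did not start Cassandra until 10:43:44.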

Solution

DL servers should check that the Cassandra nodes are ready before starting, instead of waiting for a fixed time.
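
To illustrate the idea, here is a minimal sketch of a readiness check that polls each Cassandra node instead of sleeping for a fixed time. It is written in Python for illustration only and is not the fix applied in this repository. The function names, the timeout and interval values, and the choice of probing Cassandra's default native transport port (9042) are all assumptions.

```python
import socket
import time


def port_is_open(host, port, connect_timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=connect_timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, and timeouts.
        return False


def wait_for_nodes(hosts, port=9042, timeout=600.0, interval=5.0):
    """Poll until every host accepts connections on `port`.

    Hypothetical helper: returns True once all hosts are reachable,
    or False if `timeout` seconds pass first.
    """
    deadline = time.monotonic() + timeout
    pending = set(hosts)
    while True:
        # Keep only the hosts that are still not accepting connections.
        pending = {h for h in pending if not port_is_open(h, port)}
        if not pending:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A caller would run something like `wait_for_nodes(["n1"])` and start the DL server only if it returns `True`. Note that an open port only shows that the node accepts client connections; a stricter check could run a CQL query or inspect `nodetool status` to confirm that the node has joined the cluster.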