scalar-labs / scalar-jepsen

Jepsen tests for ScalarDB and ScalarDL

Cassandra node start failed #70

Open Jiao-05 opened 2 years ago

Jiao-05 commented 2 years ago

I want to test LWT in Cassandra with Jepsen.

I am running this Jepsen test on a virtual machine. My configuration is Ubuntu 20.04, Docker version 20.10.12, and docker-compose version 1.25.0.

When I first ran the test, the OpenJDK JRE installation on n1-n5 kept retrying until it failed.

So I modified the code slightly, like this:

[image: 05]

I replaced the content in the red box with the code in the figure below:

[image: 04]

Could my changes be related to the error?

However, an error is reported later:

[image: 03]

It seems that the DB nodes fail to start.

How can I handle this error so that I can run the tests?

yito88 commented 2 years ago

@Jiao-05 Sorry for the late reply. You may find the error in the Cassandra logs. The Cassandra logs are stored on the DB nodes, such as n1, not on the control node.

Tsunaou commented 2 years ago

Have you solved this problem? I replaced it with openjdk-11-jre and the same error was reported.

yito88 commented 2 years ago

@Tsunaou Thank you for your report. Could you share the error logs of the Cassandra node? system.log and debug.log should be in /root/cassandra/logs on the DB node, and the error for the bootstrap failure should appear there.
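For anyone collecting those logs, here is a minimal sketch of how the error lines could be pulled out on a DB node. It is not part of scalar-jepsen. The function names are hypothetical, and it assumes Cassandra's default log layout, where each entry begins with the log level; stack-trace continuation lines are not captured.

```python
import re
from pathlib import Path

# Assumption: each log entry starts with its level (default Cassandra logback layout).
LEVEL_RE = re.compile(r"^(ERROR|WARN)\b")


def find_problem_lines(log_text, levels=("ERROR",)):
    """Return the lines of log_text whose leading level is in `levels`."""
    hits = []
    for line in log_text.splitlines():
        match = LEVEL_RE.match(line)
        if match and match.group(1) in levels:
            hits.append(line)
    return hits


def scan_log_dir(log_dir="/root/cassandra/logs"):
    """Scan system.log and debug.log in log_dir; skip files that do not exist."""
    results = {}
    for name in ("system.log", "debug.log"):
        path = Path(log_dir) / name
        if path.exists():
            text = path.read_text(errors="replace")
            results[name] = find_problem_lines(text)
    return results


if __name__ == "__main__":
    # Run on a DB node such as n1, not on the control node.
    for log_name, lines in scan_log_dir().items():
        print(f"== {log_name}: {len(lines)} ERROR line(s)")
        for line in lines:
            print(line)
```

Passing `levels=("ERROR", "WARN")` widens the filter if the ERROR lines alone do not explain the failure.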

Tsunaou commented 2 years ago

@yito88 Oh, I have partially solved this problem, see this commit log. However, the wait-ready method sometimes times out and I don't know why: node ni waits for node nj for a long time. Even when the wait finishes successfully, it still takes about 3 minutes to start a 5-node cluster, which seems strange.

yito88 commented 2 years ago

Thanks! Taking 3 minutes sounds normal. We have to start the Cassandra nodes one by one, and starting a single node takes 30 seconds or longer, so five nodes need at least 2.5 minutes. When a Cassandra node has many commit logs that need to be persisted, the bootstrap takes even longer.
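To illustrate why the total time grows with the number of nodes and where a timeout can come from, here is a rough Python sketch of a one-by-one start with a readiness poll. It is not the repository's wait-ready implementation. `start_node`, `is_ready`, and the timeout values are hypothetical placeholders supplied by the caller.

```python
import time


def wait_ready(is_ready, timeout_s=300.0, interval_s=5.0,
               clock=time.monotonic, sleep=time.sleep):
    """Poll is_ready() until it returns True or timeout_s elapses.

    Returns True if the check succeeded, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


def start_cluster(nodes, start_node, is_ready, **wait_kwargs):
    """Start nodes sequentially; each must be ready before the next starts."""
    for node in nodes:
        start_node(node)  # hypothetical callback that launches Cassandra on `node`
        if not wait_ready(lambda: is_ready(node), **wait_kwargs):
            raise TimeoutError(f"node {node} did not become ready in time")
```

Because each node waits for the previous one, the per-node start times add up, and a single slow node (for example, one with many commit logs) can push the whole run past the timeout.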