scylladb / scylla-cluster-tests

Tests for Scylla Clusters
GNU Affero General Public License v3.0

Multiple cassandra-stress commands timed out with: java.lang.RuntimeException: Encountered exception creating schema #4827

Closed: yarongilor closed this 2 years ago

yarongilor commented 2 years ago

Installation details

Kernel Version: 5.13.0-1022-aws
Scylla version (or git commit hash): 2022.1~rc5-20220515.6a1e89fbb with build-id 5cecadda59974548befb4305363bf374631fc3e1
Cluster size: 4 nodes (i3.4xlarge)

Scylla Nodes used in this run:

OS / Image: ami-02032eaf873ada3b0 (aws: eu-west-1)

Test: ics-longevity-1tb-7days-test
Test id: 4e0f1b66-1c2c-442c-824e-ed207a79d965
Test name: enterprise-2022.1/SCT_Enterprise_Features/ICS/ics-longevity-1tb-7days-test
Test config file(s):

Issue description

>>>>>>>

  1. Run the c-s prepare step, which creates one keyspace.
  2. Start the stress step with 4 keyspaces, using round robin.
  3. c-s fails with a timeout exception while creating the schema.

If this is a valid Scylla failure, the test YAML could be changed to create these schemas in the prepare step, as the non-ICS test does. The test YAML has:

prepare_write_cmd: ["cassandra-stress write cl=QUORUM n=1100200300 -schema 'replication(factor=3) compaction(strategy=IncrementalCompactionStrategy)' -port jmx=6868 -mode cql3 native  -rate threads=1000 -col 'size=FIXED(200) n=FIXED(5)' -pop seq=1..1100200300"]

stress_cmd: ["cassandra-stress mixed         cl=QUORUM duration=10080m -schema 'replication(factor=3)                               compaction(strategy=IncrementalCompactionStrategy)'    -port jmx=6868 -mode cql3 native  -rate threads=20 -pop seq=1..1100200300  -log interval=5 -col 'size=FIXED(200) n=FIXED(5)'",
             "cassandra-stress write         cl=QUORUM duration=10010m -schema 'replication(factor=3) compression=LZ4Compressor     compaction(strategy=IncrementalCompactionStrategy)' -port jmx=6868 -mode cql3 native compression=lz4                   -rate threads=50 -pop seq=1..50000000    -log interval=5",
             "cassandra-stress write         cl=QUORUM duration=10020m -schema 'replication(factor=3) compression=SnappyCompressor  compaction(strategy=IncrementalCompactionStrategy)' -port jmx=6868 -mode cql3 native compression=snappy                -rate threads=50 -pop seq=1..50000000    -log interval=5",
             "cassandra-stress write         cl=QUORUM duration=10030m -schema 'replication(factor=3) compression=DeflateCompressor compaction(strategy=IncrementalCompactionStrategy)' -port jmx=6868 -mode cql3 native compression=none                  -rate threads=50 -pop seq=1..50000000    -log interval=5",
             "cassandra-stress user profile=/tmp/cs_mv_profile.yaml ops'(insert=3,read1=1,read2=1,read3=1)' cl=QUORUM duration=10080m                                                  -port jmx=6868 -mode cql3 native                                   -rate threads=10"]

stress_read_cmd: ["cassandra-stress read cl=QUORUM duration=10080m                                                                                                                 -port jmx=6868 -mode cql3 native  -rate threads=10 -pop seq=1..1100200300  -log interval=5 -col 'size=FIXED(200) n=FIXED(5)'",
                  "cassandra-stress read cl=QUORUM duration=10010m -schema 'replication(factor=3) compression=LZ4Compressor compaction(strategy=IncrementalCompactionStrategy)'     -port jmx=6868 -mode cql3 native compression=lz4                   -rate threads=20 -pop seq=1..50000000    -log interval=5",
                  "cassandra-stress read cl=QUORUM duration=10020m -schema 'replication(factor=3) compression=SnappyCompressor compaction(strategy=IncrementalCompactionStrategy)'  -port jmx=6868 -mode cql3 native compression=snappy                -rate threads=20 -pop seq=1..50000000    -log interval=5",
                  "cassandra-stress read cl=QUORUM duration=10030m -schema 'replication(factor=3) compression=DeflateCompressor compaction(strategy=IncrementalCompactionStrategy)' -port jmx=6868 -mode cql3 native compression=none                  -rate threads=20 -pop seq=1..50000000    -log interval=5"]
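One possible way to act on the suggestion above, sketched under assumptions: for every stress command that carries a `-schema` option, derive a one-row `cassandra-stress write n=1` that only creates that schema, so it can be listed in the prepare step instead of being created when the long-running commands start. The helper names (`option_args`, `prepare_cmd`) are hypothetical, the keyspace name must be supplied by the caller, `cl=QUORUM` is hard-coded to match the commands above, and it is assumed that SCT appends its own `-node`, `-transport` and credential options, as it does in the log below. The stack trace shows c-s creating keyspaces before the run starts (`StressSettings.maybeCreateKeyspaces`), which is why a single-row write is assumed to be enough.

```python
import shlex
from typing import List, Optional


def option_args(tokens: List[str], option: str) -> List[str]:
    """Return the sub-arguments of a cassandra-stress option such as -schema or -mode."""
    if option not in tokens:
        return []
    args = []
    for tok in tokens[tokens.index(option) + 1:]:
        if tok.startswith("-"):  # the next option starts here
            break
        args.append(" ".join(tok.split()))  # collapse the YAML's alignment padding
    return args


def prepare_cmd(stress_cmd: str, keyspace: str) -> Optional[str]:
    """Hypothetical helper: build a one-row write that only creates stress_cmd's schema.

    Returns None for commands without -schema (e.g. the user-profile command).
    """
    tokens = shlex.split(stress_cmd)
    schema = option_args(tokens, "-schema")
    if not schema:
        return None
    parts = ["cassandra-stress", "write", "n=1", "cl=QUORUM",
             "-schema", f"keyspace={keyspace}", *schema]
    # Carry over the options that shape the table (-col) or the connection (-port, -mode).
    for option in ("-col", "-port", "-mode"):
        args = option_args(tokens, option)
        if args:
            parts += [option, *args]
    return shlex.join(parts)  # re-quotes multi-word arguments for the shell
```

For example, `prepare_cmd(<the LZ4 stress_cmd>, "keyspace_lz4")` yields `cassandra-stress write n=1 cl=QUORUM -schema keyspace=keyspace_lz4 'replication(factor=3) compression=LZ4Compressor compaction(strategy=IncrementalCompactionStrategy)' -port jmx=6868 -mode cql3 native compression=lz4`.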

Error events:

2022-05-30 09:50:27.944: (CassandraStressEvent Severity.CRITICAL) period_type=end event_id=c0276e09-ac9b-407c-808a-3459e6670b0e duration=26s: node=Node longevity-tls-1tb-7d-2022-1-loader-node-4e0f1b66-4 [54.246.35.174 | 10.0.0.158] (seed: False)
stress_cmd=cassandra-stress write         cl=QUORUM duration=10020m -schema 'replication(factor=3) compression=SnappyCompressor  compaction(strategy=IncrementalCompactionStrategy)' -port jmx=6868 -mode cql3 native compression=snappy                -rate threads=50 -pop seq=1..50000000    -log interval=5
errors:
Stress command completed with bad status 1: java.lang.RuntimeException: Encountered exception creating schema
at org.apache.cassandra.stress.se

2022-05-30 09:50:30.919: (CassandraStressEvent Severity.CRITICAL) period_type=end event_id=a1f3ed1b-1cef-4f35-991d-0b03f469ffad duration=25s: node=Node longevity-tls-1tb-7d-2022-1-loader-node-4e0f1b66-1 [3.248.231.57 | 10.0.2.6] (seed: False)
stress_cmd=cassandra-stress write         cl=QUORUM duration=10030m -schema 'replication(factor=3) compression=DeflateCompressor compaction(strategy=IncrementalCompactionStrategy)' -port jmx=6868 -mode cql3 native compression=none                  -rate threads=50 -pop seq=1..50000000    -log interval=5
errors:
Stress command completed with bad status 1: java.lang.RuntimeException: Encountered exception creating schema
at org.apache.cassandra.stress.se

Screenshot from 2022-05-30 19-30-21

Log errors:

< t:2022-05-30 09:50:20,412 f:base.py         l:228  c:RemoteLibSSH2CmdRunner p:DEBUG >         at org.apache.cassandra.stress.settings.SettingsSchema.createKeySpaces(SettingsSchema.java:69)
< t:2022-05-30 09:50:20,439 f:base.py         l:228  c:RemoteLibSSH2CmdRunner p:DEBUG >         at com.datastax.driver.core.exceptions.OperationTimedOutException.copy(OperationTimedOutException.java:43)
< t:2022-05-30 09:50:20,912 f:base.py         l:228  c:RemoteLibSSH2CmdRunner p:DEBUG >         at org.apache.cassandra.stress.settings.StressSettings.maybeCreateKeyspaces(StressSettings.java:228)
< t:2022-05-30 09:50:20,940 f:base.py         l:228  c:RemoteLibSSH2CmdRunner p:DEBUG >         at com.datastax.driver.core.exceptions.OperationTimedOutException.copy(OperationTimedOutException.java:25)
< t:2022-05-30 09:50:21,413 f:base.py         l:228  c:RemoteLibSSH2CmdRunner p:DEBUG >         at org.apache.cassandra.stress.StressAction.run(StressAction.java:58)
< t:2022-05-30 09:50:21,440 f:base.py         l:228  c:RemoteLibSSH2CmdRunner p:DEBUG >         at com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:35)
< t:2022-05-30 09:50:21,913 f:base.py         l:228  c:RemoteLibSSH2CmdRunner p:DEBUG >         at org.apache.cassandra.stress.Stress.run(Stress.java:143)
< t:2022-05-30 09:50:21,940 f:base.py         l:228  c:RemoteLibSSH2CmdRunner p:DEBUG >         at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:293)
< t:2022-05-30 09:50:22,008 f:base.py         l:228  c:RemoteLibSSH2CmdRunner p:DEBUG > READ,          61692,    3000,    3000,    3000,     5.1,     4.5,     8.8,    13.9,    46.4,    80.0,   20.0,  0.01980,      0,      0,       0,       0,       0,       0
< t:2022-05-30 09:50:22,009 f:base.py         l:228  c:RemoteLibSSH2CmdRunner p:DEBUG > WRITE,         61792,    2991,    2991,    2991,     1.5,     1.4,     2.5,     3.1,    31.0,    62.2,   20.0,  0.01980,      0,      0,       0,       0,       0,       0
< t:2022-05-30 09:50:22,010 f:base.py         l:228  c:RemoteLibSSH2CmdRunner p:DEBUG > total,        123484,    5991,    5991,    5991,     3.3,     2.5,     7.4,    12.6,    39.8,    80.0,   20.0,  0.01980,      0,      0,       0,       0,       0,       0
< t:2022-05-30 09:50:22,018 f:base.py         l:228  c:RemoteLibSSH2CmdRunner p:DEBUG > Created keyspaces. Sleeping 1s for propagation.
< t:2022-05-30 09:50:22,019 f:base.py         l:228  c:RemoteLibSSH2CmdRunner p:DEBUG > WARN  09:50:22,010 No schema agreement from live replicas after 10 s. The schema may not be up to date on some nodes.
< t:2022-05-30 09:50:22,413 f:base.py         l:228  c:RemoteLibSSH2CmdRunner p:DEBUG >         at org.apache.cassandra.stress.Stress.main(Stress.java:62)
< t:2022-05-30 09:50:22,440 f:base.py         l:228  c:RemoteLibSSH2CmdRunner p:DEBUG >         at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:58)
< t:2022-05-30 09:50:22,913 f:base.py         l:228  c:RemoteLibSSH2CmdRunner p:DEBUG > Caused by: com.datastax.driver.core.exceptions.OperationTimedOutException: [/10.0.3.84:9042] Timed out waiting for server response
< t:2022-05-30 09:50:22,941 f:base.py         l:228  c:RemoteLibSSH2CmdRunner p:DEBUG >         at org.apache.cassandra.stress.util.JavaDriverClient.execute(JavaDriverClient.java:190)
< t:2022-05-30 09:50:23,019 f:base.py         l:228  c:RemoteLibSSH2CmdRunner p:DEBUG > Sleeping 2s...
< t:2022-05-30 09:50:23,414 f:base.py         l:228  c:RemoteLibSSH2CmdRunner p:DEBUG >         at com.datastax.driver.core.exceptions.OperationTimedOutException.copy(OperationTimedOutException.java:43)
< t:2022-05-30 09:50:23,441 f:base.py         l:228  c:RemoteLibSSH2CmdRunner p:DEBUG >         at org.apache.cassandra.stress.settings.SettingsSchema.createKeySpacesNative(SettingsSchema.java:84)
< t:2022-05-30 09:50:23,914 f:base.py         l:228  c:RemoteLibSSH2CmdRunner p:DEBUG >         at com.datastax.driver.core.exceptions.OperationTimedOutException.copy(OperationTimedOutException.java:25)
< t:2022-05-30 09:50:23,941 f:base.py         l:228  c:RemoteLibSSH2CmdRunner p:DEBUG >         ... 5 more
< t:2022-05-30 09:50:24,414 f:base.py         l:228  c:RemoteLibSSH2CmdRunner p:DEBUG >         at com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:35)
< t:2022-05-30 09:50:24,442 f:base.py         l:228  c:RemoteLibSSH2CmdRunner p:DEBUG > Caused by: com.datastax.driver.core.exceptions.OperationTimedOutException: [/10.0.3.84:9042] Timed out waiting for server response
< t:2022-05-30 09:50:24,915 f:base.py         l:228  c:RemoteLibSSH2CmdRunner p:DEBUG >         at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:293)
< t:2022-05-30 09:50:24,942 f:base.py         l:228  c:RemoteLibSSH2CmdRunner p:DEBUG >         at com.datastax.driver.core.RequestHandler$SpeculativeExecution.onTimeout(RequestHandler.java:979)
< t:2022-05-30 09:50:25,028 f:base.py         l:228  c:RemoteLibSSH2CmdRunner p:DEBUG > Warming up WRITE with 0 iterations...
< t:2022-05-30 09:50:25,415 f:base.py         l:228  c:RemoteLibSSH2CmdRunner p:DEBUG >         at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:58)
< t:2022-05-30 09:50:25,442 f:base.py         l:228  c:RemoteLibSSH2CmdRunner p:DEBUG >         at com.datastax.driver.core.Connection$ResponseHandler$1.run(Connection.java:1631)
< t:2022-05-30 09:50:25,529 f:base.py         l:228  c:RemoteLibSSH2CmdRunner p:DEBUG > Failed to connect over JMX; not collecting these stats
< t:2022-05-30 09:50:25,915 f:base.py         l:228  c:RemoteLibSSH2CmdRunner p:DEBUG >         at org.apache.cassandra.stress.util.JavaDriverClient.execute(JavaDriverClient.java:190)
< t:2022-05-30 09:50:25,942 f:base.py         l:228  c:RemoteLibSSH2CmdRunner p:DEBUG >         at com.datastax.shaded.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:663)
< t:2022-05-30 09:50:26,416 f:base.py         l:228  c:RemoteLibSSH2CmdRunner p:DEBUG >         at org.apache.cassandra.stress.settings.SettingsSchema.createKeySpacesNative(SettingsSchema.java:84)
< t:2022-05-30 09:50:26,443 f:base.py         l:228  c:RemoteLibSSH2CmdRunner p:DEBUG >         at com.datastax.shaded.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:738)
< t:2022-05-30 09:50:26,916 f:base.py         l:228  c:RemoteLibSSH2CmdRunner p:DEBUG >         ... 5 more
< t:2022-05-30 09:50:26,943 f:base.py         l:228  c:RemoteLibSSH2CmdRunner p:DEBUG >         at com.datastax.shaded.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:466)
< t:2022-05-30 09:50:27,004 f:base.py         l:228  c:RemoteLibSSH2CmdRunner p:DEBUG > READ,          77839,    3229,    3229,    3229,     4.8,     4.5,     8.1,    11.1,    14.4,    16.3,   25.0,  0.01593,      0,      0,       0,       0,       0,       0
< t:2022-05-30 09:50:27,004 f:base.py         l:228  c:RemoteLibSSH2CmdRunner p:DEBUG > WRITE,         77083,    3058,    3058,    3058,     1.4,     1.4,     2.4,     2.8,     3.5,     5.2,   25.0,  0.01593,      0,      0,       0,       0,       0,       0
< t:2022-05-30 09:50:27,005 f:base.py         l:228  c:RemoteLibSSH2CmdRunner p:DEBUG > total,        154922,    6288,    6288,    6288,     3.2,     2.6,     7.0,    10.0,    13.6,    16.3,   25.0,  0.01593,      0,      0,       0,       0,       0,       0
< t:2022-05-30 09:50:27,416 f:base.py         l:228  c:RemoteLibSSH2CmdRunner p:DEBUG > Caused by: com.datastax.driver.core.exceptions.OperationTimedOutException: [/10.0.3.84:9042] Timed out waiting for server response
< t:2022-05-30 09:50:27,443 f:base.py         l:228  c:RemoteLibSSH2CmdRunner p:DEBUG >         at com.datastax.shaded.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
< t:2022-05-30 09:50:27,917 f:base.py         l:228  c:RemoteLibSSH2CmdRunner p:DEBUG >         at com.datastax.driver.core.RequestHandler$SpeculativeExecution.onTimeout(RequestHandler.java:979)
< t:2022-05-30 09:50:27,943 f:base.py         l:228  c:RemoteLibSSH2CmdRunner p:DEBUG >         at java.base/java.lang.Thread.run(Thread.java:834)
< t:2022-05-30 09:50:27,944 f:base.py         l:146  c:RemoteLibSSH2CmdRunner p:ERROR > Error executing command: "echo TAG: loader_idx:0-cpu_idx:0-keyspace_idx:1; STRESS_TEST_MARKER=Z0UKOH63KKO9B9U4L7PE; cassandra-stress write         cl=QUORUM duration=10020m -schema keyspace=keyspace_snappy 'replication(factor=3) compression=SnappyCompressor  compaction(strategy=IncrementalCompactionStrategy)' -port jmx=6868 -mode cql3 native compression=snappy                 user=cassandra password=cassandra -rate threads=50 -pop seq=1..50000000    -log interval=5 -transport "truststore=/etc/scylla/ssl_conf/client/cacerts.jks truststore-password=cassandra" -node 10.0.3.84 -errors skip-unsupported-columns"; Exit status: 1
< t:2022-05-30 09:50:27,944 f:base.py         l:148  c:RemoteLibSSH2CmdRunner p:DEBUG > STDOUT: tion null, max connections per host 8
< t:2022-05-30 09:50:27,944 f:base.py         l:148  c:RemoteLibSSH2CmdRunner p:DEBUG > Datatacenter: eu-west; Host: /10.0.1.48; Rack: 1a
< t:2022-05-30 09:50:27,944 f:base.py         l:148  c:RemoteLibSSH2CmdRunner p:DEBUG > Datatacenter: eu-west; Host: /10.0.3.84; Rack: 1a
< t:2022-05-30 09:50:27,944 f:base.py         l:148  c:RemoteLibSSH2CmdRunner p:DEBUG > Datatacenter: eu-west; Host: /10.0.3.180; Rack: 1a
< t:2022-05-30 09:50:27,944 f:base.py         l:148  c:RemoteLibSSH2CmdRunner p:DEBUG > Datatacenter: eu-west; Host: /10.0.1.248; Rack: 1a
< t:2022-05-30 09:50:27,944 f:base.py         l:150  c:RemoteLibSSH2CmdRunner p:DEBUG > STDERR: java.lang.RuntimeException: Encountered exception creating schema
< t:2022-05-30 09:50:27,944 f:base.py         l:150  c:RemoteLibSSH2CmdRunner p:DEBUG >         at org.apache.cassandra.stress.settings.SettingsSchema.createKeySpacesNative(SettingsSchema.java:100)
< t:2022-05-30 09:50:27,944 f:base.py         l:150  c:RemoteLibSSH2CmdRunner p:DEBUG >         at org.apache.cassandra.stress.settings.SettingsSchema.createKeySpaces(SettingsSchema.java:69)
< t:2022-05-30 09:50:27,944 f:base.py         l:150  c:RemoteLibSSH2CmdRunner p:DEBUG >         at org.apache.cassandra.stress.settings.StressSettings.maybeCreateKeyspaces(StressSettings.java:228)
< t:2022-05-30 09:50:27,944 f:base.py         l:150  c:RemoteLibSSH2CmdRunner p:DEBUG >         at org.apache.cassandra.stress.StressAction.run(StressAction.java:58)
< t:2022-05-30 09:50:27,944 f:base.py         l:150  c:RemoteLibSSH2CmdRunner p:DEBUG >         at org.apache.cassandra.stress.Stress.run(Stress.java:143)
< t:2022-05-30 09:50:27,944 f:base.py         l:150  c:RemoteLibSSH2CmdRunner p:DEBUG >         at org.apache.cassandra.stress.Stress.main(Stress.java:62)
< t:2022-05-30 09:50:27,944 f:base.py         l:150  c:RemoteLibSSH2CmdRunner p:DEBUG > Caused by: com.datastax.driver.core.exceptions.OperationTimedOutException: [/10.0.3.84:9042] Timed out waiting for server response
< t:2022-05-30 09:50:27,944 f:base.py         l:150  c:RemoteLibSSH2CmdRunner p:DEBUG >         at com.datastax.driver.core.exceptions.OperationTimedOutException.copy(OperationTimedOutException.java:43)
< t:2022-05-30 09:50:27,944 f:base.py         l:150  c:RemoteLibSSH2CmdRunner p:DEBUG >         at com.datastax.driver.core.exceptions.OperationTimedOutException.copy(OperationTimedOutException.java:25)
< t:2022-05-30 09:50:27,944 f:base.py         l:150  c:RemoteLibSSH2CmdRunner p:DEBUG >         at com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:35)
< t:2022-05-30 09:50:27,944 f:base.py         l:150  c:RemoteLibSSH2CmdRunner p:DEBUG >         at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:293)
< t:2022-05-30 09:50:27,944 f:base.py         l:150  c:RemoteLibSSH2CmdRunner p:DEBUG >         at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:58)
< t:2022-05-30 09:50:27,944 f:base.py         l:150  c:RemoteLibSSH2CmdRunner p:DEBUG >         at org.apache.cassandra.stress.util.JavaDriverClient.execute(JavaDriverClient.java:190)
< t:2022-05-30 09:50:27,944 f:base.py         l:150  c:RemoteLibSSH2CmdRunner p:DEBUG >         at org.apache.cassandra.stress.settings.SettingsSchema.createKeySpacesNative(SettingsSchema.java:84)
< t:2022-05-30 09:50:27,944 f:base.py         l:150  c:RemoteLibSSH2CmdRunner p:DEBUG >         ... 5 more
< t:2022-05-30 09:50:27,944 f:base.py         l:150  c:RemoteLibSSH2CmdRunner p:DEBUG > Caused by: com.datastax.driver.core.exceptions.OperationTimedOutException: [/10.0.3.84:9042] Timed out waiting for server response
< t:2022-05-30 09:50:27,944 f:base.py         l:150  c:RemoteLibSSH2CmdRunner p:DEBUG >         at com.datastax.driver.core.RequestHandler$SpeculativeExecution.onTimeout(RequestHandler.java:979)
< t:2022-05-30 09:50:27,944 f:base.py         l:150  c:RemoteLibSSH2CmdRunner p:DEBUG >         at com.datastax.driver.core.Connection$ResponseHandler$1.run(Connection.java:1631)
< t:2022-05-30 09:50:27,944 f:base.py         l:150  c:RemoteLibSSH2CmdRunner p:DEBUG >         at com.datastax.shaded.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:663)
< t:2022-05-30 09:50:27,944 f:base.py         l:150  c:RemoteLibSSH2CmdRunner p:DEBUG >         at com.datastax.shaded.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:738)
< t:2022-05-30 09:50:27,944 f:base.py         l:150  c:RemoteLibSSH2CmdRunner p:DEBUG >         at com.datastax.shaded.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:466)
< t:2022-05-30 09:50:27,944 f:base.py         l:150  c:RemoteLibSSH2CmdRunner p:DEBUG >         at com.datastax.shaded.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
< t:2022-05-30 09:50:27,944 f:base.py         l:150  c:RemoteLibSSH2CmdRunner p:DEBUG >         at java.base/java.lang.Thread.run(Thread.java:834)
< t:2022-05-30 09:50:28,417 f:base.py         l:228  c:RemoteLibSSH2CmdRunner p:DEBUG >         at com.datastax.driver.core.Connection$ResponseHandler$1.run(Connection.java:1631)
< t:2022-05-30 09:50:28,917 f:base.py         l:228  c:RemoteLibSSH2CmdRunner p:DEBUG >         at com.datastax.shaded.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:663)
< t:2022-05-30 09:50:29,417 f:base.py         l:228  c:RemoteLibSSH2CmdRunner p:DEBUG >         at com.datastax.shaded.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:738)
< t:2022-05-30 09:50:29,506 f:base.py         l:140  c:RemoteLibSSH2CmdRunner p:DEBUG > STDERR: Failed to connect over JMX; not collecting these stats
< t:2022-05-30 09:50:29,506 f:base.py         l:140  c:RemoteLibSSH2CmdRunner p:DEBUG > Failed to connect over JMX; not collecting these stats
< t:2022-05-30 09:50:29,506 f:base.py         l:140  c:RemoteLibSSH2CmdRunner p:DEBUG > Failed to connect over JMX; not collecting these stats
< t:2022-05-30 09:50:29,506 f:base.py         l:142  c:RemoteLibSSH2CmdRunner p:DEBUG > Command "echo TAG: loader_idx:0-cpu_idx:0-keyspace_idx:1; STRESS_TEST_MARKER=WWYH7H4TE4SU34CUIW5Q; cassandra-stress mixed         cl=QUORUM duration=10080m -schema keyspace=keyspace1 'replication(factor=3)                               compaction(strategy=IncrementalCompactionStrategy)'    -port jmx=6868 -mode cql3 native   user=cassandra password=cassandra -rate threads=20 -pop seq=1..1100200300  -log interval=5 -col 'size=FIXED(200) n=FIXED(5)' -transport "truststore=/etc/scylla/ssl_conf/client/cacerts.jks truststore-password=cassandra" -node 10.0.3.84 -errors skip-unsupported-columns" finished with status 0
< t:2022-05-30 09:50:29,519 f:base.py         l:142  c:RemoteLibSSH2CmdRunner p:DEBUG > Command "echo TAG: loader_idx:0-cpu_idx:0-keyspace_idx:1; STRESS_TEST_MARKER=CY9DUEJDSXJJK9LTUGND; cassandra-stress user profile=/tmp/cs_mv_profile.yaml ops'(insert=3,read1=1,read2=1,read3=1)' cl=QUORUM duration=10080m                                                  -port jmx=6868 -mode cql3 native                                    user=cassandra password=cassandra -rate threads=10 -transport "truststore=/etc/scylla/ssl_conf/client/cacerts.jks truststore-password=cassandra" -node 10.0.3.84 -errors skip-unsupported-columns" finished with status 0
< t:2022-05-30 09:50:29,530 f:base.py         l:140  c:RemoteLibSSH2CmdRunner p:DEBUG > STDERR: WARN  09:50:22,010 No schema agreement from live replicas after 10 s. The schema may not be up to date on some nodes.
< t:2022-05-30 09:50:29,530 f:base.py         l:140  c:RemoteLibSSH2CmdRunner p:DEBUG > Failed to connect over JMX; not collecting these stats
< t:2022-05-30 09:50:29,530 f:base.py         l:142  c:RemoteLibSSH2CmdRunner p:DEBUG > Command "echo TAG: loader_idx:0-cpu_idx:0-keyspace_idx:1; STRESS_TEST_MARKER=VKPOWIK7DG9TPILH74OO; cassandra-stress write         cl=QUORUM duration=10010m -schema keyspace=keyspace_lz4 'replication(factor=3) compression=LZ4Compressor     compaction(strategy=IncrementalCompactionStrategy)' -port jmx=6868 -mode cql3 native compression=lz4                    user=cassandra password=cassandra -rate threads=50 -pop seq=1..50000000    -log interval=5 -transport "truststore=/etc/scylla/ssl_conf/client/cacerts.jks truststore-password=cassandra" -node 10.0.3.84 -errors skip-unsupported-columns" finished with status 0
< t:2022-05-30 09:50:29,918 f:base.py         l:228  c:RemoteLibSSH2CmdRunner p:DEBUG >         at com.datastax.shaded.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:466)
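The READ, WRITE and total lines interleaved with the stack traces are cassandra-stress interval statistics, presumably from the mixed command on keyspace1. A minimal parser for one such line is sketched below, assuming the standard cassandra-stress interval header (type, total ops, op/s, pk/s, row/s, mean, med, .95, .99, .999, max, time, stderr, errors, gc: #, max ms, sum ms, sdv ms, mb); the dict keys are illustrative names, not part of c-s or SCT.

```python
# Field names assumed from cassandra-stress's periodic header line.
COLUMNS = ["type", "total_ops", "op_s", "pk_s", "row_s", "mean_ms", "med_ms",
           "p95_ms", "p99_ms", "p999_ms", "max_ms", "time_s", "stderr",
           "errors", "gc_count", "gc_max_ms", "gc_sum_ms", "gc_sdv_ms", "gc_mb"]


def parse_interval(line):
    """Parse one interval line such as 'READ, 61692, 3000, ...' into a dict."""
    # Drop the SCT log prefix ("< t:... p:DEBUG > ") if the line still has it.
    payload = line.rsplit("> ", 1)[-1]
    fields = [f.strip() for f in payload.split(",")]
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    row = {"type": fields[0]}
    for name, value in zip(COLUMNS[1:], fields[1:]):
        row[name] = float(value)  # every column after the first is numeric
    return row
```

Applied to the two intervals in the log, the errors column is 0 for READ, WRITE and total, so that workload kept running while the write commands failed during schema creation.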

<<<<<<<
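The `WARN ... No schema agreement from live replicas after 10 s` line appears to come from the Java driver, which, after a schema change, waits until all live nodes report the same schema version. A minimal sketch of that check is below, assuming a hypothetical `get_versions` callable that returns `{node_address: schema_version}` (a real client would read `schema_version` from `system.local` and `system.peers`); the clock and sleep functions are injectable so the loop can be tested without a cluster.

```python
import time
from typing import Callable, Dict


def wait_for_schema_agreement(get_versions: Callable[[], Dict[str, str]],
                              timeout: float = 10.0, interval: float = 0.2,
                              clock: Callable[[], float] = time.monotonic,
                              sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until every live node reports the same schema version.

    Returns True on agreement, False if the nodes still disagree at the deadline.
    """
    deadline = clock() + timeout
    while True:
        versions = set(get_versions().values())
        if len(versions) == 1:  # exactly one distinct version: schema has settled
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Returning False instead of raising mirrors the log: the driver only warns and c-s goes on, and the failure surfaces later as the `OperationTimedOutException` while creating the next schema element.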

Logs:

Jenkins job URL

fruch commented 2 years ago

Looks very similar to: https://github.com/scylladb/scylla/issues/10692

yarongilor commented 2 years ago

Opened https://github.com/scylladb/scylla-enterprise/issues/2273

roydahan commented 2 years ago

Could it be the same as https://github.com/scylladb/scylla/issues/9906 ?

@yarongilor, please run the same reproducer with the latest 2021.1.x to check whether this is a regression.