scylladb / scylla-tools-java

Apache Cassandra, supplying tools for Scylla
Apache License 2.0

cassandra-stress: Unrecognized strategy option {initial_tablets} passed to org.apache.cassandra.locator.NetworkTopologyStrategy #377

Closed bhalevy closed 9 months ago

bhalevy commented 9 months ago

When running e.g. bootstrap_test.py::TestBootstrap::test_shutdown_wiped_node_may_join, the cassandra-stress tool fails with an error because the initial_tablets option is not recognized.

For example:

22:42:05,651 3898150 dtest_setup                    DEBUG    dtest_setup.py      :61   | test_shutdown_wiped_node_may_join: Allocated cluster ID 3: /home/bhalevy/.dtest/dtest-vnn1ik4o
22:42:05,711 3898150 ccm                            DEBUG    cluster.py          :762  | test_shutdown_wiped_node_may_join: start_nodes: no_wait=False wait_for_binary_proto=True wait_other_notice=True wait_normal_token_owner=True force_wait_for_cluster_start=True
22:42:05,714 3898150 ccm                            DEBUG    cluster.py          :762  | test_shutdown_wiped_node_may_join: node1: Starting scylla: args=['/home/bhalevy/.dtest/dtest-vnn1ik4o/test/node1/bin/scylla', '--options-file', '/home/bhalevy/.dtest/dtest-vnn1ik4o/test/node1/conf/scylla.yaml', '--log-to-stdout', '1', '--api-address', '127.0.3.1', '--smp', '2', '--memory', '1024M', '--developer-mode', 'true', '--default-log-level', 'info', '--overprovisioned', '--prometheus-address', '127.0.3.1', '--unsafe-bypass-fsync', '1', '--kernel-page-cache', '1', '--commitlog-use-o-dsync', '0', '--max-networking-io-control-blocks', '1000'] wait_other_notice=True wait_for_binary_proto=True
22:42:06,048 3898150 ccm                            DEBUG    cluster.py          :762  | test_shutdown_wiped_node_may_join: node1: Starting scylla-jmx: args=['/home/bhalevy/.dtest/dtest-vnn1ik4o/test/node1/bin/symlinks/scylla-jmx', '-Dapiaddress=127.0.3.1', '-Djavax.management.builder.initial=com.scylladb.jmx.utils.APIBuilder', '-Djava.rmi.server.hostname=127.0.3.1', '-Dcom.sun.management.jmxremote', '-Dcom.sun.management.jmxremote.host=127.0.3.1', '-Dcom.sun.management.jmxremote.port=7199', '-Dcom.sun.management.jmxremote.rmi.port=7199', '-Dcom.sun.management.jmxremote.local.only=false', '-Xmx256m', '-XX:+UseSerialGC', '-Dcom.sun.management.jmxremote.authenticate=false', '-Dcom.sun.management.jmxremote.ssl=false', '-jar', '/home/bhalevy/.dtest/dtest-vnn1ik4o/test/node1/bin/scylla-jmx-1.0.jar']
22:42:06,356 3898150 ccm                            DEBUG    cluster.py          :762  | test_shutdown_wiped_node_may_join: node2: Starting scylla: args=['/home/bhalevy/.dtest/dtest-vnn1ik4o/test/node2/bin/scylla', '--options-file', '/home/bhalevy/.dtest/dtest-vnn1ik4o/test/node2/conf/scylla.yaml', '--log-to-stdout', '1', '--api-address', '127.0.3.2', '--smp', '2', '--memory', '1024M', '--developer-mode', 'true', '--default-log-level', 'info', '--overprovisioned', '--prometheus-address', '127.0.3.2', '--unsafe-bypass-fsync', '1', '--kernel-page-cache', '1', '--commitlog-use-o-dsync', '0', '--max-networking-io-control-blocks', '1000'] wait_other_notice=True wait_for_binary_proto=True
22:42:10,861 3898150 ccm                            DEBUG    cluster.py          :762  | test_shutdown_wiped_node_may_join: node2: Starting scylla-jmx: args=['/home/bhalevy/.dtest/dtest-vnn1ik4o/test/node2/bin/symlinks/scylla-jmx', '-Dapiaddress=127.0.3.2', '-Djavax.management.builder.initial=com.scylladb.jmx.utils.APIBuilder', '-Djava.rmi.server.hostname=127.0.3.2', '-Dcom.sun.management.jmxremote', '-Dcom.sun.management.jmxremote.host=127.0.3.2', '-Dcom.sun.management.jmxremote.port=7199', '-Dcom.sun.management.jmxremote.rmi.port=7199', '-Dcom.sun.management.jmxremote.local.only=false', '-Xmx256m', '-XX:+UseSerialGC', '-Dcom.sun.management.jmxremote.authenticate=false', '-Dcom.sun.management.jmxremote.ssl=false', '-jar', '/home/bhalevy/.dtest/dtest-vnn1ik4o/test/node2/bin/scylla-jmx-1.0.jar']
22:42:11,175 3898150 ccm                            DEBUG    cluster.py          :762  | test_shutdown_wiped_node_may_join: node1: tablets enabled, adjusting stress options by replication strategy change.
22:42:12,444 3898150 errors                         ERROR    conftest.py         :202  | test_shutdown_wiped_node_may_join: test failed: 
self = <bootstrap_test.TestBootstrap object at 0x7f0d8bbf40d0>

    def test_shutdown_wiped_node_may_join(self):
>       self._wiped_node_may_join_test(gently=True)

bootstrap_test.py:437: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
bootstrap_test.py:458: in _wiped_node_may_join_test
    node1.stress(["write", "n=10000", "-rate", "threads=8"])
env/lib/python3.11/site-packages/ccmlib/node.py:1345: in stress
    return handle_external_tool_process(p, ['stress'] + stress_options)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

process = <Popen: returncode: 1 args: ['/home/bhalevy/dev/scylla/resources/cassandra/t...>
cmd_args = ['stress', 'write', 'n=10000', '-rate', 'threads=8']

    def handle_external_tool_process(process, cmd_args):
        out, err = process.communicate()
        if (out is not None) and isinstance(out, bytes):
            out = out.decode()
        if (err is not None) and isinstance(err, bytes):
            err = err.decode()
        rc = process.returncode

        if rc != 0:
>           raise ToolError(cmd_args, rc, out, err)
E           ccmlib.node.ToolError: Subprocess ['stress', 'write', 'n=10000', '-rate', 'threads=8'] exited with non-zero status; exit status: 1; 
E           stdout: ******************** Stress Settings ********************
E           Command:
E             Type: write
E             Count: 10,000
E             No Warmup: false
E             Consistency Level: LOCAL_ONE
E             Serial Consistency Level: SERIAL
E             Target Uncertainty: not applicable
E             Key Size (bytes): 10
E             Counter Increment Distibution: add=fixed(1)
E           Rate:
E             Auto: false
E             Thread Count: 8
E             OpsPer Sec: 0
E           Population:
E             Sequence: 1..10000
E             Order: ARBITRARY
E             Wrap: true
E           Insert:
E             Revisits: Uniform:  min=1,max=1000000
E             Visits: Fixed:  key=1
E             Row Population Ratio: Ratio: divisor=1.000000;delegate=Fixed:  key=1
E             Batch Type: not batching
E           Columns:
E             Max Columns Per Key: 5
E             Column Names: [C0, C1, C2, C3, C4]
E             Comparator: AsciiType
E             Timestamp: null
E             Variable Column Count: false
E             Slice: false
E             Size Distribution: Fixed:  key=34
E             Count Distribution: Fixed:  key=5
E           Errors:
E             Ignore: false
E             Tries: 10
E           Log:
E             No Summary: false
E             No Settings: false
E             File: null
E             Interval Millis: 1000
E             Level: NORMAL
E           Mode:
E             API: JAVA_DRIVER_NATIVE
E             Connection Style: CQL_PREPARED
E             CQL Version: CQL3
E             Protocol Version: V4
E             Username: null
E             Password: null
E             Auth Provide Class: null
E             Max Pending Per Connection: null
E             Connections Per Host: 8
E             Compression: NONE
E           Node:
E             Nodes: [127.0.3.1]
E             Is White List: false
E             Datacenter: null
E             Rack: null
E           Schema:
E             Keyspace: keyspace1
E             Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
E             Replication Strategy Options: {datacenter1=1, initial_tablets=128}
E             Storage Options: {}
E             Table Compression: null
E             Table Compaction Strategy: null
E             Table Compaction Strategy Options: {}
E           Transport:
E             factory=org.apache.cassandra.thrift.TFramedTransportFactory; truststore=null; truststore-password=null; keystore=null; keystore-password=null; ssl-protocol=TLS; ssl-alg=SunX509; store-type=JKS; ssl-ciphers=TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA; 
E           Port:
E             Native Port: 9042
E             Thrift Port: 9160
E             JMX Port: 7199
E           Send To Daemon:
E             *not set*
E           Graph:
E             File: null
E             Revision: unknown
E             Title: null
E             Operation: WRITE
E           TokenRange:
E             Wrap: false
E             Split Factor: 1
E           CloudConf:
E             File: null
E           
E           ===== Using optimized driver!!! =====
E           Connected to cluster: test, max pending requests per connection null, max connections per host 8
E           Datatacenter: datacenter1; Host: /127.0.3.1; Rack: rack1
E           Datatacenter: datacenter1; Host: /127.0.3.2; Rack: rack1
E           ; 
E           stderr: java.lang.RuntimeException: Encountered exception creating schema
E               at org.apache.cassandra.stress.settings.SettingsSchema.createKeySpacesNative(SettingsSchema.java:105)
E               at org.apache.cassandra.stress.settings.SettingsSchema.createKeySpaces(SettingsSchema.java:74)
E               at org.apache.cassandra.stress.settings.StressSettings.maybeCreateKeyspaces(StressSettings.java:230)
E               at org.apache.cassandra.stress.StressAction.run(StressAction.java:58)
E               at org.apache.cassandra.stress.Stress.run(Stress.java:143)
E               at org.apache.cassandra.stress.Stress.main(Stress.java:62)
E           Caused by: com.datastax.driver.core.exceptions.InvalidConfigurationInQueryException: Unrecognized strategy option {initial_tablets} passed to org.apache.cassandra.locator.NetworkTopologyStrategy for keyspace keyspace1
E               at com.datastax.driver.core.exceptions.InvalidConfigurationInQueryException.copy(InvalidConfigurationInQueryException.java:38)
E               at com.datastax.driver.core.exceptions.InvalidConfigurationInQueryException.copy(InvalidConfigurationInQueryException.java:27)
E               at com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:35)
E               at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:293)
E               at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:58)
E               at org.apache.cassandra.stress.util.JavaDriverClient.execute(JavaDriverClient.java:215)
E               at org.apache.cassandra.stress.settings.SettingsSchema.createKeySpacesNative(SettingsSchema.java:89)
E               ... 5 more
env/lib/python3.11/site-packages/ccmlib/node.py:2170: ToolError
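
When triaging a failure like the one above, it can help to pull the rejected option names out of the captured stderr rather than reading the whole settings dump. The following is only a minimal sketch: the pattern is modeled on the single message quoted in this log, the helper name `rejected_strategy_options` is hypothetical, and the assumption that several rejected options would be comma-separated inside the braces is a guess, not something the log confirms.

```python
import re

# Pattern modeled on the one error message quoted in the log above.
# Assumption: multiple rejected options, if any, are comma-separated.
_UNRECOGNIZED = re.compile(
    r"Unrecognized strategy option \{(?P<options>[^}]*)\} "
    r"passed to (?P<strategy>\S+)"
)


def rejected_strategy_options(stderr):
    """Return (strategy_class, [option, ...]) if stderr reports rejected
    replication strategy options, otherwise None."""
    match = _UNRECOGNIZED.search(stderr)
    if match is None:
        return None
    options = [o.strip() for o in match.group("options").split(",") if o.strip()]
    return match.group("strategy"), options


# The "Caused by" line from the stderr above, split only for line length.
sample = (
    "Caused by: com.datastax.driver.core.exceptions."
    "InvalidConfigurationInQueryException: Unrecognized strategy option "
    "{initial_tablets} passed to "
    "org.apache.cassandra.locator.NetworkTopologyStrategy for keyspace keyspace1"
)

print(rejected_strategy_options(sample))
# ('org.apache.cassandra.locator.NetworkTopologyStrategy', ['initial_tablets'])
```

In a dtest run, the string to feed in would be the stderr text carried by the ToolError shown in the traceback.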

bhalevy commented 9 months ago

Cc @xemul

bhalevy commented 9 months ago

@mykaul / @dani-tweig can you please add a tablets label like the one in the scylladb repo?

xemul commented 9 months ago

With fresh enough scylla-ccm it passes

$ pytest --tablets  -s --cassandra-dir=$SCYLLA_DIR 'bootstrap_test.py::TestBootstrap::test_shutdown_wiped_node_may_join'
============================================================================================================ test session starts =============================================================================================================
platform linux -- Python 3.11.6, pytest-7.4.0, pluggy-1.0.0
Using --randomly-seed=2982088198
rootdir: /home/xemul/src/scylla-dtest
configfile: pytest.ini
plugins: randomly-3.13.0, metadata-3.0.0, timeout-2.1.0, subtests-0.11.0, xdist-3.3.1, elk-reporter-0.2.1, repeat-0.9.1, html-4.1.1, asyncio-0.20.3
timeout: 7200s
timeout method: signal
timeout func_only: False
asyncio: mode=Mode.STRICT
collected 1 item                                                                                                                                                                                                                             

bootstrap_test.py::TestBootstrap::test_shutdown_wiped_node_may_join PASSED

============================================================================================================= 1 passed in 44.68s =============================================================================================================

"Fresh enough" should include scylladb/scylla-ccm@6209d9d4421f84aa8f1bfeb79e2e38566424d5cb commit

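One way to check whether a local scylla-ccm checkout is "fresh enough" is to ask git whether the commit named above is an ancestor of HEAD. This is a sketch under assumptions: it presumes scylla-ccm is used from a git checkout (not an installed package), the helper name `checkout_includes` and its `run` parameter are hypothetical, and the only git feature relied on is `git merge-base --is-ancestor`, which exits 0 when the commit is an ancestor and 1 when it is not.

```python
import subprocess

# Commit named in the comment above.
REQUIRED_COMMIT = "6209d9d4421f84aa8f1bfeb79e2e38566424d5cb"


def checkout_includes(repo_dir, commit=REQUIRED_COMMIT, run=subprocess.run):
    """Return True if `commit` is an ancestor of (or equal to) HEAD in repo_dir.

    `run` is injectable so the helper can be exercised without a real repo.
    """
    result = run(
        ["git", "-C", repo_dir, "merge-base", "--is-ancestor", commit, "HEAD"],
        capture_output=True,
    )
    if result.returncode == 0:
        return True
    if result.returncode == 1:
        return False
    # Any other status means git itself failed (unknown commit, not a repo, ...).
    raise RuntimeError(
        "git merge-base failed with status %d: %r"
        % (result.returncode, result.stderr)
    )
```

Calling `checkout_includes("/path/to/scylla-ccm")` with the path of the local checkout (a placeholder here) would then return True once the checkout contains the commit.
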
xemul commented 9 months ago

For some reason, the recent dtest re-run didn't include this test: https://jenkins.scylladb.com/job/scylla-master/job/byo/job/dtest-byo/119/consoleText. It's not mentioned in the logs at all.

xemul commented 9 months ago

Only

[2024-01-26T15:11:05.717Z] bootstrap_test.py::TestBootstrap::test_start_stop_node 
[2024-01-26T15:11:05.717Z] bootstrap_test.py::TestBootstrap::test_local_quorum_bootstrap 
[2024-01-26T15:11:47.940Z] bootstrap_test.py::TestBootstrap::test_add_node 
[2024-01-26T15:11:51.236Z] bootstrap_test.py::TestBootstrap::test_add_detached_node 
[2024-01-26T15:13:41.812Z] bootstrap_test.py::TestBootstrap::test_start_stop 

were run

bhalevy commented 9 months ago

> With fresh enough scylla-ccm it passes

Cool, I'll verify locally and close

bhalevy commented 9 months ago

Verified.