scylladb / scylla-ccm

Cassandra Cluster Manager, modified for Scylla
Apache License 2.0
20 stars 64 forks source link

self.watch_rest_for_alive(other_node) fails en-mass during bootstrap in debug mode #477

Open bhalevy opened 1 year ago

bhalevy commented 1 year ago

See for example https://jenkins.scylladb.com/view/master/job/scylla-master/job/dtest-daily-debug/245/testReport/rebuild_test/TestRebuild/Run_Dtest_Parallel_Cloud_Machines___FullDtest___full_split094___test_rebuild_many_keyspaces/

self = <rebuild_test.TestRebuild object at 0x7f1b7ed0bf50>

    def test_rebuild_many_keyspaces(self):
        """
        Test rebuilding many keyspaces in same dc works as expected.
        """

        num_keys = 100
        num_keyspaces = 50
        num_tables = 2

        cluster = self.cluster
        cluster.set_configuration_options(values={'endpoint_snitch': 'GossipingPropertyFileSnitch'})
        node1 = self.add_node(1)

        # start node in dc1
        node1.start(wait_for_binary_proto=True)

        # populate data in dc1
        session = self.patient_exclusive_cql_connection(node1)
        dc = 'dc1'
        logger.debug(f"Creating {num_keyspaces} keyspaces")
        keyspaces = ['ks_{:04d}'.format(i) for i in range(0, num_keyspaces)]
        for ks in keyspaces:
            create_ks(session, ks, {dc: 1})

        logger.debug(f"Creating {num_tables} table(s) in each ks")
        tables = ['cf_{:04d}'.format(i) for i in range(0, num_tables)]
        for ks in keyspaces:
            for cf in tables:
                create_c1c2_table(session, cf=f'{ks}.{cf}', debug_query=False)
                insert_c1c2(session, n=num_keys, ks=ks, cf=cf, consistency=ConsistencyLevel.ALL)

        keys = [i for i in range(0, num_keys)]
        self._check_data(session, keyspaces, tables, keys)
        session.shutdown()

        logger.debug("Bootstrapping node2 with {auto_bootstrap: false}")
        node2 = self.add_node(2)
>       node2.start(wait_other_notice=True, wait_for_binary_proto=True)

rebuild_test.py:315: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../scylla/.local/lib/python3.11/site-packages/ccmlib/scylla_node.py:669: in start
    scylla_process = self._start_scylla(args, marks, update_pid,
../scylla/.local/lib/python3.11/site-packages/ccmlib/scylla_node.py:313: in _start_scylla
    node.watch_rest_for_alive(self, timeout=t)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <ccmlib.scylla_node.ScyllaNode object at 0x7f1b7d2c3410>
nodes = <ccmlib.scylla_node.ScyllaNode object at 0x7f1b7d3b7b10>, timeout = 360

    def watch_rest_for_alive(self, nodes, timeout=120):
        """
        Use the REST API to wait until this node detects that the nodes listed
        in "nodes" become fully operational and knows of its tokens.
        This is similar to watch_log_for_alive but uses ScyllaDB's REST API
        instead of the log file and waits for the node to be really useable,
        not just "UP" (see issue #461)
        """
        tofind = nodes if isinstance(nodes, list) else [nodes]
        tofind = set([node.address() for node in tofind])
        url_live = f"http://{self.address()}:10000/gossiper/endpoint/live"
        url_joining = f"http://{self.address()}:10000/storage_service/nodes/joining"
        url_tokens = f"http://{self.address()}:10000/storage_service/tokens/"
        endtime = time.time() + timeout
        while time.time() < endtime:
            live = set()
            response = requests.get(url=url_live)
            if response.text:
                live = set(response.json())
            response = requests.get(url=url_joining)
            if response.text:
                live = live - set(response.json())
            # Verify that node knows not only about the existance of the
            # other node, but also its tokens:
            if tofind.issubset(live):
                # This node thinks that all given nodes are alive and not
                # "joining", we're almost done, but still need to verify
                # that the node knows the others' tokens.
                check = tofind
                tofind = set()
                for n in check:
                    response = requests.get(url=url_tokens+n)
                    if response.text == '[]':
                        tofind.add(n)
            if not tofind:
                return
            time.sleep(0.1)
>       raise TimeoutError(f"watch_rest_for_alive() timeout after {timeout} seconds")
E       ccmlib.node.TimeoutError: watch_rest_for_alive() timeout after 360 seconds

../scylla/.local/lib/python3.11/site-packages/ccmlib/scylla_node.py:1379: TimeoutError

It appears like it simply takes more than 360 seconds for the node to bootstrap in debug mode so watch_rest_for_alive fails.

https://jenkins.scylladb.com/view/master/job/scylla-master/job/dtest-daily-debug/245/artifact/logs-full.debug.094/1690100633894_rebuild_test.py%3A%3ATestRebuild%3A%3Atest_rebuild_many_keyspaces/node1.log

INFO  2023-07-23 08:17:27,230 [shard 0] gossip - InetAddress 127.0.10.2 is now UP, status = UNKNOWN
INFO  2023-07-23 08:17:27,372 [shard 0] compaction - [Compact system.scylla_local 5c4bec80-2931-11ee-8451-d2e563b95f2c] Compacting [/jenkins/workspace/scylla-master/dtest-daily-debug/scylla/.dtest/dtest-ofv8ztjs/test/node1/data/system/scylla_local-2972ec7ffb2038ddaac1d876f2e3fcbd/me-3g7w_0n13_1gaps20fttl8pke33g-big-Data.db:level=0:origin=memtable,/jenkins/workspace/scylla-master/dtest-daily-debug/scylla/.dtest/dtest-ofv8ztjs/test/node1/data/system/scylla_local-2972ec7ffb2038ddaac1d876f2e3fcbd/me-3g7w_0mxm_2ela820fttl8pke33g-big-Data.db:level=0:origin=compaction]
INFO  2023-07-23 08:17:27,435 [shard 0] compaction - [Compact system.scylla_local 5c4bec80-2931-11ee-8451-d2e563b95f2c] Compacted 2 sstables to [/jenkins/workspace/scylla-master/dtest-daily-debug/scylla/.dtest/dtest-ofv8ztjs/test/node1/data/system/scylla_local-2972ec7ffb2038ddaac1d876f2e3fcbd/me-3g7w_0n13_28sy820fttl8pke33g-big-Data.db:level=0]. 11kB to 6kB (~51% of original) in 47ms = 254kB/s. ~256 total partitions merged to 1.
INFO  2023-07-23 08:17:41,278 [shard 0] compaction - [Compact system.scylla_local 6495cfa0-2931-11ee-8451-d2e563b95f2c] Compacting [/jenkins/workspace/scylla-master/dtest-daily-debug/scylla/.dtest/dtest-ofv8ztjs/test/node1/data/system/scylla_local-2972ec7ffb2038ddaac1d876f2e3fcbd/me-3g7w_0n1h_1cnjk20fttl8pke33g-big-Data.db:level=0:origin=memtable,/jenkins/workspace/scylla-master/dtest-daily-debug/scylla/.dtest/dtest-ofv8ztjs/test/node1/data/system/scylla_local-2972ec7ffb2038ddaac1d876f2e3fcbd/me-3g7w_0n13_28sy820fttl8pke33g-big-Data.db:level=0:origin=compaction]
INFO  2023-07-23 08:17:41,337 [shard 0] compaction - [Compact system.scylla_local 6495cfa0-2931-11ee-8451-d2e563b95f2c] Compacted 2 sstables to [/jenkins/workspace/scylla-master/dtest-daily-debug/scylla/.dtest/dtest-ofv8ztjs/test/node1/data/system/scylla_local-2972ec7ffb2038ddaac1d876f2e3fcbd/me-3g7w_0n1h_1ovcw20fttl8pke33g-big-Data.db:level=0]. 11kB to 6kB (~51% of original) in 43ms = 278kB/s. ~256 total partitions merged to 1.
INFO  2023-07-23 08:18:02,203 [shard 0] storage_service - bootstrap[4b61683f-0431-4d19-b651-9f87bdc4e86e]: Added node=127.0.10.2 as bootstrap, coordinator=127.0.10.2
INFO  2023-07-23 08:18:02,445 [shard 0] cdc - Starting to use generation (2023/07/23 08:18:27, cfbbe7d7-7806-4e2f-b6fc-8755390d4673)
INFO  2023-07-23 08:18:02,469 [shard 0] compaction - [Compact system.cdc_local 71332e60-2931-11ee-8451-d2e563b95f2c] Compacting [/jenkins/workspace/scylla-master/dtest-daily-debug/scylla/.dtest/dtest-ofv8ztjs/test/node1/data/system/cdc_local-0bcaffd40c833eadad13dc1d5015b77c/me-3g7w_0n22_22swg20fttl8pke33g-big-Data.db:level=0:origin=memtable,/jenkins/workspace/scylla-master/dtest-daily-debug/scylla/.dtest/dtest-ofv8ztjs/test/node1/data/system/cdc_local-0bcaffd40c833eadad13dc1d5015b77c/me-3g7w_0mxn_1zdg020fttl8pke33g-big-Data.db:level=0:origin=memtable]
INFO  2023-07-23 08:18:02,631 [shard 0] cdc - CDC description table successfully updated with generation (2023/07/23 08:18:27, cfbbe7d7-7806-4e2f-b6fc-8755390d4673).
INFO  2023-07-23 08:18:02,912 [shard 0] compaction - [Compact system.cdc_local 71332e60-2931-11ee-8451-d2e563b95f2c] Compacted 2 sstables to [/jenkins/workspace/scylla-master/dtest-daily-debug/scylla/.dtest/dtest-ofv8ztjs/test/node1/data/system/cdc_local-0bcaffd40c833eadad13dc1d5015b77c/me-3g7w_0n22_3dyfk20fttl8pke33g-big-Data.db:level=0]. 68kB to 63kB (~92% of original) in 419ms = 163kB/s. ~256 total partitions merged to 2.
INFO  2023-07-23 08:23:22,888 [shard 0] compaction_manager - Asked to stop

https://jenkins.scylladb.com/view/master/job/scylla-master/job/dtest-daily-debug/245/artifact/logs-full.debug.094/1690100633894_rebuild_test.py%3A%3ATestRebuild%3A%3Atest_rebuild_many_keyspaces/node2.log

INFO  2023-07-23 08:17:40,585 [shard 0] storage_service - entering JOINING mode
INFO  2023-07-23 08:17:40,639 [shard 1] compaction - [Compact system.scylla_local 642affe0-2931-11ee-8618-2a2c7da362fd] Compacted 2 sstables to [/jenkins/workspace/scylla-master/dtest-daily-debug/scylla/.dtest/dtest-ofv8ztjs/test/node2/data/system/scylla_local-2972ec7ffb2038ddaac1d876f2e3fcbd/me-3g7w_0n1g_3gqqo21et193d2r75p-big-Data.db:level=0]. 10kB to 5kB (~53% of original) in 45ms = 234kB/s. ~256 total partitions merged to 1.
INFO  2023-07-23 08:17:40,728 [shard 0] storage_service - waiting for ring information
INFO  2023-07-23 08:17:40,728 [shard 0] storage_service - waiting for schema information to complete
INFO  2023-07-23 08:17:41,437 [shard 0] schema_tables - Schema version changed to 2ac6cf52-bddc-35b0-a88e-67919ba676ba
INFO  2023-07-23 08:17:41,438 [shard 0] migration_manager - Schema merge with 127.0.10.1:0 completed
INFO  2023-07-23 08:17:41,460 [shard 0] compaction - [Compact system_schema.columns 62df4600-2931-11ee-a120-2a2d7da362fd] Compacted 2 sstables to [/jenkins/workspace/scylla-master/dtest-daily-debug/scylla/.dtest/dtest-ofv8ztjs/test/node2/data/system_schema/columns-24101c25a2ae3af787c1b40ee1aca33f/mc-6-big-Data.db:level=0]. 53kB to 32kB (~59% of original) in 2931ms = 18kB/s. ~256 total partitions merged to 56.
INFO  2023-07-23 08:17:41,483 [shard 0] compaction - [Compact system.local 64b390d0-2931-11ee-a120-2a2d7da362fd] Compacting [/jenkins/workspace/scylla-master/dtest-daily-debug/scylla/.dtest/dtest-ofv8ztjs/test/node2/data/system/local-7ad54392bcdd35a684174e047860b377/me-3g7w_0n1g_3io6o2g7jx5i3pll31-big-Data.db:level=0:origin=memtable,/jenkins/workspace/scylla-master/dtest-daily-debug/scylla/.dtest/dtest-ofv8ztjs/test/node2/data/system/local-7ad54392bcdd35a684174e047860b377/mc-6-big-Data.db:level=0:origin=compaction]
INFO  2023-07-23 08:17:41,853 [shard 0] compaction - [Compact system.local 64b390d0-2931-11ee-a120-2a2d7da362fd] Compacted 2 sstables to [/jenkins/workspace/scylla-master/dtest-daily-debug/scylla/.dtest/dtest-ofv8ztjs/test/node2/data/system/local-7ad54392bcdd35a684174e047860b377/me-3g7w_0n1h_34ycw2g7jx5i3pll31-big-Data.db:level=0]. 13kB to 7kB (~54% of original) in 248ms = 56kB/s. ~256 total partitions merged to 1.
INFO  2023-07-23 08:17:41,884 [shard 0] compaction - [Compact system.scylla_local 64ef8860-2931-11ee-a120-2a2d7da362fd] Compacting [/jenkins/workspace/scylla-master/dtest-daily-debug/scylla/.dtest/dtest-ofv8ztjs/test/node2/data/system/scylla_local-2972ec7ffb2038ddaac1d876f2e3fcbd/me-3g7w_0n1g_44j802g7jx5i3pll31-big-Data.db:level=0:origin=memtable,/jenkins/workspace/scylla-master/dtest-daily-debug/scylla/.dtest/dtest-ofv8ztjs/test/node2/data/system/scylla_local-2972ec7ffb2038ddaac1d876f2e3fcbd/md-3g7w_0n1g_39ge82g7jx5i3pll31-big-Data.db:level=0:origin=memtable,/jenkins/workspace/scylla-master/dtest-daily-debug/scylla/.dtest/dtest-ofv8ztjs/test/node2/data/system/scylla_local-2972ec7ffb2038ddaac1d876f2e3fcbd/mc-2-big-Data.db:level=0:origin=memtable]
INFO  2023-07-23 08:17:42,178 [shard 0] schema_tables - Schema version changed to 2ac6cf52-bddc-35b0-a88e-67919ba676ba
INFO  2023-07-23 08:17:42,283 [shard 0] compaction - [Compact system.scylla_local 64ef8860-2931-11ee-a120-2a2d7da362fd] Compacted 3 sstables to [/jenkins/workspace/scylla-master/dtest-daily-debug/scylla/.dtest/dtest-ofv8ztjs/test/node2/data/system/scylla_local-2972ec7ffb2038ddaac1d876f2e3fcbd/me-3g7w_0n1h_5jbxc2g7jx5i3pll31-big-Data.db:level=0]. 17kB to 6kB (~36% of original) in 272ms = 64kB/s. ~384 total partitions merged to 1.
INFO  2023-07-23 08:17:42,296 [shard 0] compaction - [Compact system_schema.tables 65314c50-2931-11ee-a120-2a2d7da362fd] Compacting [/jenkins/workspace/scylla-master/dtest-daily-debug/scylla/.dtest/dtest-ofv8ztjs/test/node2/data/system_schema/tables-afddfb9dbc1e30688056eed6c302ba09/mc-4-big-Data.db:level=0:origin=memtable,/jenkins/workspace/scylla-master/dtest-daily-debug/scylla/.dtest/dtest-ofv8ztjs/test/node2/data/system_schema/tables-afddfb9dbc1e30688056eed6c302ba09/mc-2-big-Data.db:level=0:origin=memtable]
INFO  2023-07-23 08:17:42,942 [shard 0] schema_tables - Schema version changed to 2ac6cf52-bddc-35b0-a88e-67919ba676ba
INFO  2023-07-23 08:17:43,432 [shard 0] compaction - [Compact system_schema.tables 65314c50-2931-11ee-a120-2a2d7da362fd] Compacted 2 sstables to [/jenkins/workspace/scylla-master/dtest-daily-debug/scylla/.dtest/dtest-ofv8ztjs/test/node2/data/system_schema/tables-afddfb9dbc1e30688056eed6c302ba09/me-3g7w_0n1i_20g0w2g7jx5i3pll31-big-Data.db:level=0]. 51kB to 29kB (~57% of original) in 1013ms = 51kB/s. ~256 total partitions merged to 56.
INFO  2023-07-23 08:17:43,652 [shard 0] schema_tables - Schema version changed to 2ac6cf52-bddc-35b0-a88e-67919ba676ba
INFO  2023-07-23 08:17:44,300 [shard 0] schema_tables - Schema version changed to 2ac6cf52-bddc-35b0-a88e-67919ba676ba
INFO  2023-07-23 08:17:44,948 [shard 0] schema_tables - Schema version changed to 2ac6cf52-bddc-35b0-a88e-67919ba676ba
INFO  2023-07-23 08:17:45,019 [shard 0] storage_service - Checking bootstrapping/leaving nodes: ok
INFO  2023-07-23 08:17:45,019 [shard 0] storage_service - getting bootstrap token
INFO  2023-07-23 08:17:45,020 [shard 0] boot_strapper - Get random bootstrap_tokens={-8395587159826219290, -5464666828924826936, -7222252513962729396, 135100649381944257, 6576852586953413498, 325464549287023086, 4904036563849780271, -6546981841352807480, 2865406470152825579, -394784590446060672, 8051423441471499559, 6844991086976765892, -5614594334687664535, 1086175589025211548, 2408945801985234454, 8122767265582843160, -2956082443385573348, -960961978041855340, -3643595389008655452, 6857527436554476253, 5707640721068606399, 8897833340820015336, 2703363383166069333, -6588602489652425955, 5440287864758530396, -3094282848608922741, -1833526704171643022, -5105986572299303535, 7343359728267118239, -1834910154028555994, 1359791063466712191, -6507608401952339437, 566681375887275487, -306335652492972568, -4759202512546421207, -3129379551594474653, -5245088751617943006, -1404467098100775533, 3621138246620388043, 1485318039863435327, -4634401387007861583, -3038957000594949223, -6067046826404507156, 259137294824191805, 1926200563648969364, -1129145445286767819, -4639143725143526709, -1270097303253184727, -4966536635347290250, -1836953348237170973, -5794517138929612802, -3357996511996118312, -3791611262883130211, -2712206311139191209, -6791804046333807375, -6857569926998170414, -1214712356026727843, 5648296861051792831, -3655125933126647862, -4437868813526158245, 5370436154027038373, 7792027546790392396, -4561037836572836580, 3279434197974400561, 15665818235702521, 1524968315694800053, -176568868099139895, -6701925414249034591, 5178105708499513041, 5804070608230706547, 5230568807411941866, 4853077499092200821, -7369163820979119575, -6952830276823514532, -103846967177016728, -992945990790605636, 8386798552877056049, 1133063589928776842, -5956267497982120165, -1738707564756902725, -2311239167820960648, -3617434983021206653, -8806310878743135339, 8682303417832312958, 6542373449263183026, 8453323091764003081, 6401264866713689372, -3510202054027109158, 717199908134802740, -909605829712997119, -5535683829237212567, 3940927772554800994, 7989711667997573788, 8150595733957275539, -2056771582667231152, -6866192218627519514, -7574389104777509529, 5949929264212497033, 6666375004548028554, -2043394039982026649, -6812742753429456488, -3465313958018044075, -7402991056989302396, -519923436200876788, -4296903967476984787, 3858349911860527952, 8003019013187970627, -3210942499279608571, 8134376040994854713, -5268721429312034183, -3255006945112380069, -1237903303615122379, 3508003286295929138, 5602756197637522861, -82889097934825496, 765186793610237170, -7218303157549388562, -8371313459689358554, -6077713351632690487, -1338841163598433145, -1461213882739586397, -8003371252654520955, -5723333243472021823, -3598399274862099320, -7975816588440379235, 3182154134710411372, 5982218720245695231, -8022333615083156526, -8013322529234294041, 6582953801098307917, -1746134476029010781, -7962438755842596771, 8440675877475389789, -5517615739684767939, -6728175856340492674, -7709261831346207023, -8218332298477205921, 5863034785872395197, 8549729515336153751, 8444318272321445644, 3406730916609539115, 5280359148735403361, 5662208716882257323, -5591151511066121729, -8883677973470414918, 3941620723706007740, -8502421991628807392, 2575039736256740314, -7803521813182963380, -7876943857235327196, 56269010885645520, 7417591635542850574, -6907680069794125375, 6531632281214254321, 7014445363384813781, -8681664890784800888, 7219964038695109401, 6756764466860574160, 1789954678045529674, -1335653650150848103, -928650376983969000, -4319562382198011677, 3627982001996933078, 6709446486973791041, 50365639515001438, -4723088506383709176, -989799393669168409, -6884393927805650255, 2137838288095421568, 2035601076520437763, 2854041178342208778, -4230353222642009796, 9161035368666734598, 5723595328722592552, -5240450739054301426, -3811011906531734905, -6652655064390295222, -1071492933004332904, -23090648579602723, -7852280585483772283, -3218302238885030434, -5558203672196491920, 5363625513107461792, 2857931834978150504, 5816817509490364905, 8705412457605912815, -9001237871363535131, 5372544962964494314, -3732504860581423471, -1689725134578914778, 252695792052499106, 383585012655668966, 4564452022988505575, -6187619051100758030, -4596776904188924370, 97223002029603781, 5724668741791588655, -6844936433089951339, 936695171381717631, -9015202489132629380, 2890170352300620348, 6185073674648946551, 77467571978625798, 6473387548436861078, 4870832448718573995, 3761903199032378549, -7144321973099061236, 8280025665725034732, 772638564347589154, -4372427892630359972, 5052074121036312165, 8797499046576556097, -3074773784104930630, 703862074880474660, -3604497667191381691, -4175924615349836461, -4675986421159984312, 3659305196643771673, 2507582415515462337, 5686556585242263320, -7525535642739226497, 1426877500142237997, 8260608696777969526, -3553756636007311684, 7986534795414733361, -743757303203258287, -9077028374338143018, -1006342375644801567, 6866260556387203777, -6751365809636160665, 1523413503891722687, -5297781222906273114, 858106848981761506, 5644228301183323679, -4542823553779553524, -4488719974975959451, -2811703002895913722, 3331721693627287123, 2704103012387122487, 5034226912642977658, -865565180332728508, -7949088462790623579, -4208460939445143339, 5475656673231497838, 1094762380324555299, 2833661454877006912, 5315999854388242392, 2172654140971713334, -635124055736189505, 5835272454069430665, -1125547323310034206, 4497840473825704342, 5372819866645000103, -6340846590744899253, 8308933521514706654, 2751666202465664570}
INFO  2023-07-23 08:17:45,020 [shard 0] system_distributed_keyspace - system_distributed keyspace is already present. Not creating
INFO  2023-07-23 08:17:45,020 [shard 0] system_distributed_keyspace - system_distributed_everywhere keyspace is already present. Not creating
INFO  2023-07-23 08:17:45,021 [shard 0] system_distributed_keyspace - All tables are present on start
INFO  2023-07-23 08:17:45,021 [shard 0] system_distributed_keyspace - All schemas are uptodate on start
INFO  2023-07-23 08:17:45,104 [shard 0] storage_service - enable_repair_based_node_ops=true, allowed_repair_based_node_ops={decommission ,bootstrap ,rebuild ,removenode ,replace}
INFO  2023-07-23 08:17:45,104 [shard 0] storage_service - entering BOOTSTRAP mode
INFO  2023-07-23 08:17:45,104 [shard 0] storage_service - Wait until local node knows tokens of peer nodes
INFO  2023-07-23 08:17:45,104 [shard 0] gossip - Waiting for pending range setup...
INFO  2023-07-23 08:17:45,104 [shard 0] compaction - [Compact system.local 66ddc3d0-2931-11ee-a120-2a2d7da362fd] Compacting [/jenkins/workspace/scylla-master/dtest-daily-debug/scylla/.dtest/dtest-ofv8ztjs/test/node2/data/system/local-7ad54392bcdd35a684174e047860b377/me-3g7w_0n1h_34ycw2g7jx5i3pll31-big-Data.db:level=0:origin=compaction,/jenkins/workspace/scylla-master/dtest-daily-debug/scylla/.dtest/dtest-ofv8ztjs/test/node2/data/system/local-7ad54392bcdd35a684174e047860b377/me-3g7w_0n1l_0601s2g7jx5i3pll31-big-Data.db:level=0:origin=memtable]
INFO  2023-07-23 08:17:45,164 [shard 0] compaction - [Compact system.local 66ddc3d0-2931-11ee-a120-2a2d7da362fd] Compacted 2 sstables to [/jenkins/workspace/scylla-master/dtest-daily-debug/scylla/.dtest/dtest-ofv8ztjs/test/node2/data/system/local-7ad54392bcdd35a684174e047860b377/me-3g7w_0n1l_0nkrk2g7jx5i3pll31-big-Data.db:level=0]. 18kB to 12kB (~66% of original) in 43ms = 440kB/s. ~256 total partitions merged to 1.
INFO  2023-07-23 08:18:02,105 [shard 0] gossip - No gossip backlog; proceeding
INFO  2023-07-23 08:18:02,107 [shard 0] cdc - Inserting new generation data at UUID cfbbe7d7-7806-4e2f-b6fc-8755390d4673
INFO  2023-07-23 08:18:02,177 [shard 0] cdc - New CDC generation: (2023/07/23 08:18:27, cfbbe7d7-7806-4e2f-b6fc-8755390d4673)
INFO  2023-07-23 08:18:02,179 [shard 0] storage_service - Starting to bootstrap...
INFO  2023-07-23 08:18:02,179 [shard 0] storage_service - bootstrap[4b61683f-0431-4d19-b651-9f87bdc4e86e]: Started bootstrap operation: node=b0643dbe-37b6-4e64-86c4-bded508a5c69/127.0.10.2
INFO  2023-07-23 08:18:02,180 [shard 0] storage_service - bootstrap[4b61683f-0431-4d19-b651-9f87bdc4e86e]: sync_nodes=bootstrap, ignore_nodes=b0643dbe-37b6-4e64-86c4-bded508a5c69
INFO  2023-07-23 08:18:02,193 [shard 0] storage_service - bootstrap[4b61683f-0431-4d19-b651-9f87bdc4e86e]: Started heartbeat_updater (interval=10s)
INFO  2023-07-23 08:18:02,194 [shard 0] storage_service - bootstrap[4b61683f-0431-4d19-b651-9f87bdc4e86e]: Started bootstrap_prepare[4b61683f-0431-4d19-b651-9f87bdc4e86e]: ignore_nodes={}, leaving_nodes={}, replace_nodes={}, bootstrap_nodes={{127.0.10.2 -> {-8395587159826219290, -5464666828924826936, -7222252513962729396, 135100649381944257, 6576852586953413498, 325464549287023086, 4904036563849780271, -6546981841352807480, 2865406470152825579, -394784590446060672, 8051423441471499559, 6844991086976765892, -5614594334687664535, 1086175589025211548, 2408945801985234454, 8122767265582843160, -2956082443385573348, -960961978041855340, -3643595389008655452, 6857527436554476253, 5707640721068606399, 8897833340820015336, 2703363383166069333, -6588602489652425955, 5440287864758530396, -3094282848608922741, -1833526704171643022, -5105986572299303535, 7343359728267118239, -1834910154028555994, 1359791063466712191, -6507608401952339437, 566681375887275487, -306335652492972568, -4759202512546421207, -3129379551594474653, -5245088751617943006, -1404467098100775533, 3621138246620388043, 1485318039863435327, -4634401387007861583, -3038957000594949223, -6067046826404507156, 259137294824191805, 1926200563648969364, -1129145445286767819, -4639143725143526709, -1270097303253184727, -4966536635347290250, -1836953348237170973, -5794517138929612802, -3357996511996118312, -3791611262883130211, -2712206311139191209, -6791804046333807375, -6857569926998170414, -1214712356026727843, 5648296861051792831, -3655125933126647862, -4437868813526158245, 5370436154027038373, 7792027546790392396, -4561037836572836580, 3279434197974400561, 15665818235702521, 1524968315694800053, -176568868099139895, -6701925414249034591, 5178105708499513041, 5804070608230706547, 5230568807411941866, 4853077499092200821, -7369163820979119575, -6952830276823514532, -103846967177016728, -992945990790605636, 8386798552877056049, 1133063589928776842, -5956267497982120165, -1738707564756902725, -2311239167820960648, -3617434983021206653, -8806310878743135339, 8682303417832312958, 6542373449263183026, 8453323091764003081, 6401264866713689372, -3510202054027109158, 717199908134802740, -909605829712997119, -5535683829237212567, 3940927772554800994, 7989711667997573788, 8150595733957275539, -2056771582667231152, -6866192218627519514, -7574389104777509529, 5949929264212497033, 6666375004548028554, -2043394039982026649, -6812742753429456488, -3465313958018044075, -7402991056989302396, -519923436200876788, -4296903967476984787, 3858349911860527952, 8003019013187970627, -3210942499279608571, 8134376040994854713, -5268721429312034183, -3255006945112380069, -1237903303615122379, 3508003286295929138, 5602756197637522861, -82889097934825496, 765186793610237170, -7218303157549388562, -8371313459689358554, -6077713351632690487, -1338841163598433145, -1461213882739586397, -8003371252654520955, -5723333243472021823, -3598399274862099320, -7975816588440379235, 3182154134710411372, 5982218720245695231, -8022333615083156526, -8013322529234294041, 6582953801098307917, -1746134476029010781, -7962438755842596771, 8440675877475389789, -5517615739684767939, -6728175856340492674, -7709261831346207023, -8218332298477205921, 5863034785872395197, 8549729515336153751, 8444318272321445644, 3406730916609539115, 5280359148735403361, 5662208716882257323, -5591151511066121729, -8883677973470414918, 3941620723706007740, -8502421991628807392, 2575039736256740314, -7803521813182963380, -7876943857235327196, 56269010885645520, 7417591635542850574, -6907680069794125375, 6531632281214254321, 7014445363384813781, -8681664890784800888, 7219964038695109401, 6756764466860574160, 1789954678045529674, -1335653650150848103, -928650376983969000, -4319562382198011677, 3627982001996933078, 6709446486973791041, 50365639515001438, -4723088506383709176, -989799393669168409, -6884393927805650255, 2137838288095421568, 2035601076520437763, 2854041178342208778, -4230353222642009796, 9161035368666734598, 5723595328722592552, -5240450739054301426, -3811011906531734905, -6652655064390295222, -1071492933004332904, -23090648579602723, -7852280585483772283, -3218302238885030434, -5558203672196491920, 5363625513107461792, 2857931834978150504, 5816817509490364905, 8705412457605912815, -9001237871363535131, 5372544962964494314, -3732504860581423471, -1689725134578914778, 252695792052499106, 383585012655668966, 4564452022988505575, -6187619051100758030, -4596776904188924370, 97223002029603781, 5724668741791588655, -6844936433089951339, 936695171381717631, -9015202489132629380, 2890170352300620348, 6185073674648946551, 77467571978625798, 6473387548436861078, 4870832448718573995, 3761903199032378549, -7144321973099061236, 8280025665725034732, 772638564347589154, -4372427892630359972, 5052074121036312165, 8797499046576556097, -3074773784104930630, 703862074880474660, -3604497667191381691, -4175924615349836461, -4675986421159984312, 3659305196643771673, 2507582415515462337, 5686556585242263320, -7525535642739226497, 1426877500142237997, 8260608696777969526, -3553756636007311684, 7986534795414733361, -743757303203258287, -9077028374338143018, -1006342375644801567, 6866260556387203777, -6751365809636160665, 1523413503891722687, -5297781222906273114, 858106848981761506, 5644228301183323679, -4542823553779553524, -4488719974975959451, -2811703002895913722, 3331721693627287123, 2704103012387122487, 5034226912642977658, -865565180332728508, -7949088462790623579, -4208460939445143339, 5475656673231497838, 1094762380324555299, 2833661454877006912, 5315999854388242392, 2172654140971713334, -635124055736189505, 5835272454069430665, -1125547323310034206, 4497840473825704342, 5372819866645000103, -6340846590744899253, 8308933521514706654, 2751666202465664570}}}, repair_tables={}
INFO  2023-07-23 08:18:02,204 [shard 0] storage_service - bootstrap[4b61683f-0431-4d19-b651-9f87bdc4e86e]: Added node=127.0.10.2 as bootstrap, coordinator=127.0.10.2
INFO  2023-07-23 08:18:02,740 [shard 0] storage_service - bootstrap[4b61683f-0431-4d19-b651-9f87bdc4e86e]: Finished bootstrap_prepare[4b61683f-0431-4d19-b651-9f87bdc4e86e]: ignore_nodes={}, leaving_nodes={}, replace_nodes={}, bootstrap_nodes={{127.0.10.2 -> {-8395587159826219290, -5464666828924826936, -7222252513962729396, 135100649381944257, 6576852586953413498, 325464549287023086, 4904036563849780271, -6546981841352807480, 2865406470152825579, -394784590446060672, 8051423441471499559, 6844991086976765892, -5614594334687664535, 1086175589025211548, 2408945801985234454, 8122767265582843160, -2956082443385573348, -960961978041855340, -3643595389008655452, 6857527436554476253, 5707640721068606399, 8897833340820015336, 2703363383166069333, -6588602489652425955, 5440287864758530396, -3094282848608922741, -1833526704171643022, -5105986572299303535, 7343359728267118239, -1834910154028555994, 1359791063466712191, -6507608401952339437, 566681375887275487, -306335652492972568, -4759202512546421207, -3129379551594474653, -5245088751617943006, -1404467098100775533, 3621138246620388043, 1485318039863435327, -4634401387007861583, -3038957000594949223, -6067046826404507156, 259137294824191805, 1926200563648969364, -1129145445286767819, -4639143725143526709, -1270097303253184727, -4966536635347290250, -1836953348237170973, -5794517138929612802, -3357996511996118312, -3791611262883130211, -2712206311139191209, -6791804046333807375, -6857569926998170414, -1214712356026727843, 5648296861051792831, -3655125933126647862, -4437868813526158245, 5370436154027038373, 7792027546790392396, -4561037836572836580, 3279434197974400561, 15665818235702521, 1524968315694800053, -176568868099139895, -6701925414249034591, 5178105708499513041, 5804070608230706547, 5230568807411941866, 4853077499092200821, -7369163820979119575, -6952830276823514532, -103846967177016728, -992945990790605636, 8386798552877056049, 1133063589928776842, -5956267497982120165, -1738707564756902725, -2311239167820960648, -3617434983021206653, -8806310878743135339, 8682303417832312958, 6542373449263183026, 8453323091764003081, 6401264866713689372, -3510202054027109158, 717199908134802740, -909605829712997119, -5535683829237212567, 3940927772554800994, 7989711667997573788, 8150595733957275539, -2056771582667231152, -6866192218627519514, -7574389104777509529, 5949929264212497033, 6666375004548028554, -2043394039982026649, -6812742753429456488, -3465313958018044075, -7402991056989302396, -519923436200876788, -4296903967476984787, 3858349911860527952, 8003019013187970627, -3210942499279608571, 8134376040994854713, -5268721429312034183, -3255006945112380069, -1237903303615122379, 3508003286295929138, 5602756197637522861, -82889097934825496, 765186793610237170, -7218303157549388562, -8371313459689358554, -6077713351632690487, -1338841163598433145, -1461213882739586397, -8003371252654520955, -5723333243472021823, -3598399274862099320, -7975816588440379235, 3182154134710411372, 5982218720245695231, -8022333615083156526, -8013322529234294041, 6582953801098307917, -1746134476029010781, -7962438755842596771, 8440675877475389789, -5517615739684767939, -6728175856340492674, -7709261831346207023, -8218332298477205921, 5863034785872395197, 8549729515336153751, 8444318272321445644, 3406730916609539115, 5280359148735403361, 5662208716882257323, -5591151511066121729, -8883677973470414918, 3941620723706007740, -8502421991628807392, 2575039736256740314, -7803521813182963380, -7876943857235327196, 56269010885645520, 7417591635542850574, -6907680069794125375, 6531632281214254321, 7014445363384813781, -8681664890784800888, 7219964038695109401, 6756764466860574160, 1789954678045529674, -1335653650150848103, -928650376983969000, -4319562382198011677, 3627982001996933078, 6709446486973791041, 50365639515001438, -4723088506383709176, -989799393669168409, -6884393927805650255, 2137838288095421568, 2035601076520437763, 2854041178342208778, -4230353222642009796, 9161035368666734598, 5723595328722592552, -5240450739054301426, -3811011906531734905, -6652655064390295222, -1071492933004332904, -23090648579602723, -7852280585483772283, -3218302238885030434, -5558203672196491920, 5363625513107461792, 2857931834978150504, 5816817509490364905, 8705412457605912815, -9001237871363535131, 5372544962964494314, -3732504860581423471, -1689725134578914778, 252695792052499106, 383585012655668966, 4564452022988505575, -6187619051100758030, -4596776904188924370, 97223002029603781, 5724668741791588655, -6844936433089951339, 936695171381717631, -9015202489132629380, 2890170352300620348, 6185073674648946551, 77467571978625798, 6473387548436861078, 4870832448718573995, 3761903199032378549, -7144321973099061236, 8280025665725034732, 772638564347589154, -4372427892630359972, 5052074121036312165, 8797499046576556097, -3074773784104930630, 703862074880474660, -3604497667191381691, -4175924615349836461, -4675986421159984312, 3659305196643771673, 2507582415515462337, 5686556585242263320, -7525535642739226497, 1426877500142237997, 8260608696777969526, -3553756636007311684, 7986534795414733361, -743757303203258287, -9077028374338143018, -1006342375644801567, 6866260556387203777, -6751365809636160665, 1523413503891722687, -5297781222906273114, 858106848981761506, 5644228301183323679, -4542823553779553524, -4488719974975959451, -2811703002895913722, 3331721693627287123, 2704103012387122487, 5034226912642977658, -865565180332728508, -7949088462790623579, -4208460939445143339, 5475656673231497838, 1094762380324555299, 2833661454877006912, 5315999854388242392, 2172654140971713334, -635124055736189505, 5835272454069430665, -1125547323310034206, 4497840473825704342, 5372819866645000103, -6340846590744899253, 8308933521514706654, 2751666202465664570}}}, repair_tables={}
INFO  2023-07-23 08:18:06,006 [shard 0] repair - bootstrap_with_repair: started with keyspaces={ks_0040, ks_0039, ks_0044, ks_0006, ks_0036, ks_0021, ks_0031, ks_0011, ks_0038, ks_0047, ks_0025, ks_0014, ks_0048, ks_0045, ks_0016, ks_0015, ks_0000, ks_0046, ks_0026, ks_0024, ks_0037, system_distributed_everywhere, ks_0043, ks_0035, ks_0019, ks_0042, ks_0023, system_distributed, ks_0017, ks_0032, ks_0034, ks_0003, ks_0004, ks_0041, ks_0030, system_auth, ks_0029, ks_0033, ks_0022, ks_0028, ks_0007, ks_0013, ks_0002, ks_0027, ks_0010, ks_0049, ks_0001, system_traces, ks_0005, ks_0020, ks_0012, ks_0018, ks_0009, ks_0008}, nr_ranges_total=31498
INFO  2023-07-23 08:18:06,143 [shard 0] repair - bootstrap_with_repair: started with keyspace=ks_0040, nr_ranges=512
...
INFO  2023-07-23 08:23:12,310 [shard 0] repair - repair[4e640e08-0a35-4b0e-b128-fef3a47563ac]: sync data for keyspace=system_distributed, status=started
INFO  2023-07-23 08:23:12,313 [shard 0] repair - repair[4e640e08-0a35-4b0e-b128-fef3a47563ac]: Started to repair 1 out of 4 tables in keyspace=system_distributed, table=cdc_generation_timestamps, table_id=fdf455c4-cfec-3e00-9719-d7a45436c89d, repair_reason=bootstrap
INFO  2023-07-23 08:23:12,316 [shard 1] repair - repair[4e640e08-0a35-4b0e-b128-fef3a47563ac]: Started to repair 1 out of 4 tables in keyspace=system_distributed, table=cdc_generation_timestamps, table_id=fdf455c4-cfec-3e00-9719-d7a45436c89d, repair_reason=bootstrap
INFO  2023-07-23 08:23:19,099 [shard 0] repair - repair[4e640e08-0a35-4b0e-b128-fef3a47563ac]: Started to repair 2 out of 4 tables in keyspace=system_distributed, table=service_levels, table_id=b8c556bd-212d-37ad-9484-690c73a5994b, repair_reason=bootstrap
INFO  2023-07-23 08:23:19,707 [shard 1] repair - repair[4e640e08-0a35-4b0e-b128-fef3a47563ac]: Started to repair 2 out of 4 tables in keyspace=system_distributed, table=service_levels, table_id=b8c556bd-212d-37ad-9484-690c73a5994b, repair_reason=bootstrap
INFO  2023-07-23 08:23:19,708 [shard 1] compaction_manager - Starting off-strategy compaction for system_distributed.cdc_generation_timestamps compaction_group=0/1, 1 candidates were found
INFO  2023-07-23 08:23:19,711 [shard 1] compaction_manager - Done with off-strategy compaction for system_distributed.cdc_generation_timestamps compaction_group=0/1
INFO  2023-07-23 08:23:22,888 [shard 0] compaction_manager - Asked to stop
WARN  2023-07-23 08:23:22,889 [shard 0] gossip - failure_detector_loop: Got error in the loop, live_nodes={127.0.10.1}: seastar::sleep_aborted (Sleep is aborted)
INFO  2023-07-23 08:23:22,890 [shard 0] gossip - failure_detector_loop: Finished main loop
INFO  2023-07-23 08:23:22,890 [shard 1] compaction_manager - Asked to stop
INFO  2023-07-23 08:23:22,890 [shard 0] compaction_manager - Stopped
INFO  2023-07-23 08:23:22,892 [shard 1] compaction_manager - Stopped
WARN  2023-07-23 08:23:22,892 [shard 0] repair - repair[4e640e08-0a35-4b0e-b128-fef3a47563ac]: Repair job aborted by user, job=4e640e08-0a35-4b0e-b128-fef3a47563ac, keyspace=system_distributed, tables={cdc_generation_timestamps, service_levels, cdc_streams_descriptions_v2, view_build_status}
WARN  2023-07-23 08:23:22,895 [shard 1] repair - repair[4e640e08-0a35-4b0e-b128-fef3a47563ac]: Repair job aborted by user, job=4e640e08-0a35-4b0e-b128-fef3a47563ac, keyspace=system_distributed, tables={cdc_generation_timestamps, service_levels, cdc_streams_descriptions_v2, view_build_status}
INFO  2023-07-23 08:23:22,921 [shard 0] repair - repair[4e640e08-0a35-4b0e-b128-fef3a47563ac]: stats: repair_reason=bootstrap, keyspace=system_distributed, tables={cdc_generation_timestamps, service_levels, cdc_streams_descriptions_v2, view_build_status}, ranges_nr=513, round_nr=802, round_nr_fast_path_already_synced=802, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=4010, tx_hashes_nr=0, rx_hashes_nr=0, duration=10.609721 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.10.1, 0}, {127.0.10.2, 0}}, row_from_disk_nr={{127.0.10.1, 0}, {127.0.10.2, 0}}, row_from_disk_bytes_per_sec={{127.0.10.1, 0}, {127.0.10.2, 0}} MiB/s, row_from_disk_rows_per_sec={{127.0.10.1, 0}, {127.0.10.2, 0}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
WARN  2023-07-23 08:23:22,921 [shard 0] repair - repair[4e640e08-0a35-4b0e-b128-fef3a47563ac]: 1 out of 2052 ranges failed, keyspace=system_distributed, tables={cdc_generation_timestamps, service_levels, cdc_streams_descriptions_v2, view_build_status}, repair_reason=bootstrap, nodes_down_during_repair={}, aborted_by_user=true
INFO  2023-07-23 08:23:22,937 [shard 1] repair - repair[4e640e08-0a35-4b0e-b128-fef3a47563ac]: stats: repair_reason=bootstrap, keyspace=system_distributed, tables={cdc_generation_timestamps, service_levels, cdc_streams_descriptions_v2, view_build_status}, ranges_nr=513, round_nr=766, round_nr_fast_path_already_synced=765, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=1, rpc_call_nr=3828, tx_hashes_nr=0, rx_hashes_nr=1, duration=10.624115 seconds, tx_row_nr=0, rx_row_nr=2, tx_row_bytes=0, rx_row_bytes=216, row_from_disk_bytes={{127.0.10.1, 216}, {127.0.10.2, 0}}, row_from_disk_nr={{127.0.10.1, 2}, {127.0.10.2, 0}}, row_from_disk_bytes_per_sec={{127.0.10.1, 1.93893e-05}, {127.0.10.2, 0}} MiB/s, row_from_disk_rows_per_sec={{127.0.10.1, 0.188251}, {127.0.10.2, 0}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={{127.0.10.1, 2}}
WARN  2023-07-23 08:23:22,937 [shard 1] repair - repair[4e640e08-0a35-4b0e-b128-fef3a47563ac]: 1 out of 2052 ranges failed, keyspace=system_distributed, tables={cdc_generation_timestamps, service_levels, cdc_streams_descriptions_v2, view_build_status}, repair_reason=bootstrap, nodes_down_during_repair={}, aborted_by_user=true
WARN  2023-07-23 08:23:22,939 [shard 0] repair - repair[4e640e08-0a35-4b0e-b128-fef3a47563ac]: sync data for keyspace=system_distributed, status=failed: std::runtime_error ({shard 0: std::runtime_error (repair[4e640e08-0a35-4b0e-b128-fef3a47563ac]: 1 out of 2052 ranges failed, keyspace=system_distributed, tables={cdc_generation_timestamps, service_levels, cdc_streams_descriptions_v2, view_build_status}, repair_reason=bootstrap, nodes_down_during_repair={}, aborted_by_user=true), shard 1: std::runtime_error (repair[4e640e08-0a35-4b0e-b128-fef3a47563ac]: 1 out of 2052 ranges failed, keyspace=system_distributed, tables={cdc_generation_timestamps, service_levels, cdc_streams_descriptions_v2, view_build_status}, repair_reason=bootstrap, nodes_down_during_repair={}, aborted_by_user=true)})
ERROR 2023-07-23 08:23:22,940 [shard 0] storage_service - bootstrap[4b61683f-0431-4d19-b651-9f87bdc4e86e]: Operation failed, sync_nodes={127.0.10.1, 127.0.10.2}: std::runtime_error ({shard 0: std::runtime_error (repair[4e640e08-0a35-4b0e-b128-fef3a47563ac]: 1 out of 2052 ranges failed, keyspace=system_distributed, tables={cdc_generation_timestamps, service_levels, cdc_streams_descriptions_v2, view_build_status}, repair_reason=bootstrap, nodes_down_during_repair={}, aborted_by_user=true), shard 1: std::runtime_error (repair[4e640e08-0a35-4b0e-b128-fef3a47563ac]: 1 out of 2052 ranges failed, keyspace=system_distributed, tables={cdc_generation_timestamps, service_levels, cdc_streams_descriptions_v2, view_build_status}, repair_reason=bootstrap, nodes_down_during_repair={}, aborted_by_user=true)})
INFO  2023-07-23 08:23:22,941 [shard 0] storage_service - bootstrap[4b61683f-0431-4d19-b651-9f87bdc4e86e]: Stopped heartbeat_updater
INFO  2023-07-23 08:23:22,941 [shard 0] storage_service - bootstrap[4b61683f-0431-4d19-b651-9f87bdc4e86e]: Started bootstrap_abort[4b61683f-0431-4d19-b651-9f87bdc4e86e]: ignore_nodes={}, leaving_nodes={}, replace_nodes={}, bootstrap_nodes={{127.0.10.2 -> {-8395587159826219290, -5464666828924826936, -7222252513962729396, 135100649381944257, 6576852586953413498, 325464549287023086, 4904036563849780271, -6546981841352807480, 2865406470152825579, -394784590446060672, 8051423441471499559, 6844991086976765892, -5614594334687664535, 1086175589025211548, 2408945801985234454, 8122767265582843160, -2956082443385573348, -960961978041855340, -3643595389008655452, 6857527436554476253, 5707640721068606399, 8897833340820015336, 2703363383166069333, -6588602489652425955, 5440287864758530396, -3094282848608922741, -1833526704171643022, -5105986572299303535, 7343359728267118239, -1834910154028555994, 1359791063466712191, -6507608401952339437, 566681375887275487, -306335652492972568, -4759202512546421207, -3129379551594474653, -5245088751617943006, -1404467098100775533, 3621138246620388043, 1485318039863435327, -4634401387007861583, -3038957000594949223, -6067046826404507156, 259137294824191805, 1926200563648969364, -1129145445286767819, -4639143725143526709, -1270097303253184727, -4966536635347290250, -1836953348237170973, -5794517138929612802, -3357996511996118312, -3791611262883130211, -2712206311139191209, -6791804046333807375, -6857569926998170414, -1214712356026727843, 5648296861051792831, -3655125933126647862, -4437868813526158245, 5370436154027038373, 7792027546790392396, -4561037836572836580, 3279434197974400561, 15665818235702521, 1524968315694800053, -176568868099139895, -6701925414249034591, 5178105708499513041, 5804070608230706547, 5230568807411941866, 4853077499092200821, -7369163820979119575, -6952830276823514532, -103846967177016728, -992945990790605636, 8386798552877056049, 1133063589928776842, -5956267497982120165, -1738707564756902725, -2311239167820960648, -3617434983021206653, -8806310878743135339, 8682303417832312958, 6542373449263183026, 8453323091764003081, 6401264866713689372, -3510202054027109158, 717199908134802740, -909605829712997119, -5535683829237212567, 3940927772554800994, 7989711667997573788, 8150595733957275539, -2056771582667231152, -6866192218627519514, -7574389104777509529, 5949929264212497033, 6666375004548028554, -2043394039982026649, -6812742753429456488, -3465313958018044075, -7402991056989302396, -519923436200876788, -4296903967476984787, 3858349911860527952, 8003019013187970627, -3210942499279608571, 8134376040994854713, -5268721429312034183, -3255006945112380069, -1237903303615122379, 3508003286295929138, 5602756197637522861, -82889097934825496, 765186793610237170, -7218303157549388562, -8371313459689358554, -6077713351632690487, -1338841163598433145, -1461213882739586397, -8003371252654520955, -5723333243472021823, -3598399274862099320, -7975816588440379235, 3182154134710411372, 5982218720245695231, -8022333615083156526, -8013322529234294041, 6582953801098307917, -1746134476029010781, -7962438755842596771, 8440675877475389789, -5517615739684767939, -6728175856340492674, -7709261831346207023, -8218332298477205921, 5863034785872395197, 8549729515336153751, 8444318272321445644, 3406730916609539115, 5280359148735403361, 5662208716882257323, -5591151511066121729, -8883677973470414918, 3941620723706007740, -8502421991628807392, 2575039736256740314, -7803521813182963380, -7876943857235327196, 56269010885645520, 7417591635542850574, -6907680069794125375, 6531632281214254321, 7014445363384813781, -8681664890784800888, 7219964038695109401, 6756764466860574160, 1789954678045529674, -1335653650150848103, -928650376983969000, -4319562382198011677, 3627982001996933078, 6709446486973791041, 50365639515001438, -4723088506383709176, -989799393669168409, -6884393927805650255, 2137838288095421568, 2035601076520437763, 2854041178342208778, -4230353222642009796, 9161035368666734598, 5723595328722592552, -5240450739054301426, -3811011906531734905, -6652655064390295222, -1071492933004332904, -23090648579602723, -7852280585483772283, -3218302238885030434, -5558203672196491920, 5363625513107461792, 2857931834978150504, 5816817509490364905, 8705412457605912815, -9001237871363535131, 5372544962964494314, -3732504860581423471, -1689725134578914778, 252695792052499106, 383585012655668966, 4564452022988505575, -6187619051100758030, -4596776904188924370, 97223002029603781, 5724668741791588655, -6844936433089951339, 936695171381717631, -9015202489132629380, 2890170352300620348, 6185073674648946551, 77467571978625798, 6473387548436861078, 4870832448718573995, 3761903199032378549, -7144321973099061236, 8280025665725034732, 772638564347589154, -4372427892630359972, 5052074121036312165, 8797499046576556097, -3074773784104930630, 703862074880474660, -3604497667191381691, -4175924615349836461, -4675986421159984312, 3659305196643771673, 2507582415515462337, 5686556585242263320, -7525535642739226497, 1426877500142237997, 8260608696777969526, -3553756636007311684, 7986534795414733361, -743757303203258287, -9077028374338143018, -1006342375644801567, 6866260556387203777, -6751365809636160665, 1523413503891722687, -5297781222906273114, 858106848981761506, 5644228301183323679, -4542823553779553524, -4488719974975959451, -2811703002895913722, 3331721693627287123, 2704103012387122487, 5034226912642977658, -865565180332728508, -7949088462790623579, -4208460939445143339, 5475656673231497838, 1094762380324555299, 2833661454877006912, 5315999854388242392, 2172654140971713334, -635124055736189505, 5835272454069430665, -1125547323310034206, 4497840473825704342, 5372819866645000103, -6340846590744899253, 8308933521514706654, 2751666202465664570}}}, repair_tables={}
INFO  2023-07-23 08:23:22,945 [shard 0] storage_service - aborting node operation ops_uuid=4b61683f-0431-4d19-b651-9f87bdc4e86e
INFO  2023-07-23 08:23:22,953 [shard 0] storage_service - bootstrap[4b61683f-0431-4d19-b651-9f87bdc4e86e]: Removed node=127.0.10.2 as bootstrap, coordinator=127.0.10.2
INFO  2023-07-23 08:23:23,380 [shard 0] gossip - Got shutdown message from 127.0.10.1, received_generation=1690100109, local_generation=1690100109
INFO  2023-07-23 08:23:23,382 [shard 0] gossip - InetAddress 127.0.10.1 is now DOWN, status = shutdown
bhalevy commented 1 year ago

Also, the 0.1 seconds sleep in https://github.com/scylladb/scylla-ccm/blob/65872f6f37c2345d87915777771fca0591ec4101/ccmlib/scylla_node.py#L1378 floods the log with messages. See https://jenkins.scylladb.com/view/master/job/scylla-master/job/dtest-daily-debug/245/artifact/logs-full.debug.094/dtest-gw0.log The following messages are repeated every 0.1 seconds for 360 seconds:

08:17:22,692 788     urllib3.connectionpool         DEBUG    connectionpool.py   :228  | test_rebuild_many_keyspaces: Starting new HTTP connection (1): 127.0.10.1:10000
08:17:22,694 788     urllib3.connectionpool         DEBUG    connectionpool.py   :456  | test_rebuild_many_keyspaces: http://127.0.10.1:10000 "GET /gossiper/endpoint/live HTTP/1.1" 200 14
08:17:22,696 788     urllib3.connectionpool         DEBUG    connectionpool.py   :228  | test_rebuild_many_keyspaces: Starting new HTTP connection (1): 127.0.10.1:10000
08:17:22,698 788     urllib3.connectionpool         DEBUG    connectionpool.py   :456  | test_rebuild_many_keyspaces: http://127.0.10.1:10000 "GET /storage_service/nodes/joining HTTP/1.1" 200 2
08:17:22,799 788     urllib3.connectionpool         DEBUG    connectionpool.py   :228  | test_rebuild_many_keyspaces: Starting new HTTP connection (1): 127.0.10.1:10000
$ wc dtest-gw0.log
  13920  193064 2487237 dtest-gw0.log
bhalevy commented 1 year ago

All that, while on https://jenkins.scylladb.com/view/master/job/scylla-master/job/dtest-daily-debug/245/artifact/logs-full.debug.094/1690100633894_rebuild_test.py%3A%3ATestRebuild%3A%3Atest_rebuild_many_keyspaces/node2.log:

INFO  2023-07-23 08:17:28,186 [shard 0] storage_service - Set host_id=ddb99668-493f-4171-ab7f-a1208ba4186f to be owned by node=127.0.10.1
INFO  2023-07-23 08:17:28,187 [shard 0] gossip - InetAddress 127.0.10.1 is now UP, status = NORMAL
INFO  2023-07-23 08:17:28,196 [shard 0] gossip - Live nodes seen in gossip: {127.0.10.1, 127.0.10.2}
INFO  2023-07-23 08:17:28,196 [shard 0] storage_service - Started waiting for normal state handlers to finish
INFO  2023-07-23 08:17:28,196 [shard 0] storage_service - Normal state handlers not yet finished for nodes (127.0.10.1, status=NORMAL)
INFO  2023-07-23 08:17:28,228 [shard 0] migration_manager - Requesting schema pull from 127.0.10.1:0
INFO  2023-07-23 08:17:28,228 [shard 0] migration_manager - Pulling schema from 127.0.10.1:0
INFO  2023-07-23 08:17:28,283 [shard 0] migration_manager - Requesting schema pull from 127.0.10.1:0
INFO  2023-07-23 08:17:28,296 [shard 0] storage_service - Finished waiting for normal state handlers; endpoints observed in gossip: (127.0.10.1, status=NORMAL), (127.0.10.2, status=UNKNOWN)
INFO  2023-07-23 08:17:28,296 [shard 0] storage_service - Waiting for nodes {127.0.10.2, 127.0.10.1} to be alive
INFO  2023-07-23 08:17:28,297 [shard 0] storage_service - Nodes {127.0.10.2, 127.0.10.1} are alive
fruch commented 1 year ago

Logging I'll take care of, I'll disable this for those calls

The default timeout should be bigger for debug mode ? 10min as it used to be ?

bhalevy commented 1 year ago

If @nyh thinks we still need it than 10 minutes should be enough, at least for small cluster. But we need to see how long it takes on large clusters where bootstrap repair needs to communucate with more nodes.

nyh commented 1 year ago

If @nyh thinks we still need it than 10 minutes should be enough, at least for small cluster. But we need to see how long it takes on large clusters where bootstrap repair needs to communucate with more nodes.

Maybe there shouldn't be any timeout at all, whatsoever? If the waiting code is not buggy, it will eventually stop waiting when the node becomes available. If the waiting code is buggy, the caller (dtest, Jenkins, etc.) will eventually stop on an overall timeout, so we're still safe.

Alternatively, maybe the timeout should be made configurable. It can default to something (e.g., 10 minutes), but if the user really wants to wait 60 minutes for a node on a huge cluster to come up, they can configure the timeout to 60 minutes.

bhalevy commented 1 year ago

I think we can also make the timeout be a linear function of the number of nodes in the cluster (at least for the first node in the loop, the rest can follow shortly after). But I'm not sure it's worth it

bhalevy commented 1 year ago

Sigh, apparently 120 seconds in non-debug modes isn't enough and there's flakiness, see https://jenkins.scylladb.com/view/master/job/scylla-master/job/dtest-release/314/testReport/update_cluster_layout_tests/TestLargeScaleCluster/Run_Dtest_Parallel_Cloud_Machines___LongDtest___long_split000___test_add_many_nodes_under_load/

ccmlib.node.TimeoutError: watch_rest_for_alive() timeout after 120 seconds
bhalevy commented 12 months ago

not only in the many nodes test.

https://jenkins.scylladb.com/view/master/job/scylla-master/job/dtest-release/316/testReport/update_cluster_layout_tests/TestUpdateClusterLayout/Run_Dtest_Parallel_Cloud_Machines___FullDtest___full_split013___test_remove_garbage_members_from_group0_after_abort_decommission_Announcing_that_I_have_left_the_ring__/

ccmlib.node.TimeoutError: watch_rest_for_alive() timeout after 120 seconds
bhalevy commented 12 months ago

And it turns out that 600 seconds isn't emough either in debug mode if there are enough keyspaces and/or tables. See https://jenkins.scylladb.com/view/master/job/scylla-master/job/dtest-debug/246/testReport/rebuild_test/TestRebuild/Run_Dtest_Parallel_Cloud_Machines___FullDtest___full_split090___test_rebuild_many_keyspaces/ and https://jenkins.scylladb.com/view/master/job/scylla-master/job/dtest-debug/246/testReport/rebuild_test/TestRebuild/Run_Dtest_Parallel_Cloud_Machines___FullDtest___full_split086___test_rebuild_many_tables/

We can extend the timeout indefinitely while there is bootstrap progress like we do for the cql listen message, maybe this is the way to go for test robustness

bhalevy commented 12 months ago

Also https://jenkins.scylladb.com/view/master/job/scylla-master/job/dtest-debug/246/testReport/lwt_random_test/TestRandomPaxos/Run_Dtest_Parallel_Cloud_Machines___FullDtest___full_split093___test_topology_grow/

bhalevy commented 12 months ago

and https://jenkins.scylladb.com/view/master/job/scylla-master/job/dtest-debug/246/testReport/cdc_test/TestCdc/Run_Dtest_Parallel_Cloud_Machines___FullDtest___full_split084___test_cluster_expansion_with_cdc_Single_cluster_/ and similar variants

bhalevy commented 12 months ago

and https://jenkins.scylladb.com/view/master/job/scylla-master/job/dtest-debug/246/testReport/secondary_indexes_test/TestSecondaryIndexes/Run_Dtest_Parallel_Cloud_Machines___FullDtest___full_split075___test_add_node_during_index_build_2/

bhalevy commented 12 months ago

and https://jenkins.scylladb.com/view/master/job/scylla-master/job/dtest-debug/246/testReport/alternator_ttl_tests/TestAlternatorTTL/Run_Dtest_Parallel_Cloud_Machines___FullDtest___full_split022___test_ttl_with_load_and_decommission/

fruch commented 12 months ago

@bhalevy I'm not sure what the suggestion is here exactly, this function is waiting for bootstrap to end already...

bhalevy commented 12 months ago

Yes, it's polling the rest api and the event it is waiting for doesn't happen until bootstrap is over. I suggested to watch the log for progress as we do in wait_for_starting.

But coming to think about it, I think we could just change the order of operation as follows to run wait_for_starting before self.watch_rest_for_alive(others):

        wait_timeout = timeout * 4 if timeout is not None else 420 if self.cluster.scylla_mode != 'debug' else 900

        if wait_other_notice:
            for node, _ in marks:
                node.watch_rest_for_alive(self, timeout=wait_timeout)

        if wait_for_binary_proto:
            from_mark = self.mark
            try:
                self.wait_for_binary_interface(from_mark=from_mark, process=self._process_scylla, timeout=wait_timeout)
            except TimeoutError as e:
                self.wait_for_starting(from_mark=self.mark, timeout=wait_timeout)
                self.wait_for_binary_interface(from_mark=from_mark, process=self._process_scylla, timeout=0)
        elif wait_other_notice:
            self.wait_for_starting(from_mark=self.mark, timeout=wait_timeout)

        if wait_other_notice:
            for node, _ in marks:
                self.watch_rest_for_alive(node, timeout=wait_timeout)
fruch commented 12 months ago

Why not todo all the wait_other_notice logic after wait_for_binary_proto ? as it in scylla_cluster.py ?

why split it into two parts like that ?

bhalevy commented 12 months ago

it's possible, although the other node notice this node as up before it starts listening for cql, but that shouldn't matter.

However, if only wait_other_notics is set, we'd still need wait_for_starting to prevent flakiness.

fruch commented 12 months ago

it's possible, although the other node notice this node as up before it starts listening for cql, but that shouldn't matter.

However, if only wait_other_notics is set, we'd still need wait_for_starting to prevent flakiness.

Even though I don't know of any test that would do wait_other_notice without wait_for_binary_protocol ...

bhalevy commented 12 months ago

It should be ok to imply wait_for_binary_protocol when wait_other_notice is set.