scylladb / scylla-cluster-tests

Tests for Scylla Clusters
GNU Affero General Public License v3.0

All rolling upgrade tests are failing because they use CDC, which doesn't work with tablets #7602

Open fruch opened 1 month ago

fruch commented 1 month ago

Packages

Scylla version: 6.0.0-20240606.a77615adf324 with build-id 6c03ff0a571ac30073b739e9ba917545fe23e5c5
Kernel Version: 5.15.0-1060-gcp

Issue description

2024-06-08 03:22:24.659: (TestFrameworkEvent Severity.ERROR) period_type=one-time event_id=979898c6-b9b5-4c90-9151-f7bde11b560a, source=UpgradeTest.test_rolling_upgrade (upgrade_test.UpgradeTest)() message=Traceback (most recent call last):
  File "/home/ubuntu/scylla-cluster-tests/upgrade_test.py", line 617, in test_rolling_upgrade
    self.prepare_keyspaces_and_tables()
  File "/home/ubuntu/scylla-cluster-tests/sdcm/fill_db_data.py", line 3357, in prepare_keyspaces_and_tables
    self.cql_create_tables(session)
  File "/home/ubuntu/scylla-cluster-tests/sdcm/fill_db_data.py", line 3127, in cql_create_tables
    session.execute(create_table)
  File "/home/ubuntu/scylla-cluster-tests/sdcm/utils/common.py", line 1836, in execute_verbose
    return execute_orig(*args, **kwargs)
  File "cassandra/cluster.py", line 2729, in cassandra.cluster.Session.execute
  File "cassandra/cluster.py", line 5120, in cassandra.cluster.ResponseFuture.result
cassandra.InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot create CDC log for a table keyspace_fill_db_data.order_by_with_in_test, because keyspace uses tablets. See issue #16317."

Impact

Users who use CDC can't upgrade to the master version.

How frequently does it reproduce?

It happened in all weekly upgrade tests.

Installation details

Cluster size: 4 nodes (n2-highmem-64)

Scylla Nodes used in this run:

OS / Image: https://www.googleapis.com/compute/v1/projects/ubuntu-os-cloud/global/images/family/ubuntu-2004-lts (gce: undefined_region)

Test: rolling-upgrade-ubuntu20.04-test
Test id: 050050a0-8ff6-4411-86e6-09f4900333bc
Test name: scylla-master/rolling-upgrade/rolling-upgrade-ubuntu20.04-test
Test config file(s):

Logs and commands
- Restore Monitor Stack command: `$ hydra investigate show-monitor 050050a0-8ff6-4411-86e6-09f4900333bc`
- Restore monitor on AWS instance using [Jenkins job](https://jenkins.scylladb.com/view/QA/job/QA-tools/job/hydra-show-monitor/parambuild/?test_id=050050a0-8ff6-4411-86e6-09f4900333bc)
- Show all stored logs command: `$ hydra investigate show-logs 050050a0-8ff6-4411-86e6-09f4900333bc`

## Logs:
- **db-cluster-050050a0.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/050050a0-8ff6-4411-86e6-09f4900333bc/20240608_032314/db-cluster-050050a0.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/050050a0-8ff6-4411-86e6-09f4900333bc/20240608_032314/db-cluster-050050a0.tar.gz)
- **sct-runner-events-050050a0.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/050050a0-8ff6-4411-86e6-09f4900333bc/20240608_032314/sct-runner-events-050050a0.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/050050a0-8ff6-4411-86e6-09f4900333bc/20240608_032314/sct-runner-events-050050a0.tar.gz)
- **sct-050050a0.log.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/050050a0-8ff6-4411-86e6-09f4900333bc/20240608_032314/sct-050050a0.log.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/050050a0-8ff6-4411-86e6-09f4900333bc/20240608_032314/sct-050050a0.log.tar.gz)
- **loader-set-050050a0.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/050050a0-8ff6-4411-86e6-09f4900333bc/20240608_032314/loader-set-050050a0.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/050050a0-8ff6-4411-86e6-09f4900333bc/20240608_032314/loader-set-050050a0.tar.gz)
- **monitor-set-050050a0.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/050050a0-8ff6-4411-86e6-09f4900333bc/20240608_032314/monitor-set-050050a0.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/050050a0-8ff6-4411-86e6-09f4900333bc/20240608_032314/monitor-set-050050a0.tar.gz)

[Jenkins job URL](https://jenkins.scylladb.com/job/scylla-master/job/rolling-upgrade/job/rolling-upgrade-ubuntu20.04-test/195/)
[Argus](https://argus.scylladb.com/test/97a03f7d-fd85-468c-8c23-a465f7693243/runs?additionalRuns[]=050050a0-8ff6-4411-86e6-09f4900333bc)
fruch commented 1 month ago

Keep in mind this is an upgrade from 6.0 to master.

So tablets are enabled from the start, and every feature that isn't supported with tablets has to be disabled; otherwise this test can't run.

fruch commented 1 month ago

@bhalevy maybe it's not a 6.0 issue but a 6.0.1 one; either way, please assign it.

bhalevy commented 1 month ago

@ShlomiBalalis please look into this issue

fruch commented 1 month ago

@bhalevy @ShlomiBalalis

This is the 2nd week that all of the rolling upgrade tests are failing on this one.

yarongilor commented 1 month ago

@fruch, can a solution be something like: `if self.version_cdc_support():` -> `if self.version_cdc_support() and not tablets_enabled:`?

fruch commented 1 month ago

> @fruch, can a solution be something like: `if self.version_cdc_support():` -> `if self.version_cdc_support() and not tablets_enabled:`?

Maybe. Another option is to put this logic inside `version_cdc_support()`.

Either way, it would be a bit easier to review on top of a PR, where we can see tests working with it.
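A minimal sketch of the two options discussed in this exchange, written against hypothetical stand-ins rather than the real SCT code: `ClusterState`, `cdc_allowed_at_call_site()`, `version_cdc_support_tablets_aware()`, `tables_to_create()` and the `CDC_MIN_VERSION` threshold are all made up for illustration, and `version_cdc_support()` here only borrows the name used in the thread. It shows how either placement of the tablets check would skip the CDC-enabled table when tablets are on, instead of letting `CREATE TABLE` fail.

```python
# Sketch of the two options from this thread. Every name here is hypothetical;
# this is not the actual scylla-cluster-tests (SCT) implementation.
from dataclasses import dataclass

# Placeholder minimum version for CDC support (hypothetical value, not from SCT).
CDC_MIN_VERSION = (4, 3, 0)


@dataclass
class ClusterState:
    """Hypothetical stand-in for what the test knows about the cluster."""
    version: tuple
    tablets_enabled: bool


def version_cdc_support(state: ClusterState) -> bool:
    """Option 1 keeps this helper a pure version check."""
    return state.version >= CDC_MIN_VERSION


def cdc_allowed_at_call_site(state: ClusterState) -> bool:
    """Option 1: `version_cdc_support() and not tablets_enabled` at the call site."""
    return version_cdc_support(state) and not state.tablets_enabled


def version_cdc_support_tablets_aware(state: ClusterState) -> bool:
    """Option 2: fold the tablets check into the helper itself."""
    return state.version >= CDC_MIN_VERSION and not state.tablets_enabled


def tables_to_create(statements, state: ClusterState):
    """Keep non-CDC statements; keep CDC statements only when CDC is usable.

    `statements` is a list of (cql, uses_cdc) pairs.
    """
    cdc_ok = version_cdc_support_tablets_aware(state)
    return [cql for cql, uses_cdc in statements if cdc_ok or not uses_cdc]


if __name__ == "__main__":
    stmts = [
        ("CREATE TABLE ks.plain (pk int PRIMARY KEY)", False),
        ("CREATE TABLE ks.with_cdc (pk int PRIMARY KEY) WITH cdc = {'enabled': true}", True),
    ]
    # A 6.0 cluster with tablets enabled from the start, as in this upgrade test.
    upgraded_from_6_0 = ClusterState(version=(6, 0, 0), tablets_enabled=True)
    print(tables_to_create(stmts, upgraded_from_6_0))  # only the non-CDC table
```

Option 1 keeps the version helper a pure version check and makes the tablets condition explicit at each call site; option 2 changes the answer for every existing caller at once, which is less code to touch but also affects callers that may not be creating CDC tables.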

fruch commented 1 month ago

@yarongilor

It seems that part of it might be in: https://github.com/scylladb/scylla-cluster-tests/pull/7525

please sync with @aleksbykov

bhalevy commented 4 days ago

@fruch / @aleksbykov is this issue still relevant?