scylladb / scylla-cluster-tests

Tests for Scylla Clusters
GNU Affero General Public License v3.0

rolling upgrade failed on fill_db_data with: "Invalid null value for partition key part k" #3141

Closed yarongilor closed 1 year ago

yarongilor commented 3 years ago

Test details

Test: upgrade_test.UpgradeTest.test_rolling_upgrade
Build number: 135
Backend: gce: us-east1
Test-id: 615bb62f-621f-426b-bc5f-b373f2ea49c4
Start time: 2021-01-15 04:43:09
End time: 2021-01-15 05:07:00
Started by user: timer

Test result

FAILED

System under test

ScyllaDB version: 4.2.3-0.20210104.24346215c2 with build-id 0c8faf8bb8a3a0eda9337aad98ed3a6d814a4fa9 (-)
Target ScyllaDB repo: https://s3.amazonaws.com/downloads.scylladb.com/rpm/unstable/centos/master/latest/scylla.repo
Instance type: i3.2xlarge
Number of ScyllaDB nodes: 4

Last events by severity

CRITICAL - [0]
No events with this severity

ERROR - [1]
2021-01-15 04:57:10.104: (TestFrameworkEvent Severity.ERROR), source=UpgradeTest.test_rolling_upgrade (upgrade_test.UpgradeTest)() message=Traceback (most recent call last):
File "/jenkins/slave/workspace/scylla-master/rolling-upgrade/rolling-upgrade-centos7-test/scylla-cluster-tests/upgrade_test.py", line 448, in test_rolling_upgrade
self.fill_and_verify_db_data('BEFORE UPGRADE', pre_fill=True)
File "/jenkins/slave/workspace/scylla-master/rolling-upgrade/rolling-upgrade-centos7-test/scylla-cluster-tests/upgrade_test.py", line 413, in fill_and_verify_db_data
self.verify_db_data()
File "/jenkins/slave/workspace/scylla-master/rolling-upgrade/rolling-upgrade-centos7-test/scylla-cluster-tests/sdcm/fill_db_data.py", line 3105, in verify_db_data
self.run_db_queries(session, session.default_fetch_size)
File "/jenkins/slave/workspace/scylla-master/rolling-upgrade/rolling-upgrade-centos7-test/scylla-cluster-tests/sdcm/fill_db_data.py", line 3049, in run_db_queries
raise ex
File "/jenkins/slave/workspace/scylla-master/rolling-upgrade/rolling-upgrade-centos7-test/scylla-cluster-tests/sdcm/fill_db_data.py", line 3045, in run_db_queries
res = session.execute(item['queries'][i])
File "/jenkins/slave/workspace/scylla-master/rolling-upgrade/rolling-upgrade-centos7-test/scylla-cluster-tests/sdcm/utils/common.py", line 1175, in execute_verbose
return execute_orig(*args, **kwargs)
File "cassandra/cluster.py", line 2611, in cassandra.cluster.Session.execute
File "cassandra/cluster.py", line 4829, in cassandra.cluster.ResponseFuture.result
cassandra.InvalidRequest: Error from server: code=2200 [Invalid query] message="Invalid null value for partition key part k"

WARNING - [1]
21-01-15 04:43:11.000: (DatabaseLogEvent Severity.WARNING): type=SUPPRESSED_MESSAGES regex=journal: Suppressed line_number=8372 node=Node rolling-upgrade-master-centos-db-node-615bb62f-0-1 [34.73.108.207 | 10.142.0.104] (seed: True)
021-01-15T04:43:11+00:00 rolling-upgrade-master-centos-db-node-615bb62f-0-1 !INFO | journal: Suppressed 3632 messages from /scylla.slice/scylla-server.slice

Running instances

No instances

Hydra commands:

Restore Monitor Stack command: $ hydra investigate show-monitor 615bb62f-621f-426b-bc5f-b373f2ea49c4
Show all stored logs command: $ hydra investigate show-logs 615bb62f-621f-426b-bc5f-b373f2ea49c4

Logs:

prometheus - https://cloudius-jenkins-test.s3.amazonaws.com/615bb62f-621f-426b-bc5f-b373f2ea49c4/prometheus_snapshot_20210115_050649.tar.gz
grafana - https://cloudius-jenkins-test.s3.amazonaws.com/615bb62f-621f-426b-bc5f-b373f2ea49c4/20210115_045710/grafana-screenshot-overview-20210115_045710-rolling-upgrade-master-centos-monitor-node-615bb62f-0-1.png
grafana - https://cloudius-jenkins-test.s3.amazonaws.com/615bb62f-621f-426b-bc5f-b373f2ea49c4/20210115_045710/grafana-screenshot-rolling-upgrade-centos7-test-scylla-per-server-metrics-nemesis-20210115_050114-rolling-upgrade-master-centos-monitor-node-615bb62f-0-1.png
grafana - https://cloudius-jenkins-test.s3.amazonaws.com/615bb62f-621f-426b-bc5f-b373f2ea49c4/20210115_050700/grafana-screenshot-overview-20210115_050700-rolling-upgrade-master-centos-monitor-node-615bb62f-0-1.png
grafana - https://cloudius-jenkins-test.s3.amazonaws.com/615bb62f-621f-426b-bc5f-b373f2ea49c4/20210115_050700/grafana-screenshot-rolling-upgrade-centos7-test-scylla-per-server-metrics-nemesis-20210115_051017-rolling-upgrade-master-centos-monitor-node-615bb62f-0-1.png
db-cluster - https://cloudius-jenkins-test.s3.amazonaws.com/615bb62f-621f-426b-bc5f-b373f2ea49c4/20210115_051626/db-cluster-615bb62f.zip
loader-set - https://cloudius-jenkins-test.s3.amazonaws.com/615bb62f-621f-426b-bc5f-b373f2ea49c4/20210115_051626/loader-set-615bb62f.zip
monitor-set - https://cloudius-jenkins-test.s3.amazonaws.com/615bb62f-621f-426b-bc5f-b373f2ea49c4/20210115_051626/monitor-set-615bb62f.zip
sct-runner - https://cloudius-jenkins-test.s3.amazonaws.com/615bb62f-621f-426b-bc5f-b373f2ea49c4/20210115_051626/sct-runner-615bb62f.zip

Links:

Build URL
Download "Per server metrics nemesis" Grafana Screenshot
Download "Overview metrics" Grafana Screenshot
Shared "Per server metrics nemesis" Grafana Snapshot
Shared "Overview metrics" Grafana Snapshot

error event:

07:16:16  ----- LAST ERROR EVENT -------------------------------------------------------
07:16:16  2021-01-15 04:57:10.104: (TestFrameworkEvent Severity.ERROR), source=UpgradeTest.test_rolling_upgrade (upgrade_test.UpgradeTest)() message=Traceback (most recent call last):
07:16:16  File "/jenkins/slave/workspace/scylla-master/rolling-upgrade/rolling-upgrade-centos7-test/scylla-cluster-tests/upgrade_test.py", line 448, in test_rolling_upgrade
07:16:16  self.fill_and_verify_db_data('BEFORE UPGRADE', pre_fill=True)
07:16:16  File "/jenkins/slave/workspace/scylla-master/rolling-upgrade/rolling-upgrade-centos7-test/scylla-cluster-tests/upgrade_test.py", line 413, in fill_and_verify_db_data
07:16:16  self.verify_db_data()
07:16:16  File "/jenkins/slave/workspace/scylla-master/rolling-upgrade/rolling-upgrade-centos7-test/scylla-cluster-tests/sdcm/fill_db_data.py", line 3105, in verify_db_data
07:16:16  self.run_db_queries(session, session.default_fetch_size)
07:16:16  File "/jenkins/slave/workspace/scylla-master/rolling-upgrade/rolling-upgrade-centos7-test/scylla-cluster-tests/sdcm/fill_db_data.py", line 3049, in run_db_queries
07:16:16  raise ex
07:16:16  File "/jenkins/slave/workspace/scylla-master/rolling-upgrade/rolling-upgrade-centos7-test/scylla-cluster-tests/sdcm/fill_db_data.py", line 3045, in run_db_queries
07:16:16  res = session.execute(item['queries'][i])
07:16:16  File "/jenkins/slave/workspace/scylla-master/rolling-upgrade/rolling-upgrade-centos7-test/scylla-cluster-tests/sdcm/utils/common.py", line 1175, in execute_verbose
07:16:16  return execute_orig(*args, **kwargs)
07:16:16  File "cassandra/cluster.py", line 2611, in cassandra.cluster.Session.execute
07:16:16  File "cassandra/cluster.py", line 4829, in cassandra.cluster.ResponseFuture.result
07:16:16  cassandra.InvalidRequest: Error from server: code=2200 [Invalid query] message="Invalid null value for partition key part k"
juliayakovlev commented 3 years ago

All rolling upgrade tests failed with the error "Invalid null value for partition key part k", which should be fixed by https://github.com/scylladb/scylla/pull/7804

However, the rolling upgrade test starts from a cluster running Scylla 4.2.3, and #7804 has not been backported to 4.2.

Maybe we should comment out this query and not run it, at least for now.
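
As an alternative to commenting the query out entirely, the idea could be sketched as a version gate, so the query is skipped only on clusters that lack the fix. This is a minimal sketch and not the actual `fill_db_data.py` code: `QueryItem`, `min_version`, `parse_version`, `should_run`, and `select_items` are hypothetical names, the query text is a placeholder, and the `4.5` threshold is an assumption that would need to match the first release that really contains the fix.

```python
from dataclasses import dataclass


def parse_version(version: str) -> tuple:
    """Turn a version string such as '4.2.3-0.20210104.24346215c2' into (4, 2, 3)."""
    release = version.split("-", 1)[0]  # drop the build suffix after the first '-'
    # Keep only the purely numeric components, so 'rc1'-style parts are ignored.
    return tuple(int(part) for part in release.split(".") if part.isdigit())


@dataclass
class QueryItem:
    """Hypothetical stand-in for one entry of the test's query list."""
    name: str
    queries: list
    # Hypothetical field: lowest Scylla version on which the queries are expected to pass.
    min_version: str = ""


def should_run(item: QueryItem, cluster_version: str) -> bool:
    """Return True when the item has no version requirement or the cluster meets it."""
    if not item.min_version:
        return True
    return parse_version(cluster_version) >= parse_version(item.min_version)


def select_items(items, cluster_version: str):
    """Filter out items whose min_version is newer than the cluster version."""
    return [item for item in items if should_run(item, cluster_version)]


if __name__ == "__main__":
    items = [
        QueryItem("regular_query", ["<placeholder query>"]),
        # Assumed threshold; adjust to the release that actually ships the fix.
        QueryItem("filter_by_null_query", ["<placeholder query>"], min_version="4.5"),
    ]
    for item in select_items(items, "4.2.3-0.20210104.24346215c2"):
        print(item.name)
```

With a gate like this the query would come back on its own once the base version of the upgrade test moves past the threshold, which is the main reason to prefer it over a commented-out block.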

roydahan commented 3 years ago

I commented in last week's email on what to do. @juliayakovlev, please send a PR to disable this query and open a task to remind us to re-enable it.

juliayakovlev commented 3 years ago

> I commented in last week's email on what to do. @juliayakovlev, please send a PR to disable this query and open a task to remind us to re-enable it.

Yes, I did it yesterday.
The PR is merged: https://github.com/scylladb/scylla-cluster-tests/pull/3145
Task: https://trello.com/c/n5KsiKSA/2916-rolling-upgrade-test-run-query-with-filter-by-null-on-45-version

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 2 years with no activity. Remove stale label or comment or this will be closed in 2 days.

github-actions[bot] commented 1 year ago

This issue was closed because it has been stalled for 2 days with no activity.