scylladb / scylladb

NoSQL data store using the seastar framework, compatible with Apache Cassandra
http://scylladb.com
GNU Affero General Public License v3.0

artifacts-docker-test failed in `check_scylla_version_in_housekeepingdb` #19875

Closed Annamikhlin closed 2 months ago

Annamikhlin commented 2 months ago

Issue description

artifacts-docker-test failed in `check_scylla_version_in_housekeepingdb` with the following error:

01:20:00  ----- LAST ERROR EVENT -------------------------------------------------------
01:20:00  2024-07-23 22:19:49.668: (TestFrameworkEvent Severity.ERROR) period_type=one-time event_id=af6b5821-fbdf-4c7b-90b7-0b42c8112a07, source=ArtifactsTest.test_scylla_service (artifacts_test.ArtifactsTest) [check Scylla server after stop/start]() message=Traceback (most recent call last):
01:20:00  File "/tmp/jenkins/workspace/scylla-master/artifacts/artifacts-docker-test/scylla-cluster-tests/artifacts_test.py", line 430, in test_scylla_service
01:20:00  version_id_after_stop = self.check_scylla_version_in_housekeepingdb(
01:20:00  File "/tmp/jenkins/workspace/scylla-master/artifacts/artifacts-docker-test/scylla-cluster-tests/sdcm/utils/decorators.py", line 70, in inner
01:20:00  return func(*args, **kwargs)
01:20:00  File "/tmp/jenkins/workspace/scylla-master/artifacts/artifacts-docker-test/scylla-cluster-tests/artifacts_test.py", line 81, in check_scylla_version_in_housekeepingdb
01:20:00  assert public_ip_address == row[2], (
01:20:00  TypeError: 'NoneType' object is not subscriptable
01:20:00  ----- LAST WARNING EVENT -----------------------------------------------------
01:20:00  2024-07-23 22:18:46.823: (CommitLogCheckErrorEvent Severity.WARNING) period_type=one-time event_id=43ac8bb1-89cd-441b-8e9a-de6a0e8f738e: message=CommitLogCheckThread will not start due to no monitors in the cluster
01:20:00  ----- LAST NORMAL EVENT ------------------------------------------------------
01:20:00  2024-07-23 22:19:47.658: (InfoEvent Severity.NORMAL) period_type=not-set event_id=1d69bb1e-da36-4f6d-915b-9eabe90cda5f: message=TEST_END
01:20:00  ================================================================================

failed job on master - https://jenkins.scylladb.com/job/scylla-master/job/artifacts/job/artifacts-docker-test/977/

failed jobs on branch-6.1 -

How frequently does it reproduce?

Intermittent: reproduced on master and twice on branch-6.1

Installation details

Cluster size: 1 nodes (docker)

Scylla Nodes used in this run:

OS / Image: scylladb/scylla-nightly (docker: undefined_region)

Test: artifacts-docker-test
Test id: 1b14a6dc-dc2c-4275-b573-af9b460ed23d
Test name: scylla-master/artifacts/artifacts-docker-test
Test config file(s):

Logs and commands

- Restore Monitor Stack command: `$ hydra investigate show-monitor 1b14a6dc-dc2c-4275-b573-af9b460ed23d`
- Restore monitor on AWS instance using [Jenkins job](https://jenkins.scylladb.com/view/QA/job/QA-tools/job/hydra-show-monitor/parambuild/?test_id=1b14a6dc-dc2c-4275-b573-af9b460ed23d)
- Show all stored logs command: `$ hydra investigate show-logs 1b14a6dc-dc2c-4275-b573-af9b460ed23d`

## Logs:

- **db-cluster-1b14a6dc.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/1b14a6dc-dc2c-4275-b573-af9b460ed23d/20240723_222006/db-cluster-1b14a6dc.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/1b14a6dc-dc2c-4275-b573-af9b460ed23d/20240723_222006/db-cluster-1b14a6dc.tar.gz)
- **sct-runner-events-1b14a6dc.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/1b14a6dc-dc2c-4275-b573-af9b460ed23d/20240723_222006/sct-runner-events-1b14a6dc.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/1b14a6dc-dc2c-4275-b573-af9b460ed23d/20240723_222006/sct-runner-events-1b14a6dc.tar.gz)
- **sct-1b14a6dc.log.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/1b14a6dc-dc2c-4275-b573-af9b460ed23d/20240723_222006/sct-1b14a6dc.log.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/1b14a6dc-dc2c-4275-b573-af9b460ed23d/20240723_222006/sct-1b14a6dc.log.tar.gz)
- **monitor-set-1b14a6dc.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/1b14a6dc-dc2c-4275-b573-af9b460ed23d/20240723_222006/monitor-set-1b14a6dc.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/1b14a6dc-dc2c-4275-b573-af9b460ed23d/20240723_222006/monitor-set-1b14a6dc.tar.gz)

[Jenkins job URL](https://jenkins.scylladb.com/job/scylla-master/job/artifacts/job/artifacts-docker-test/977/)
[Argus](https://argus.scylladb.com/test/271ed0dd-d85d-485e-8c7d-ba07f279403a/runs?additionalRuns[]=1b14a6dc-dc2c-4275-b573-af9b460ed23d)
mykaul commented 2 months ago

CC @roydahan - apparently, this is not fixed yet.

roydahan commented 2 months ago

It's not failing consistently. I asked @Annamikhlin to open this issue so we can fix it instead of just discussing whether to stop testing scylla_housekeeping.

roydahan commented 2 months ago

in housekeeping:

Housekeeping DB saved info, query 'SELECT id, version, ip, statuscode FROM housekeeping.checkversion WHERE uuid = %s': [(59731041, '6.2.0~dev', '3.253.82.105', 'cr')]
< t:2024-07-23 22:19:45,506 f:artifacts_test.py l:75   c:ArtifactsTest        p:DEBUG > Last row in housekeeping.checkversion for uuid '79002112-4941-11ef-89ba-0242ac110002': (59731041, '6.2.0~dev', '3.253.82.105', 'cr')
< t:2024-07-23 22:19:45,506 f:artifacts_test.py l:78   c:ArtifactsTest        p:DEBUG > public_ip_address = 3.253.82.105
roydahan commented 2 months ago

Assert code that fails:

        # Validate public IP address
        assert public_ip_address == row[2], (
            f"Wrong IP address is saved in '{self.CHECK_VERSION_TABLE}' table: "
            f"expected {self.node.public_ip_address}, got: {row[2]}")
fruch commented 2 months ago

There is a retry there for those assertions.

The problem is that it failed on a TypeError, which wasn't part of the retry.
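
A minimal sketch of that failure mode, not the actual scylla-cluster-tests code. It assumes the retry only catches `AssertionError` and that the query can return `None` before the row is written. The `retrying`, `make_fetcher`, `check_row_unguarded` and `check_row_guarded` names are hypothetical stand-ins made up for illustration. One possible fix is to assert that the row exists before subscripting it, so a missing row is retried like the other assertions:

```python
import functools
import time


def retrying(n=3, sleep_time=0.0, allowed_exceptions=(Exception,)):
    """Hypothetical retry decorator; a stand-in, not the real sdcm helper."""
    def decorator(func):
        @functools.wraps(func)
        def inner(*args, **kwargs):
            for attempt in range(1, n + 1):
                try:
                    return func(*args, **kwargs)
                except allowed_exceptions:
                    # Re-raise on the last attempt, otherwise wait and retry.
                    if attempt == n:
                        raise
                    time.sleep(sleep_time)
        return inner
    return decorator


def make_fetcher(results):
    """Return a callable that yields successive fake query results."""
    it = iter(results)
    return lambda: next(it)


def check_row_unguarded(fetch_row, public_ip_address):
    row = fetch_row()
    # If row is None, row[2] raises TypeError before the assert can fail,
    # so an AssertionError-only retry never gets a chance to run.
    assert public_ip_address == row[2], f"expected {public_ip_address}, got: {row[2]}"
    return row[0]


def check_row_guarded(fetch_row, public_ip_address):
    row = fetch_row()
    # Turn "no row yet" into an AssertionError so the retry covers it.
    assert row is not None, "no row found in housekeeping.checkversion yet"
    assert public_ip_address == row[2], f"expected {public_ip_address}, got: {row[2]}"
    return row[0]


# Row values copied from the debug log quoted earlier in this issue.
ROW = (59731041, "6.2.0~dev", "3.253.82.105", "cr")

unguarded = retrying(n=3, allowed_exceptions=(AssertionError,))(check_row_unguarded)
guarded = retrying(n=3, allowed_exceptions=(AssertionError,))(check_row_guarded)
```

With a fetcher that returns `None` first and `ROW` second, `unguarded` fails immediately with `TypeError`, while `guarded` retries and returns the row id. Adding `TypeError` to the allowed exceptions would also work, but it could hide unrelated bugs, so the explicit `None` check is arguably the narrower change.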