Closed fgelcer closed 12 months ago
Reported https://github.com/scylladb/scylladb/issues/14474 to understand why we encountered this error in the oracle's logs.
It failed due to the following reason:
{"L":"INFO","T":"2023-07-02T04:28:06.218Z","N":"work cycle.validation_job","M":"Retring failed validation stoped by reach of max attempts 5. Error: unable to load check data from the oracle store: system failed: gocql: no response received from cassandra within timeout period"}
Error detected: &joberror.JobError{Timestamp:time.Date(2023, time.July, 2, 4, 28, 6, 218489658, time.Local), Message:"Validation failed: unable to load check data from the oracle store: system failed: gocql: no response received from cassandra within timeout period", Query:"SELECT * FROM ks1.table1 WHERE col4=893580704621996895 AND col6=36 ALLOW FILTERING "}
{"L":"INFO","T":"2023-07-02T04:28:06.218Z","N":"work cycle.validation_job","M":"ending validation loop"}
So basically the request failed 5 times and Gemini exited, which is expected behavior.
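For readers unfamiliar with this pattern, here is a minimal sketch of a bounded retry loop that gives up after a fixed number of attempts. It is an illustration only, not Gemini's actual implementation (Gemini is written in Go). The names `validate_with_retries`, `load_check_data` and `ValidationError` are hypothetical, and `TimeoutError` stands in for the gocql timeout seen in the log.

```python
class ValidationError(Exception):
    """Raised when validation still fails after the last allowed attempt."""


def validate_with_retries(load_check_data, max_attempts=5):
    """Call load_check_data until it succeeds or max_attempts is reached.

    Returns the loaded data on success. Once the attempt budget is
    exhausted, raises ValidationError wrapping the last failure.
    """
    last_error = None
    for _ in range(max_attempts):
        try:
            return load_check_data()
        except TimeoutError as exc:  # stand-in for the driver timeout
            last_error = exc
    raise ValidationError(
        f"validation stopped after {max_attempts} attempts: {last_error}"
    ) from last_error
```

With a loader that always times out, the call makes exactly `max_attempts` tries and then raises, which matches the "failed 5 times, then exited" sequence described above.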
@dkropachev, could you improve the log message in this case, by: `ERROR`?
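Until the level changes, the give-up message is easy to miss, because it is logged with `"L":"INFO"`. Below is a rough sketch of how one might scan the Gemini log for it. It assumes the JSON field names `L`, `T` and `M` seen in the excerpt above. The function name and the default marker string are hypothetical choices for illustration.

```python
import json


def find_give_up_messages(log_lines, marker="max attempts"):
    """Return (timestamp, level, message) for entries whose message contains marker.

    Lines that are not JSON objects (for example the Go-formatted
    "Error detected: ..." dump) are skipped.
    """
    hits = []
    for line in log_lines:
        line = line.strip()
        if not line.startswith("{"):
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        message = entry.get("M", "")
        if marker in message:
            hits.append((entry.get("T"), entry.get("L"), message))
    return hits
```

Run over the lines quoted in this comment, this would return the single entry at `2023-07-02T04:28:06.218Z` with level `INFO`, which is the mismatch the request above is about.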
Issue description
Looking at the Argus run, we can see that Gemini has exited before the end of the test:
In this time frame, I can see 2 error messages in the logs that are probably the reason Gemini exited, although the Scylla logs continued:
Impact
Gemini exited and later failed to `verify_results()`, and the test ended with an error.
How frequently does it reproduce?
I'm re-running the same job with the same Gemini seed to confirm what happens, and will update.
Installation details
Kernel Version: 5.15.0-1039-aws
Scylla version (or git commit hash): `5.4.0~dev-20230629.f6f974cdeb11` with build-id `7afc85749bdc68e7ee32eead35d51badd480c79f`
Cluster size: 3 nodes (i3.large)
Scylla Nodes used in this run:
OS / Image: `` (aws: undefined_region)
Test: `gemini-3h-with-nemesis-test`
Test id: `3211fb00-4a70-4d4f-9232-1df186f67bf2`
Test name: `scylla-master/gemini-/gemini-3h-with-nemesis-test`
Test config file(s):
Logs and commands
- Restore Monitor Stack command: `$ hydra investigate show-monitor 3211fb00-4a70-4d4f-9232-1df186f67bf2`
- Restore monitor on AWS instance using [Jenkins job](https://jenkins.scylladb.com/view/QA/job/QA-tools/job/hydra-show-monitor/parambuild/?test_id=3211fb00-4a70-4d4f-9232-1df186f67bf2)
- Show all stored logs command: `$ hydra investigate show-logs 3211fb00-4a70-4d4f-9232-1df186f67bf2`

## Logs:

- **db-cluster-3211fb00.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/3211fb00-4a70-4d4f-9232-1df186f67bf2/20230702_045901/db-cluster-3211fb00.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/3211fb00-4a70-4d4f-9232-1df186f67bf2/20230702_045901/db-cluster-3211fb00.tar.gz)
- **sct-runner-events-3211fb00.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/3211fb00-4a70-4d4f-9232-1df186f67bf2/20230702_045901/sct-runner-events-3211fb00.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/3211fb00-4a70-4d4f-9232-1df186f67bf2/20230702_045901/sct-runner-events-3211fb00.tar.gz)
- **sct-3211fb00.log.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/3211fb00-4a70-4d4f-9232-1df186f67bf2/20230702_045901/sct-3211fb00.log.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/3211fb00-4a70-4d4f-9232-1df186f67bf2/20230702_045901/sct-3211fb00.log.tar.gz)
- **monitor-set-3211fb00.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/3211fb00-4a70-4d4f-9232-1df186f67bf2/20230702_045901/monitor-set-3211fb00.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/3211fb00-4a70-4d4f-9232-1df186f67bf2/20230702_045901/monitor-set-3211fb00.tar.gz)
- **loader-set-3211fb00.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/3211fb00-4a70-4d4f-9232-1df186f67bf2/20230702_045901/loader-set-3211fb00.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/3211fb00-4a70-4d4f-9232-1df186f67bf2/20230702_045901/loader-set-3211fb00.tar.gz)
- **parallel-timelines-report-3211fb00.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/3211fb00-4a70-4d4f-9232-1df186f67bf2/20230702_045901/parallel-timelines-report-3211fb00.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/3211fb00-4a70-4d4f-9232-1df186f67bf2/20230702_045901/parallel-timelines-report-3211fb00.tar.gz)

[Jenkins job URL](https://jenkins.scylladb.com/job/scylla-master/job/gemini-/job/gemini-3h-with-nemesis-test/385/)
[Argus](https://argus.scylladb.com/test/c05bf635-e29a-490c-a72f-9d4e42e6c7e7/runs?additionalRuns[]=3211fb00-4a70-4d4f-9232-1df186f67bf2)