scylladb / gemini

Test data integrity by comparing against an Oracle running in parallel
Apache License 2.0

The `result channel` doesn't get closed after the load ends, blocking results file generation #385

Status: Open (opened by vponomaryov 1 year ago)

vponomaryov commented 1 year ago

Issue description

Running the following gemini command:

gemini -d --duration 3h --warmup 30m -c 50 -m mixed -f --non-interactive --cql-features normal \
    --max-mutation-retries 5 --max-mutation-retries-backoff 500ms --async-objects-stabilization-attempts 5 \
    --async-objects-stabilization-backoff 500ms --replication-strategy "{'class': 'SimpleStrategy', 'replication_factor': '3'}" \
    --oracle-replication-strategy "{'class': 'SimpleStrategy', 'replication_factor': '1'}" \
    --test-cluster=10.12.1.102,10.12.2.40,10.12.2.200 --outfile /gemini/gemini_result_dd524c59-4d74-41f7-a8bd-ada4d99c9e23.log \
    --seed 70 --request-timeout 180s --connect-timeout 120s --oracle-cluster=10.12.3.145

This resulted in the following log output:

{"L":"INFO","T":"2023-07-10T16:29:32.918Z","N":"generator","M":"starting partition key generation loop"}
{"L":"INFO","T":"2023-07-10T19:59:32.884Z","N":"pump","M":"Test run stopped. Exiting."}
{"L":"INFO","T":"2023-07-10T19:59:32.919Z","M":"Test run completed. Exiting."}

In a normal run, however, the output looks like this:

{"L":"INFO","T":"2023-06-26T09:29:11.605Z","N":"generator","M":"starting partition key generation loop"}
{"L":"INFO","T":"2023-06-26T12:59:11.579Z","N":"pump","M":"Test run stopped. Exiting."}
{"L":"INFO","T":"2023-06-26T12:59:11.608Z","M":"result channel closed"}
{"L":"INFO","T":"2023-06-26T12:59:11.609Z","M":"Test run completed. Exiting."}
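The only visible difference between the two excerpts is the `result channel closed` line, so a run can be checked for this symptom by scanning its JSON log. A minimal Go sketch, assuming one JSON object per line with the `L`/`T`/`N`/`M` keys seen above; `closedBeforeExit` is a hypothetical helper, not part of Gemini.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// logLine mirrors the keys visible in the excerpts:
// L = level, T = timestamp, N = logger name (optional), M = message.
type logLine struct {
	L string `json:"L"`
	T string `json:"T"`
	N string `json:"N"`
	M string `json:"M"`
}

// closedBeforeExit reports whether "result channel closed" was logged
// before "Test run completed. Exiting.". It returns false if the
// completion message never appears.
func closedBeforeExit(log string) bool {
	closed := false
	sc := bufio.NewScanner(strings.NewReader(log))
	for sc.Scan() {
		var l logLine
		if err := json.Unmarshal(sc.Bytes(), &l); err != nil {
			continue // skip lines that are not JSON log records
		}
		switch l.M {
		case "result channel closed":
			closed = true
		case "Test run completed. Exiting.":
			return closed
		}
	}
	return false
}

func main() {
	failed := `{"L":"INFO","T":"2023-07-10T19:59:32.884Z","N":"pump","M":"Test run stopped. Exiting."}
{"L":"INFO","T":"2023-07-10T19:59:32.919Z","M":"Test run completed. Exiting."}`
	normal := `{"L":"INFO","T":"2023-06-26T12:59:11.579Z","N":"pump","M":"Test run stopped. Exiting."}
{"L":"INFO","T":"2023-06-26T12:59:11.608Z","M":"result channel closed"}
{"L":"INFO","T":"2023-06-26T12:59:11.609Z","M":"Test run completed. Exiting."}`
	fmt.Println(closedBeforeExit(failed), closedBeforeExit(normal)) // prints: false true
}
```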

So the `result channel closed` message is missing from the failed run, and because the result channel was never closed, the results file was not generated.
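The symptom matches a common Go fan-in pattern, where a consumer ranges over a results channel and only writes its summary after the channel is closed. The sketch below is hypothetical: `result`, `collect`, and `writeSummary` are illustrative names, not Gemini's code. It only shows why a channel that is never closed keeps the final output from being produced.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"sync"
)

// result is a hypothetical stand-in for per-worker statistics.
type result struct {
	worker int
	ops    int
}

// collect starts n workers, closes the results channel once all of them
// have returned, and drains the channel until it is closed.
func collect(n int) []result {
	results := make(chan result)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			results <- result{worker: id, ops: 100 * (id + 1)}
		}(i)
	}

	// Single owner of close(). If any worker never returns, wg.Wait()
	// never finishes, close() never runs, and the range loop below
	// blocks forever, so the summary is never written.
	go func() {
		wg.Wait()
		close(results)
	}()

	var all []result
	for r := range results { // ends only after close(results)
		all = append(all, r)
	}
	return all
}

// writeSummary plays the role of the results file: it can only run
// after collect has returned.
func writeSummary(w io.Writer, rs []result) error {
	total := 0
	for _, r := range rs {
		total += r.ops
	}
	_, err := fmt.Fprintf(w, "workers=%d total_ops=%d\n", len(rs), total)
	return err
}

func main() {
	rs := collect(3)
	if err := writeSummary(os.Stdout, rs); err != nil {
		panic(err)
	}
	// prints: workers=3 total_ops=600
}
```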

Impact

The Gemini results file cannot be collected.

How frequently does it reproduce?

Observed for the first time.

Installation details

Kernel Version: 5.15.0-1039-aws

Scylla version (or git commit hash): 5.2.4-20230708.9f79c9f41d6e with build-id edaa90c2e7660d794d2d308e93c1ba956e829d7d

Gemini version: 1.7.8

Cluster size: 3 nodes (i3.large)

Scylla Nodes used in this run:

OS / Image: ami-0a69901a7e05f1029 (aws: us-east-1)

Test: gemini-3h-with-nemesis-test

Test id: e576d8f7-262b-4864-b411-4e5c65631b55

Test name: scylla-5.2/gemini-/gemini-3h-with-nemesis-test

Test config file(s):

Logs and commands

- Restore Monitor Stack command: `$ hydra investigate show-monitor e576d8f7-262b-4864-b411-4e5c65631b55`
- Restore monitor on AWS instance using [Jenkins job](https://jenkins.scylladb.com/view/QA/job/QA-tools/job/hydra-show-monitor/parambuild/?test_id=e576d8f7-262b-4864-b411-4e5c65631b55)
- Show all stored logs command: `$ hydra investigate show-logs e576d8f7-262b-4864-b411-4e5c65631b55`

## Logs

- **db-cluster-e576d8f7.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/e576d8f7-262b-4864-b411-4e5c65631b55/20230710_205037/db-cluster-e576d8f7.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/e576d8f7-262b-4864-b411-4e5c65631b55/20230710_205037/db-cluster-e576d8f7.tar.gz)
- **sct-runner-events-e576d8f7.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/e576d8f7-262b-4864-b411-4e5c65631b55/20230710_205037/sct-runner-events-e576d8f7.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/e576d8f7-262b-4864-b411-4e5c65631b55/20230710_205037/sct-runner-events-e576d8f7.tar.gz)
- **sct-e576d8f7.log.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/e576d8f7-262b-4864-b411-4e5c65631b55/20230710_205037/sct-e576d8f7.log.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/e576d8f7-262b-4864-b411-4e5c65631b55/20230710_205037/sct-e576d8f7.log.tar.gz)
- **monitor-set-e576d8f7.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/e576d8f7-262b-4864-b411-4e5c65631b55/20230710_205037/monitor-set-e576d8f7.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/e576d8f7-262b-4864-b411-4e5c65631b55/20230710_205037/monitor-set-e576d8f7.tar.gz)
- **loader-set-e576d8f7.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/e576d8f7-262b-4864-b411-4e5c65631b55/20230710_205037/loader-set-e576d8f7.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/e576d8f7-262b-4864-b411-4e5c65631b55/20230710_205037/loader-set-e576d8f7.tar.gz)
- **parallel-timelines-report-e576d8f7.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/e576d8f7-262b-4864-b411-4e5c65631b55/20230710_205037/parallel-timelines-report-e576d8f7.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/e576d8f7-262b-4864-b411-4e5c65631b55/20230710_205037/parallel-timelines-report-e576d8f7.tar.gz)

[Jenkins job URL](https://jenkins.scylladb.com/job/scylla-5.2/job/gemini-/job/gemini-3h-with-nemesis-test/22/)

[Argus](https://argus.scylladb.com/test/45667235-65ee-40fc-841b-59c1d5299e02/runs?additionalRuns[]=e576d8f7-262b-4864-b411-4e5c65631b55)
fruch commented 1 year ago

Which version of Gemini is used?

We haven't backported all of the recent fixes to 5.2 yet (it's still not fully stable on SCT master).

vponomaryov commented 1 year ago

> Which version of Gemini is used?

The version is listed in the bug description: 1.7.8.

> We haven't backported all of the recent fixes to 5.2 yet (it's still not fully stable on SCT master).

I haven't seen a similar bug report, so I filed this one.