Open vponomaryov opened 1 year ago
@dkropachev any chance you'll have time to help with this?
The issue was successfully worked around by using a more powerful instance type for the loader here: https://jenkins.scylladb.com/job/scylla-5.2/job/longevity/job/longevity-large-partition-4days-arm-test/3
So, this one occurs only on close-to-overloaded loader nodes. Hence, this issue is low-priority.
It is a general golang issue, not specific to scylla-bench.
It looks like there is a race condition between the garbage collector's stack shrinking/growing and parking on a channel, which has a high probability under high load; I will try to follow it up on the golang repo.
To mitigate this issue we can reduce channel usage; in particular, we can make the histogram atomic, which is pretty easy, and make all workloads submit stats to one histogram.
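A minimal sketch of that mitigation, assuming a power-of-two bucket layout and type/function names that are not taken from scylla-bench itself: all worker goroutines record latencies directly into one shared histogram through atomic counters, so no per-result channel is needed.

```go
// Hypothetical illustration, not scylla-bench code: one shared histogram
// whose buckets are plain atomic counters, updated directly by workers.
package main

import (
	"fmt"
	"math/bits"
	"sync"
	"sync/atomic"
	"time"
)

// AtomicHistogram keeps one counter per power-of-two latency bucket (in µs).
type AtomicHistogram struct {
	buckets [64]atomic.Uint64
}

// Record adds one sample; safe to call concurrently from many goroutines.
func (h *AtomicHistogram) Record(d time.Duration) {
	us := d.Microseconds()
	if us < 0 {
		us = 0
	}
	// Bucket index = number of bits needed to represent the latency in µs.
	idx := bits.Len64(uint64(us))
	h.buckets[idx].Add(1)
}

// Snapshot copies the current counts without stopping the writers.
func (h *AtomicHistogram) Snapshot() [64]uint64 {
	var out [64]uint64
	for i := range h.buckets {
		out[i] = h.buckets[i].Load()
	}
	return out
}

func main() {
	var hist AtomicHistogram // single histogram shared by all workloads
	var wg sync.WaitGroup
	for w := 0; w < 8; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < 1000; i++ {
				hist.Record(time.Duration(i) * time.Microsecond)
			}
		}()
	}
	wg.Wait()

	for i, c := range hist.Snapshot() {
		if c > 0 {
			fmt.Printf("bucket %2d (< %d µs): %d\n", i, uint64(1)<<i, c)
		}
	}
}
```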
Issue description
Impact
The stress process exits without providing the user with a clear enough picture of what went wrong.
How frequently does it reproduce?
It is the first time it has been observed.
Installation details
Kernel Version: 5.15.0-1030-aws
Scylla version (or git commit hash): 5.2.0~rc2-20230228.908a82bea064 with build-id 2d8e1ab089ec69c36323037d66b1a72accfae399
Cluster size: 4 nodes (is4gen.4xlarge)
Scylla Nodes used in this run:
OS / Image: ami-074d26a74b8f73dba (aws: eu-west-1)
Test: longevity-large-partition-4days-arm-test
Test id: c3260702-5b50-4389-8303-7464c8d5e384
Test name: scylla-5.2/longevity/longevity-large-partition-4days-arm-test
Test config file(s):
Details:
It had 3 loaders. Pre-load finished without errors. Then the main read stress commands failed on 2 of the 3 loaders. One of the loader failures is the same as in another existing open bug (https://github.com/scylladb/scylla-bench/issues/107), and the second one failed after 43m of running with the following error:
To view the full error, open the loader-2 logs in the loader-set-c3260702.tar.gz archive.
$ hydra investigate show-monitor c3260702-5b50-4389-8303-7464c8d5e384
$ hydra investigate show-logs c3260702-5b50-4389-8303-7464c8d5e384
Logs:
Jenkins job URL