Closed KnifeyMoloko closed 1 year ago
Seeing it on 5.0:
Kernel Version: 5.13.0-1029-aws
Scylla version (or git commit hash): 5.0~rc8-20220612.f28542a71 with build-id 85cf87619b93155a574647ec252ce5a043c7fe77
Cluster size: 4 nodes (i3.2xlarge)
Scylla Nodes used in this run:
OS / Image: ami-06d93b63348d73505 (aws: us-east-1)
Test: longevity-lwt-3h-test
Test id: 9a717cba-379e-4b3a-9ce2-2c26254c08cc
Test name: scylla-5.0/longevity/longevity-lwt-3h-test
Test config file(s):
$ hydra investigate show-monitor 9a717cba-379e-4b3a-9ce2-2c26254c08cc
$ hydra investigate show-logs 9a717cba-379e-4b3a-9ce2-2c26254c08cc
@fgelcer We don't see this issue come up very often (twice in a year). Extending the timeout from 120s to 180s would make it less likely to fail, or we could ignore it, since it's such a rare occurrence. WDYT?
If it is not occurring frequently, we should skip it and, if we encounter it again, report a new issue for it. Increasing the timeout may hide issues from us, in the sense that most of the runs pass within 120s.
Installation details
Kernel version: 5.11.0-1028-aws
Scylla version (or git commit hash): 5.1.dev-0.20220217.69fcc053b with build-id b8415b1ebbffff2b4183734680f4afab3bfed86d
Cluster size: 6 nodes (i3.4xlarge)
Scylla running with shards number (live nodes):
longevity-tls-50gb-3d-master-db-node-2a70ca0f-1 (3.250.23.34 | 10.0.2.108): 14 shards
longevity-tls-50gb-3d-master-db-node-2a70ca0f-9 (34.243.236.177 | 10.0.1.205): 14 shards
longevity-tls-50gb-3d-master-db-node-2a70ca0f-14 (3.250.223.137 | 10.0.1.30): 14 shards
longevity-tls-50gb-3d-master-db-node-2a70ca0f-17 (34.243.248.150 | 10.0.1.26): 14 shards
longevity-tls-50gb-3d-master-db-node-2a70ca0f-21 (34.251.180.151 | 10.0.1.159): 14 shards
longevity-tls-50gb-3d-master-db-node-2a70ca0f-22 (34.253.70.67 | 10.0.3.42): 14 shards
longevity-tls-50gb-3d-master-db-node-2a70ca0f-23 (18.203.254.31 | 10.0.1.198): 14 shards
longevity-tls-50gb-3d-master-db-node-2a70ca0f-24 (63.35.203.93 | 10.0.2.113): 14 shards
Scylla running with shards number (terminated nodes):
longevity-tls-50gb-3d-master-db-node-2a70ca0f-3 (54.170.189.138 | 10.0.0.69): 14 shards
OS (RHEL/CentOS/Ubuntu/AWS AMI):
ami-041d8500e7cf30167 (aws: eu-west-1)
Test: longevity-50gb-3days
Test name: longevity_test.LongevityTest.test_custom_time
Test config file(s):
Issue description
====================================
The AbortRepairMonkey nemesis failed due to hitting a timeout while waiting for the silenced_nodetool_repair_to_fail() method to finish. It looks like the repair command completed, though.
====================================
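The failure mode above can be sketched in a few lines: a client-side timeout fires and the caller reports failure, even though the work it launched runs to completion on its own. This is an illustrative stand-in (a plain subprocess with a deliberately short timeout), not SCT's actual nemesis API; the function name and values are hypothetical.

```python
import subprocess

def run_with_timeout(cmd, timeout):
    """Run a command, reporting 'timed out' if the client-side timeout
    fires first. Hypothetical sketch; not SCT's real API."""
    try:
        subprocess.run(cmd, timeout=timeout, check=True)
        return "completed"
    except subprocess.TimeoutExpired:
        # The caller gives up here, but in the real issue the server-side
        # repair keeps running and may finish successfully anyway.
        return "timed out"

# A 1-second task against a 0.2-second timeout: the caller reports
# failure regardless of whether the task would have succeeded.
print(run_with_timeout(["sleep", "1"], timeout=0.2))
print(run_with_timeout(["true"], timeout=5))
```

This is why simply extending the timeout (120s to 180s) only shifts the race rather than eliminating it.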
Restore Monitor Stack command:
$ hydra investigate show-monitor 2a70ca0f-b6da-4b25-a7c3-d3ce9923a65e
Restore monitor on AWS instance using Jenkins job
Show all stored logs command:
$ hydra investigate show-logs 2a70ca0f-b6da-4b25-a7c3-d3ce9923a65e
Test id:
2a70ca0f-b6da-4b25-a7c3-d3ce9923a65e
Logs:
grafana - https://cloudius-jenkins-test.s3.amazonaws.com/2a70ca0f-b6da-4b25-a7c3-d3ce9923a65e/20220221_061344/grafana-screenshot-longevity-50gb-3days-scylla-per-server-metrics-nemesis-20220221_061638-longevity-tls-50gb-3d-master-monitor-node-2a70ca0f-1.png
grafana - https://cloudius-jenkins-test.s3.amazonaws.com/2a70ca0f-b6da-4b25-a7c3-d3ce9923a65e/20220221_061344/grafana-screenshot-overview-20220221_061344-longevity-tls-50gb-3d-master-monitor-node-2a70ca0f-1.png
critical - https://cloudius-jenkins-test.s3.amazonaws.com/2a70ca0f-b6da-4b25-a7c3-d3ce9923a65e/20220221_063735/critical-2a70ca0f.log.tar.gz
db-cluster - https://cloudius-jenkins-test.s3.amazonaws.com/2a70ca0f-b6da-4b25-a7c3-d3ce9923a65e/20220221_063735/db-cluster-2a70ca0f.tar.gz
debug - https://cloudius-jenkins-test.s3.amazonaws.com/2a70ca0f-b6da-4b25-a7c3-d3ce9923a65e/20220221_063735/debug-2a70ca0f.log.tar.gz
email_data - https://cloudius-jenkins-test.s3.amazonaws.com/2a70ca0f-b6da-4b25-a7c3-d3ce9923a65e/20220221_063735/email_data-2a70ca0f.json.tar.gz
error - https://cloudius-jenkins-test.s3.amazonaws.com/2a70ca0f-b6da-4b25-a7c3-d3ce9923a65e/20220221_063735/error-2a70ca0f.log.tar.gz
event - https://cloudius-jenkins-test.s3.amazonaws.com/2a70ca0f-b6da-4b25-a7c3-d3ce9923a65e/20220221_063735/events-2a70ca0f.log.tar.gz
left_processes - https://cloudius-jenkins-test.s3.amazonaws.com/2a70ca0f-b6da-4b25-a7c3-d3ce9923a65e/20220221_063735/left_processes-2a70ca0f.log.tar.gz
loader-set - https://cloudius-jenkins-test.s3.amazonaws.com/2a70ca0f-b6da-4b25-a7c3-d3ce9923a65e/20220221_063735/loader-set-2a70ca0f.tar.gz
monitor-set - https://cloudius-jenkins-test.s3.amazonaws.com/2a70ca0f-b6da-4b25-a7c3-d3ce9923a65e/20220221_063735/monitor-set-2a70ca0f.tar.gz
normal - https://cloudius-jenkins-test.s3.amazonaws.com/2a70ca0f-b6da-4b25-a7c3-d3ce9923a65e/20220221_063735/normal-2a70ca0f.log.tar.gz
output - https://cloudius-jenkins-test.s3.amazonaws.com/2a70ca0f-b6da-4b25-a7c3-d3ce9923a65e/20220221_063735/output-2a70ca0f.log.tar.gz
raw events - https://cloudius-jenkins-test.s3.amazonaws.com/2a70ca0f-b6da-4b25-a7c3-d3ce9923a65e/20220221_063735/raw_events-2a70ca0f.log.tar.gz
sct - https://cloudius-jenkins-test.s3.amazonaws.com/2a70ca0f-b6da-4b25-a7c3-d3ce9923a65e/20220221_063735/sct-2a70ca0f.log.tar.gz
summary - https://cloudius-jenkins-test.s3.amazonaws.com/2a70ca0f-b6da-4b25-a7c3-d3ce9923a65e/20220221_063735/summary-2a70ca0f.log.tar.gz
warning - https://cloudius-jenkins-test.s3.amazonaws.com/2a70ca0f-b6da-4b25-a7c3-d3ce9923a65e/20220221_063735/warning-2a70ca0f.log.tar.gz
Jenkins job URL