Open vponomaryov opened 1 day ago
What shows Argus:
The same as text:
+--------------------------------------+-------------------------------------------------------+ | 1. disrupt_truncate_large_partition | long-custom-d2-wrkld1-5dc-dev-db-node-454bea32-11 | | Status: Succeeded | Start: 2024-10-28 23:16:37 | End: 2024-10-28 23:22:24 | +--------------------------------------+-------------------------------------------------------+ | 2. disrupt_stop_start_scylla_server | long-custom-d2-wrkld1-5dc-dev-db-node-454bea32-49 | | Status: Succeeded | Start: 2024-10-28 21:35:58 | End: 2024-10-28 21:36:48 | +--------------------------------------+------------------------------------------------------+
Real time values according to the jenkins logs are following:
2024-10-28 17:29:01,408 f:sct_config.py l:2057 c:sdcm.sct_config p:INFO > n_db_nodes: 12 12 12 12 12 2024-10-28 17:29:01,408 f:sct_config.py l:2057 c:sdcm.sct_config p:INFO > n_loaders: 5 1 1 1 1 ... 2024-10-28 21:02:28,434 f:tester.py l:750 c:LongevityTest p:INFO > Test start time 2024-10-28 21:02:28, duration is 539 and timeout set to 2024-10-29 06:01:28 ... 2024-10-28 21:43:31,370 f:loader_utils.py l:290 c:LongevityTest p:INFO > Prepare finished ... ... 2024-10-28 21:10:22,686 f:nemesis.py l:1897 c:sdcm.nemesis p:INFO > sdcm.nemesis.SisyphusMonkey: >>>>>>>>>>>>>Started random_disrupt_method stop_start_scylla_server ... 2024-10-28 22:23:50,978 f:nemesis.py l:1901 c:sdcm.nemesis p:INFO > sdcm.nemesis.SisyphusMonkey: <<<<<<<<<<<<<Finished random_disrupt_method stop_start_scylla_server ... ... 2024-10-28 22:28:50,979 f:nemesis.py l:1897 c:sdcm.nemesis p:INFO > sdcm.nemesis.SisyphusMonkey: >>>>>>>>>>>>>Started random_disrupt_method truncate_large_partition ... 2024-10-29 00:10:19,140 f:nemesis.py l:1901 c:sdcm.nemesis p:INFO > sdcm.nemesis.SisyphusMonkey: <<<<<<<<<<<<<Finished random_disrupt_method truncate_large_partition ... ... 2024-10-29 00:15:19,142 f:nemesis.py l:1897 c:sdcm.nemesis p:INFO > sdcm.nemesis.SisyphusMonkey: >>>>>>>>>>>>>Started random_disrupt_method bootstrap_streaming_error ... ... 2024-10-29 00:47:22,216 f:tester.py l:2959 c:LongevityTest p:INFO > TearDown is starting... ... 2024-10-29 00:51:07,179 f:nemesis.py l:452 c:sdcm.nemesis p:INFO > sdcm.nemesis.SisyphusMonkey: Report 2024-10-29 00:51:07,179 f:nemesis.py l:453 c:sdcm.nemesis p:INFO > sdcm.nemesis.SisyphusMonkey: DB Version: 6.3.0~dev 2024-10-29 00:51:07,179 f:nemesis.py l:454 c:sdcm.nemesis p:INFO > sdcm.nemesis.SisyphusMonkey: Interval: 300 s 2024-10-29 00:51:07,179 f:nemesis.py l:455 c:sdcm.nemesis p:INFO > sdcm.nemesis.SisyphusMonkey: Average duration: 186.5 s 2024-10-29 00:51:07,179 f:nemesis.py l:456 c:sdcm.nemesis p:INFO > sdcm.nemesis.SisyphusMonkey: Total execution time: 13668 s 2024-10-29 00:51:07,179 f:nemesis.py l:457 c:sdcm.nemesis p:INFO > sdcm.nemesis.SisyphusMonkey: Times executed: 2 2024-10-29 00:51:07,179 f:nemesis.py l:458 c:sdcm.nemesis p:INFO > sdcm.nemesis.SisyphusMonkey: Unexpected errors: 0 2024-10-29 00:51:07,179 f:nemesis.py l:459 c:sdcm.nemesis p:INFO > sdcm.nemesis.SisyphusMonkey: Operation log: 2024-10-29 00:51:07,179 f:nemesis.py l:461 c:sdcm.nemesis p:INFO > sdcm.nemesis.SisyphusMonkey: {\ 'operation': 'StopStartScyllaServer', 'start': 1730151358, 'end': 1730151386, 'duration': 27, 'node': 'Node long-custom-d2-wrkld1-5dc-dev-db-node-454bea32-49 [44.208.35.41 | 10.12.7.49] (dc name: us-east-1)', 'subtype': 'end'} 2024-10-29 00:51:07,179 f:nemesis.py l:461 c:sdcm.nemesis p:INFO > sdcm.nemesis.SisyphusMonkey: {\ 'operation': 'TruncateLargePartition', 'start': 1730157397, 'end': 1730157743, 'duration': 346, 'node': 'Node long-custom-d2-wrkld1-5dc-dev-db-node-454bea32-11 [3.76.36.146 | 10.11.6.209] (dc name: eu-central-1)', 'subtype': 'end'} 2024-10-29 00:51:07,179 f:cluster.py l:4521 c:sdcm.cluster p:INFO > Cluster long-custom-d2-wrkld1-5dc-dev-db-cluster-454bea32 (\ AMI: [ 'ami-0cbeb1d253aae6204', 'ami-07ba85ecd266cdea7', 'ami-0285a208b33f03cb9', 'ami-0a8cc93a1aef18f7e', 'ami-030a158b040cdac2f' ] Type: i3en.large): Set _nemesis_termination_event ... 2024-10-29 01:17:45,819 f:tester.py l:3047 c:LongevityTest p:INFO > ================================= TEST RESULTS ================================= 2024-10-29 01:17:45,819 f:tester.py l:3047 c:LongevityTest p:INFO > ================================================================================ 2024-10-29 01:17:45,819 f:tester.py l:3047 c:LongevityTest p:INFO > SUCCESS :)
Most probably it is somehow related to the fact that the setup is 5 dcs with 12 DB nodes per each - 60 DB nodes in summary.
IIRC it's because of health check happens between Started-Finished log lines, and nemesis time sent to Argus is calculated out of disruption method only. In big clusters healthcheck takes longer.
What shows Argus:
The same as text:
Real time values according to the jenkins logs are following:
Most probably it is somehow related to the fact that the setup is 5 dcs with 12 DB nodes per each - 60 DB nodes in summary.