vectordotdev / vector

A high-performance observability data pipeline.
https://vector.dev
Mozilla Public License 2.0

chore(tests): Parallelize the adaptive concurrency tests #21343

Closed · bruceg closed this pull request 3 days ago

datadog-vectordotdev[bot] commented 4 days ago

Datadog Report

Branch report: bruceg/improve-adaptive-concurrency-tests
Commit report: 03ebde2
Test service: vector

:white_check_mark: 0 Failed, 444 Passed, 0 Skipped, 4m 14.59s Total Time

github-actions[bot] commented 3 days ago

Regression Detector Results

Run ID: f6367cf5-cbb8-4a44-b5b0-4def5b0c7c6f
Metrics dashboard

Baseline: f99e052b54fc9c32731694f258b30360e28b68ac
Comparison: 6e47077efb9ea78e757383da0e49db08f5378212

Performance changes are noted in the perf column of each table:

No significant changes in experiment optimization goals

Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%

There were no significant changes in experiment optimization goals at this confidence level and effect size tolerance.

Experiments ignored for regressions

Regressions in experiments with settings containing `erratic: true` are ignored.

| perf | experiment        | goal              | Δ mean % | Δ mean % CI     | links |
|------|-------------------|-------------------|----------|-----------------|-------|
| ➖   | file_to_blackhole | egress throughput | -3.33    | [-10.29, +3.62] |       |

Fine details of change detection per experiment

| perf | experiment                                         | goal               | Δ mean % | Δ mean % CI     | links |
|------|----------------------------------------------------|--------------------|----------|-----------------|-------|
| ➖   | http_text_to_http_json                             | ingress throughput | +1.15    | [+1.04, +1.26]  |       |
| ➖   | otlp_grpc_to_blackhole                             | ingress throughput | +1.13    | [+1.01, +1.26]  |       |
| ➖   | syslog_splunk_hec_logs                             | ingress throughput | +0.81    | [+0.72, +0.90]  |       |
| ➖   | datadog_agent_remap_blackhole                      | ingress throughput | +0.65    | [+0.56, +0.75]  |       |
| ➖   | http_elasticsearch                                 | ingress throughput | +0.58    | [+0.43, +0.72]  |       |
| ➖   | datadog_agent_remap_datadog_logs_acks              | ingress throughput | +0.41    | [+0.23, +0.58]  |       |
| ➖   | datadog_agent_remap_blackhole_acks                 | ingress throughput | +0.22    | [+0.12, +0.33]  |       |
| ➖   | syslog_humio_logs                                  | ingress throughput | +0.21    | [+0.12, +0.30]  |       |
| ➖   | http_to_http_noack                                 | ingress throughput | +0.10    | [+0.03, +0.17]  |       |
| ➖   | http_to_http_json                                  | ingress throughput | +0.02    | [-0.02, +0.05]  |       |
| ➖   | splunk_hec_to_splunk_hec_logs_noack                | ingress throughput | +0.01    | [-0.08, +0.10]  |       |
| ➖   | splunk_hec_to_splunk_hec_logs_acks                 | ingress throughput | +0.01    | [-0.10, +0.12]  |       |
| ➖   | splunk_hec_indexer_ack_blackhole                   | ingress throughput | -0.01    | [-0.10, +0.08]  |       |
| ➖   | http_to_http_acks                                  | ingress throughput | -0.02    | [-1.24, +1.20]  |       |
| ➖   | syslog_log2metric_humio_metrics                    | ingress throughput | -0.10    | [-0.22, +0.03]  |       |
| ➖   | http_to_s3                                         | ingress throughput | -0.21    | [-0.48, +0.06]  |       |
| ➖   | otlp_http_to_blackhole                             | ingress throughput | -0.29    | [-0.42, -0.17]  |       |
| ➖   | socket_to_socket_blackhole                         | ingress throughput | -0.52    | [-0.58, -0.45]  |       |
| ➖   | syslog_log2metric_tag_cardinality_limit_blackhole  | ingress throughput | -0.71    | [-0.78, -0.63]  |       |
| ➖   | syslog_loki                                        | ingress throughput | -0.82    | [-0.91, -0.73]  |       |
| ➖   | fluent_elasticsearch                               | ingress throughput | -0.99    | [-1.48, -0.50]  |       |
| ➖   | syslog_log2metric_splunk_hec_metrics               | ingress throughput | -1.33    | [-1.43, -1.23]  |       |
| ➖   | datadog_agent_remap_datadog_logs                   | ingress throughput | -1.86    | [-2.05, -1.67]  |       |
| ➖   | splunk_hec_route_s3                                | ingress throughput | -1.90    | [-2.20, -1.60]  |       |
| ➖   | syslog_regex_logs2metric_ddmetrics                 | ingress throughput | -1.95    | [-2.07, -1.83]  |       |
| ➖   | file_to_blackhole                                  | egress throughput  | -3.33    | [-10.29, +3.62] |       |

Explanation

A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI". For each experiment, we flag a change in performance as a "regression" -- a change worth investigating further -- only if all of the following criteria are true:

1. Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
2. Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that *if our statistical model is accurate*, there is at least a 90.00% chance there is a difference in performance between the baseline and comparison variants.
3. Its configuration does not mark it "erratic".
github-actions[bot] commented 3 days ago

Regression Detector Results

Run ID: c1916fd8-5c4c-47d9-b335-a81d6e25ace6
Metrics dashboard

Baseline: ca0fa057eaa128beb7777428f79cec9924f1d396
Comparison: e17273230a206d5afd23decfba329beb74bfb1c9

Performance changes are noted in the perf column of each table:

Significant changes in experiment optimization goals

Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%

| perf | experiment          | goal               | Δ mean % | Δ mean % CI    | links |
|------|---------------------|--------------------|----------|----------------|-------|
| ✅   | splunk_hec_route_s3 | ingress throughput | +5.13    | [+4.81, +5.45] |       |

Experiments ignored for regressions

Regressions in experiments with settings containing `erratic: true` are ignored.

| perf | experiment        | goal              | Δ mean % | Δ mean % CI      | links |
|------|-------------------|-------------------|----------|------------------|-------|
| ❌   | file_to_blackhole | egress throughput | -16.52   | [-22.86, -10.18] |       |

Fine details of change detection per experiment

| perf | experiment                                         | goal               | Δ mean % | Δ mean % CI      | links |
|------|----------------------------------------------------|--------------------|----------|------------------|-------|
| ✅   | splunk_hec_route_s3                                | ingress throughput | +5.13    | [+4.81, +5.45]   |       |
| ➖   | http_text_to_http_json                             | ingress throughput | +2.69    | [+2.59, +2.79]   |       |
| ➖   | datadog_agent_remap_datadog_logs_acks              | ingress throughput | +1.35    | [+1.14, +1.56]   |       |
| ➖   | syslog_regex_logs2metric_ddmetrics                 | ingress throughput | +1.01    | [+0.88, +1.14]   |       |
| ➖   | datadog_agent_remap_blackhole_acks                 | ingress throughput | +0.76    | [+0.66, +0.87]   |       |
| ➖   | datadog_agent_remap_datadog_logs                   | ingress throughput | +0.40    | [+0.22, +0.58]   |       |
| ➖   | syslog_log2metric_humio_metrics                    | ingress throughput | +0.22    | [+0.08, +0.36]   |       |
| ➖   | http_to_http_noack                                 | ingress throughput | +0.20    | [+0.10, +0.29]   |       |
| ➖   | syslog_splunk_hec_logs                             | ingress throughput | +0.17    | [+0.06, +0.29]   |       |
| ➖   | syslog_loki                                        | ingress throughput | +0.07    | [-0.03, +0.16]   |       |
| ➖   | otlp_grpc_to_blackhole                             | ingress throughput | +0.03    | [-0.08, +0.14]   |       |
| ➖   | http_to_http_json                                  | ingress throughput | +0.03    | [-0.01, +0.06]   |       |
| ➖   | splunk_hec_to_splunk_hec_logs_noack                | ingress throughput | +0.02    | [-0.08, +0.11]   |       |
| ➖   | splunk_hec_to_splunk_hec_logs_acks                 | ingress throughput | -0.01    | [-0.12, +0.10]   |       |
| ➖   | splunk_hec_indexer_ack_blackhole                   | ingress throughput | -0.01    | [-0.09, +0.07]   |       |
| ➖   | otlp_http_to_blackhole                             | ingress throughput | -0.16    | [-0.27, -0.06]   |       |
| ➖   | syslog_log2metric_splunk_hec_metrics               | ingress throughput | -0.18    | [-0.27, -0.08]   |       |
| ➖   | http_to_http_acks                                  | ingress throughput | -0.29    | [-1.51, +0.92]   |       |
| ➖   | http_to_s3                                         | ingress throughput | -0.61    | [-0.89, -0.34]   |       |
| ➖   | datadog_agent_remap_blackhole                      | ingress throughput | -0.61    | [-0.71, -0.52]   |       |
| ➖   | fluent_elasticsearch                               | ingress throughput | -0.62    | [-1.11, -0.14]   |       |
| ➖   | socket_to_socket_blackhole                         | ingress throughput | -0.81    | [-0.88, -0.73]   |       |
| ➖   | syslog_humio_logs                                  | ingress throughput | -0.99    | [-1.08, -0.90]   |       |
| ➖   | syslog_log2metric_tag_cardinality_limit_blackhole  | ingress throughput | -1.35    | [-1.45, -1.25]   |       |
| ➖   | http_elasticsearch                                 | ingress throughput | -1.48    | [-1.62, -1.34]   |       |
| ❌   | file_to_blackhole                                  | egress throughput  | -16.52   | [-22.86, -10.18] |       |

Explanation

A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI". For each experiment, we flag a change in performance as a "regression" -- a change worth investigating further -- only if all of the following criteria are true:

1. Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
2. Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that *if our statistical model is accurate*, there is at least a 90.00% chance there is a difference in performance between the baseline and comparison variants.
3. Its configuration does not mark it "erratic".
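
To make the three criteria above concrete, here is a minimal sketch of the decision rule in Rust (Vector's own language). It is an illustration only, not the Regression Detector's actual code: the type, field, and function names are assumed for this example, and the two sample inputs are copied from the tables in this run.

```rust
/// Summary of one experiment as reported above. The struct and its field
/// names are assumptions made for this sketch, not detector internals.
struct ExperimentResult {
    delta_mean_pct: f64, // Δ mean %: comparison variant minus baseline variant
    ci_low_pct: f64,     // lower bound of the 90.00% CI on Δ mean %
    ci_high_pct: f64,    // upper bound of the 90.00% CI on Δ mean %
    erratic: bool,       // whether the experiment settings contain `erratic: true`
}

/// A change is flagged (✅ improvement or ❌ regression, depending on sign)
/// only when all three criteria from the explanation hold.
fn is_flagged_change(r: &ExperimentResult, effect_size_tolerance_pct: f64) -> bool {
    let big_enough = r.delta_mean_pct.abs() >= effect_size_tolerance_pct; // criterion 1
    let ci_excludes_zero = r.ci_low_pct > 0.0 || r.ci_high_pct < 0.0;     // criterion 2
    let not_erratic = !r.erratic;                                         // criterion 3
    big_enough && ci_excludes_zero && not_erratic
}

fn main() {
    // splunk_hec_route_s3 from this run: +5.13% with CI [+4.81, +5.45] meets
    // all three criteria, so it appears under "Significant changes".
    let splunk_hec_route_s3 = ExperimentResult {
        delta_mean_pct: 5.13,
        ci_low_pct: 4.81,
        ci_high_pct: 5.45,
        erratic: false,
    };
    assert!(is_flagged_change(&splunk_hec_route_s3, 5.0));

    // file_to_blackhole: -16.52% with CI [-22.86, -10.18] would otherwise
    // qualify, but the experiment is marked erratic and is therefore ignored.
    let file_to_blackhole = ExperimentResult {
        delta_mean_pct: -16.52,
        ci_low_pct: -22.86,
        ci_high_pct: -10.18,
        erratic: true,
    };
    assert!(!is_flagged_change(&file_to_blackhole, 5.0));
}
```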