grafana / mimir

Grafana Mimir provides horizontally scalable, highly available, multi-tenant, long-term storage for Prometheus.
https://grafana.com/oss/mimir/
GNU Affero General Public License v3.0

Bug: mimir distributed smoke test job failing when nginx is disabled #9996

Open radekg opened 3 days ago

radekg commented 3 days ago

What is the bug?

The smoke test job fails when nginx is disabled.

Logs:

ts=2024-11-22T22:05:30.226594169Z caller=write_read_series.go:407 level=warn test=write-read-series query=sum(max_over_time(mimir_continuous_test_sine_wave_v2[1s])) start=2024-11-21T22:05:40Z end=2024-11-22T22:05:20Z step=20s msg="Failed to execute range query used to find previously written samples" query=sum(max_over_time(mimir_continuous_test_sine_wave_v2[1s])) err="Post \"http://mimir-system-nginx.core-monitoring-metrics.svc:80/prometheus/api/v1/query_range\": dial tcp: lookup mimir-system-nginx.core-monitoring-metrics.svc on 10.10.0.10:53: no such host"

This makes sense: with nginx disabled, the nginx service the job tries to reach does not exist.

Chart: mimir-distributed, version: 5.5.1.

How to reproduce it?

Configure the Helm chart with nginx disabled.
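
For illustration, a minimal values override along these lines should be enough to reproduce. This is a sketch rather than the exact file used: it assumes the chart's `nginx.enabled` value is what controls the nginx deployment, and it leaves every other value at the chart default.

```yaml
# values.yaml (illustrative override for mimir-distributed 5.5.1)
# Assumption: nginx.enabled is the switch that removes the nginx
# Deployment and Service, so the hostname from the log above no longer resolves.
nginx:
  enabled: false
```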

What did you think would happen?

If the smoke test job depends on nginx, it should not be deployed when nginx is disabled.

What was your environment?

Kubernetes 1.31, chart: mimir-distributed, version 5.5.1 with app version 2.14.0.

Any additional context to share?

There should probably be a check here that skips deploying the smoke test job when nginx is disabled: https://github.com/grafana/mimir/blob/main/operations/helm/charts/mimir-distributed/templates/smoke-test/smoke-test-job.yaml#L1.
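
As a rough sketch of what such a check could look like, the template could wrap the Job in a conditional on the nginx setting. This is not the chart's actual template: the `.Values.nginx.enabled` condition is an assumption, the Job body is elided, and any conditions the template already has at the top of the file would need to be combined with it.

```yaml
{{- /* Hypothetical guard: render the smoke test Job only when nginx is enabled. */ -}}
{{- if .Values.nginx.enabled }}
# ... existing smoke test Job manifest, unchanged ...
{{- end }}
```

An alternative would be to keep the job and point it at a different read/write endpoint when nginx is absent, but skipping the job is the smaller change.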