prometheus-operator / kube-prometheus

Use Prometheus to monitor Kubernetes and applications running on Kubernetes
https://prometheus-operator.dev/
Apache License 2.0

Rule alert not ingesting samples of type histogram #2096

Open frederiksf opened 1 year ago

frederiksf commented 1 year ago

What happened? Deployed version 0.12 with the runbooks from that version.

Did you expect to see something different? I get an alert that should not be firing.

https://runbooks.prometheus-operator.dev/runbooks/prometheus/prometheusnotingestingsamples/

How to reproduce it (as minimally and precisely as possible):

1. Deploy release 0.12.
2. Deploy an app which produces histogram metrics.
3. Check the alerts for the alert "PrometheusNotIngestingSamples".

image

The metric above is producing the alert. The same metric with type=float does have data.

image

Environment

prometheus-operator:v0.62.0

Prometheus logs relevant to the issue:
ts=2023-05-03T12:30:15.400Z caller=main.go:564 level=info msg="Starting Prometheus Server" mode=server version="(version=2.43.0, branch=HEAD, revision=edfc3bcd025dd6fe296c167a14a216cab1e552ee)"
ts=2023-05-03T12:30:15.400Z caller=main.go:569 level=info build_context="(go=go1.19.7, platform=linux/amd64, user=root@8a0ee342e522, date=20230321-12:56:07, tags=netgo,builtinassets)"
ts=2023-05-03T12:30:15.400Z caller=main.go:570 level=info host_details="(Linux 5.10.176-157.645.amzn2.x86_64 #1 SMP Tue Mar 28 17:49:06 UTC 2023 x86_64 prometheus-k8s-0 (none))"
ts=2023-05-03T12:30:15.400Z caller=main.go:571 level=info fd_limits="(soft=1048576, hard=1048576)"
ts=2023-05-03T12:30:15.400Z caller=main.go:572 level=info vm_limits="(soft=unlimited, hard=unlimited)"
ts=2023-05-03T12:30:15.404Z caller=web.go:561 level=info component=web msg="Start listening for connections" address=0.0.0.0:9090
ts=2023-05-03T12:30:15.405Z caller=main.go:1005 level=info msg="Starting TSDB ..."
ts=2023-05-03T12:30:15.406Z caller=repair.go:56 level=info component=tsdb msg="Found healthy block" mint=1677866400074 maxt=1678060800000 ulid=01GTTMQCER6P1Q8319VGF7DH7E
ts=2023-05-03T12:30:15.406Z caller=repair.go:56 level=info component=tsdb msg="Found healthy block" mint=1678060800071 maxt=1678255200000 ulid=01GV0E4T4HY52WMHVA2ZB386S3
ts=2023-05-03T12:30:15.407Z caller=tls_config.go:232 level=info component=web msg="Listening on" address=[::]:9090
ts=2023-05-03T12:30:15.407Z caller=repair.go:56 level=info component=tsdb msg="Found healthy block" mint=1678255200074 maxt=1678449600000 ulid=01GV67HZGW0Z93HKNQV4Q4Z453
ts=2023-05-03T12:30:15.407Z caller=tls_config.go:271 level=info component=web msg="TLS is disabled." http2=false address=[::]:9090
ts=2023-05-03T12:30:15.407Z caller=repair.go:56 level=info component=tsdb msg="Found healthy block" mint=1678449600076 maxt=1678644000000 ulid=01GVC0W98M3G9ZKH2E6MRC582Z
ts=2023-05-03T12:30:15.408Z caller=repair.go:56 level=info component=tsdb msg="Found healthy block" mint=1678644000071 maxt=1678838400000 ulid=01GVHTCXKXW4PMPPXWYQ7QSGDV
ts=2023-05-03T12:30:15.409Z caller=repair.go:56 level=info component=tsdb msg="Found healthy block" mint=1678838400323 maxt=1679032800000 ulid=01GVQKRX61HECBV6TAXDBJCQSD
ts=2023-05-03T12:30:15.410Z caller=repair.go:56 level=info component=tsdb msg="Found healthy block" mint=1679032800323 maxt=1679227200000 ulid=01GVXD4AMGBFKY93ERVFHN8V4E
ts=2023-05-03T12:30:15.410Z caller=repair.go:56 level=info component=tsdb msg="Found healthy block" mint=1679227200186 maxt=1679421600000 ulid=01GW36FCWCQH8BDJGDB7VVS4VT
ts=2023-05-03T12:30:15.411Z caller=repair.go:56 level=info component=tsdb msg="Found healthy block" mint=1679421600057 maxt=1679616000000 ulid=01GW8ZVHZWRKTVHH3SB4FRH33T
ts=2023-05-03T12:30:15.411Z caller=repair.go:56 level=info component=tsdb msg="Found healthy block" mint=1679616000186 maxt=1679810400000 ulid=01GWES7FNTWG3CF0DNA2BHBP7H
ts=2023-05-03T12:30:15.412Z caller=repair.go:56 level=info component=tsdb msg="Found healthy block" mint=1679810400012 maxt=1680004800000 ulid=01GWMJN0S8VS22VFRJQ03YPMT0
ts=2023-05-03T12:30:15.412Z caller=repair.go:56 level=info component=tsdb msg="Found healthy block" mint=1680004800012 maxt=1680199200000 ulid=01GWTC1DD393ATT74JCFMF5S7V
ts=2023-05-03T12:30:15.413Z caller=repair.go:56 level=info component=tsdb msg="Found healthy block" mint=1680199200087 maxt=1680393600000 ulid=01GX05DAR2HQBPGBCAQJ4RQJ2F
ts=2023-05-03T12:30:15.413Z caller=repair.go:56 level=info component=tsdb msg="Found healthy block" mint=1680393600087 maxt=1680588000000 ulid=01GX5YTND9G220KWKY46VMT9HC
ts=2023-05-03T12:30:15.414Z caller=repair.go:56 level=info component=tsdb msg="Found healthy block" mint=1680588000087 maxt=1680782400000 ulid=01GXBR79SF36K0M2KD0VM4QNAB
ts=2023-05-03T12:30:15.415Z caller=repair.go:56 level=info component=tsdb msg="Found healthy block" mint=1680782400057 maxt=1680976800000 ulid=01GXHHKB735Q9KV6G95B6R268N
ts=2023-05-03T12:30:15.415Z caller=repair.go:56 level=info component=tsdb msg="Found healthy block" mint=1680976800057 maxt=1681171200000 ulid=01GXQB01X60YF59JXET881DR9Y
ts=2023-05-03T12:30:15.416Z caller=repair.go:56 level=info component=tsdb msg="Found healthy block" mint=1681171200057 maxt=1681365600000 ulid=01GXX4D12MHQ3ESMCTTA2KVD1Y
ts=2023-05-03T12:30:15.416Z caller=repair.go:56 level=info component=tsdb msg="Found healthy block" mint=1681365600015 maxt=1681560000000 ulid=01GY2XS6M8ZTWCP7E3MV0N4TRN
ts=2023-05-03T12:30:15.417Z caller=repair.go:56 level=info component=tsdb msg="Found healthy block" mint=1681560000015 maxt=1681754400000 ulid=01GY8Q69447M8ZJGVYW5G6PA2V
ts=2023-05-03T12:30:15.418Z caller=repair.go:56 level=info component=tsdb msg="Found healthy block" mint=1681754400015 maxt=1681948800000 ulid=01GYEGK4WY3C7A3T8Q308DEGAP
ts=2023-05-03T12:30:15.419Z caller=repair.go:56 level=info component=tsdb msg="Found healthy block" mint=1681948800040 maxt=1682143200000 ulid=01GYM9Z58FRS14K20AHKWD0YZ4
ts=2023-05-03T12:30:15.420Z caller=repair.go:56 level=info component=tsdb msg="Found healthy block" mint=1682143200101 maxt=1682337600000 ulid=01GYT3BKXQY06NZ4BVM43422VD
ts=2023-05-03T12:30:15.420Z caller=repair.go:56 level=info component=tsdb msg="Found healthy block" mint=1682337600101 maxt=1682532000000 ulid=01GYZWS3BBFWBGMYW2HRK4MDJC
ts=2023-05-03T12:30:15.421Z caller=repair.go:56 level=info component=tsdb msg="Found healthy block" mint=1682532000499 maxt=1682726400000 ulid=01GZ5P59S3VSM6T3CWAEZEVAM8
ts=2023-05-03T12:30:15.421Z caller=repair.go:56 level=info component=tsdb msg="Found healthy block" mint=1682726400312 maxt=1682920800000 ulid=01GZBFHWCTS32P92MXP8DA12WM
ts=2023-05-03T12:30:15.422Z caller=repair.go:56 level=info component=tsdb msg="Found healthy block" mint=1682920800312 maxt=1682985600000 ulid=01GZDDAN6E6A9HRTMA5C9ZXRQR
ts=2023-05-03T12:30:15.423Z caller=repair.go:56 level=info component=tsdb msg="Found healthy block" mint=1682985600115 maxt=1683050400000 ulid=01GZFB4DAYF06K6P4BXJJS276Y
ts=2023-05-03T12:30:15.423Z caller=repair.go:56 level=info component=tsdb msg="Found healthy block" mint=1683050400050 maxt=1683072000000 ulid=01GZFZPV3703058PSCCWZZX1DT
ts=2023-05-03T12:30:15.424Z caller=repair.go:56 level=info component=tsdb msg="Found healthy block" mint=1683093600133 maxt=1683100800000 ulid=01GZGDEAR4D8H8B1BM7KEQ01VR
ts=2023-05-03T12:30:15.425Z caller=repair.go:56 level=info component=tsdb msg="Found healthy block" mint=1683100800000 maxt=1683108000000 ulid=01GZGM9R2E46J2SEAHPN814XAX
ts=2023-05-03T12:30:15.425Z caller=repair.go:56 level=info component=tsdb msg="Found healthy block" mint=1683072000133 maxt=1683093600000 ulid=01GZGMA3TQDDBX3PRQE04HSANV
ts=2023-05-03T12:30:16.870Z caller=head.go:587 level=info component=tsdb msg="Replaying on-disk memory mappable chunks if any"
ts=2023-05-03T12:30:17.601Z caller=head.go:658 level=info component=tsdb msg="On-disk memory mappable chunks replay completed" duration=731.836382ms
ts=2023-05-03T12:30:17.601Z caller=head.go:664 level=info component=tsdb msg="Replaying WAL, this may take a while"
ts=2023-05-03T12:30:20.453Z caller=head.go:700 level=info component=tsdb msg="WAL checkpoint loaded"
ts=2023-05-03T12:30:21.933Z caller=head.go:735 level=info component=tsdb msg="WAL segment loaded" segment=67544 maxSegment=67550
ts=2023-05-03T12:30:24.429Z caller=head.go:735 level=info component=tsdb msg="WAL segment loaded" segment=67545 maxSegment=67550
ts=2023-05-03T12:30:25.187Z caller=head.go:735 level=info component=tsdb msg="WAL segment loaded" segment=67546 maxSegment=67550
ts=2023-05-03T12:30:27.476Z caller=head.go:735 level=info component=tsdb msg="WAL segment loaded" segment=67547 maxSegment=67550
ts=2023-05-03T12:30:29.720Z caller=head.go:735 level=info component=tsdb msg="WAL segment loaded" segment=67548 maxSegment=67550
ts=2023-05-03T12:30:30.457Z caller=head.go:735 level=info component=tsdb msg="WAL segment loaded" segment=67549 maxSegment=67550
ts=2023-05-03T12:30:30.457Z caller=head.go:735 level=info component=tsdb msg="WAL segment loaded" segment=67550 maxSegment=67550
ts=2023-05-03T12:30:30.458Z caller=head.go:772 level=info component=tsdb msg="WAL replay completed" checkpoint_replay_duration=2.851980545s wal_replay_duration=10.004035955s wbl_replay_duration=160ns total_replay_duration=13.587945583s
ts=2023-05-03T12:30:31.400Z caller=main.go:1026 level=info fs_type=EXT4_SUPER_MAGIC
ts=2023-05-03T12:30:31.400Z caller=main.go:1029 level=info msg="TSDB started"
ts=2023-05-03T12:30:31.400Z caller=main.go:1209 level=info msg="Loading configuration file" filename=/etc/prometheus/config_out/prometheus.env.yaml
ts=2023-05-03T12:30:31.419Z caller=kubernetes.go:326 level=info component="discovery manager scrape" discovery=kubernetes config=serviceMonitor/monitoring/kube-state-metrics/1 msg="Using pod service account via in-cluster config"
ts=2023-05-03T12:30:31.419Z caller=kubernetes.go:326 level=info component="discovery manager scrape" discovery=kubernetes config=serviceMonitor/rigger-acceptatie/rigger-backend-servicemonitor/0 msg="Using pod service account via in-cluster config"
ts=2023-05-03T12:30:31.420Z caller=kubernetes.go:326 level=info component="discovery manager scrape" discovery=kubernetes config=serviceMonitor/velero/velero/0 msg="Using pod service account via in-cluster config"
ts=2023-05-03T12:30:31.420Z caller=kubernetes.go:326 level=info component="discovery manager scrape" discovery=kubernetes config=serviceMonitor/easyexchange/easyexchange-backend-servicemonitor/0 msg="Using pod service account via in-cluster config"
ts=2023-05-03T12:30:31.420Z caller=kubernetes.go:326 level=info component="discovery manager scrape" discovery=kubernetes config=serviceMonitor/harbor2/harbor-servicemonitor/0 msg="Using pod service account via in-cluster config"
ts=2023-05-03T12:30:31.420Z caller=kubernetes.go:326 level=info component="discovery manager scrape" discovery=kubernetes config=serviceMonitor/monitoring/kubelet/1 msg="Using pod service account via in-cluster config"
ts=2023-05-03T12:30:31.421Z caller=kubernetes.go:326 level=info component="discovery manager scrape" discovery=kubernetes config=serviceMonitor/rigger-test/rigger-backend-servicemonitor/0 msg="Using pod service account via in-cluster config"
ts=2023-05-03T12:30:31.421Z caller=kubernetes.go:326 level=info component="discovery manager scrape" discovery=kubernetes config=serviceMonitor/gitlab-monitor/gitlab-pipeline-monitor/0 msg="Using pod service account via in-cluster config"
ts=2023-05-03T12:30:31.421Z caller=kubernetes.go:326 level=info component="discovery manager scrape" discovery=kubernetes config=serviceMonitor/test/prometheus-histogram-test/0 msg="Using pod service account via in-cluster config"
ts=2023-05-03T12:30:31.422Z caller=kubernetes.go:326 level=info component="discovery manager scrape" discovery=kubernetes config=serviceMonitor/monitoring/kube-apiserver/0 msg="Using pod service account via in-cluster config"
ts=2023-05-03T12:30:31.422Z caller=kubernetes.go:326 level=info component="discovery manager notify" discovery=kubernetes config=config-0 msg="Using pod service account via in-cluster config"
ts=2023-05-03T12:30:31.535Z caller=main.go:1246 level=info msg="Completed loading of configuration file" filename=/etc/prometheus/config_out/prometheus.env.yaml totalDuration=135.107986ms db_storage=1.88µs remote_storage=1.63µs web_handler=580ns query_engine=1.13µs scrape=296.904µs scrape_sd=3.3256ms notify=22.021µs notify_sd=312.374µs rules=112.604988ms tracing=7.4µs
ts=2023-05-03T12:30:31.535Z caller=main.go:990 level=info msg="Server is ready to receive web requests."
ts=2023-05-03T12:30:31.535Z caller=manager.go:974 level=info component="rule manager" msg="Starting rule manager..."
ts=2023-05-03T12:30:31.535Z caller=main.go:1209 level=info msg="Loading configuration file" filename=/etc/prometheus/config_out/prometheus.env.yaml
ts=2023-05-03T12:30:31.557Z caller=kubernetes.go:326 level=info component="discovery manager scrape" discovery=kubernetes config=serviceMonitor/rigger-test/rigger-queries-servicemonitor/1 msg="Using pod service account via in-cluster config"
ts=2023-05-03T12:30:31.557Z caller=kubernetes.go:326 level=info component="discovery manager scrape" discovery=kubernetes config=serviceMonitor/kube-system/fluent-bit/0 msg="Using pod service account via in-cluster config"
ts=2023-05-03T12:30:31.557Z caller=kubernetes.go:326 level=info component="discovery manager scrape" discovery=kubernetes config=serviceMonitor/monitoring/alertmanager/0 msg="Using pod service account via in-cluster config"
ts=2023-05-03T12:30:31.558Z caller=kubernetes.go:326 level=info component="discovery manager scrape" discovery=kubernetes config=serviceMonitor/rigger-acceptatie/rigger-rabbitmq-servicemonitor/0 msg="Using pod service account via in-cluster config"
ts=2023-05-03T12:30:31.558Z caller=kubernetes.go:326 level=info component="discovery manager scrape" discovery=kubernetes config=serviceMonitor/gitlab-monitor/gitlab-pipeline-monitor/0 msg="Using pod service account via in-cluster config"
ts=2023-05-03T12:30:31.558Z caller=kubernetes.go:326 level=info component="discovery manager scrape" discovery=kubernetes config=serviceMonitor/easyexchange/easyexchange-backend-servicemonitor/0 msg="Using pod service account via in-cluster config"
ts=2023-05-03T12:30:31.559Z caller=kubernetes.go:326 level=info component="discovery manager scrape" discovery=kubernetes config=serviceMonitor/velero/velero/0 msg="Using pod service account via in-cluster config"
ts=2023-05-03T12:30:31.559Z caller=kubernetes.go:326 level=info component="discovery manager scrape" discovery=kubernetes config=serviceMonitor/test/prometheus-histogram-test/0 msg="Using pod service account via in-cluster config"
ts=2023-05-03T12:30:31.559Z caller=kubernetes.go:326 level=info component="discovery manager scrape" discovery=kubernetes config=serviceMonitor/harbor2/harbor-servicemonitor/0 msg="Using pod service account via in-cluster config"
ts=2023-05-03T12:30:31.559Z caller=kubernetes.go:326 level=info component="discovery manager scrape" discovery=kubernetes config=serviceMonitor/monitoring/kube-apiserver/0 msg="Using pod service account via in-cluster config"
ts=2023-05-03T12:30:31.560Z caller=kubernetes.go:326 level=info component="discovery manager notify" discovery=kubernetes config=config-0 msg="Using pod service account via in-cluster config"
ts=2023-05-03T12:30:31.722Z caller=main.go:1246 level=info msg="Completed loading of configuration file" filename=/etc/prometheus/config_out/prometheus.env.yaml totalDuration=187.276911ms db_storage=2.1µs remote_storage=2.47µs web_handler=510ns query_engine=1.47µs scrape=124.532µs scrape_sd=3.387231ms notify=15.64µs notify_sd=317.315µs rules=162.399927ms tracing=6.62µs
ts=2023-05-03T13:00:06.323Z caller=compact.go:519 level=info component=tsdb msg="write block" mint=1683108000000 maxt=1683115200000 ulid=01GZGV5FA8TSF066VPPWS7S718 duration=6.123304932s
ts=2023-05-03T13:00:06.756Z caller=head.go:1269 level=info component=tsdb msg="Head GC completed" caller=truncateMemory duration=427.072986ms
ts=2023-05-03T13:00:06.763Z caller=checkpoint.go:100 level=info component=tsdb msg="Creating checkpoint" from_segment=67544 to_segment=67547 mint=1683115200000
ts=2023-05-03T13:00:11.179Z caller=head.go:1241 level=info component=tsdb msg="WAL checkpoint complete" first=67544 last=67547 duration=4.416399883s


tyagian commented 1 year ago

@frederiksf I also observed the same bug. Do we need both histogram-type and float-type queries for this? Is there any workaround to disable the histogram-type rules?

dgarcdu commented 12 months ago

> @frederiksf I also observed the same bug. Do we need both histogram-type and float-type queries for this? Is there any workaround to disable the histogram-type rules?

A quick workaround would be to drop the series carrying the type="histogram" label pair via metric_relabel_configs (metricRelabelings in a ServiceMonitor).
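A minimal sketch of that workaround, assuming the alerting series is prometheus_tsdb_head_samples_appended_total (the screenshots are not available in text form, so the metric name is an assumption) and that Prometheus is scraped through a ServiceMonitor. The object name, selector label, and port name below are also assumptions and should be adapted to the actual deployment:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: prometheus-k8s            # assumed name of the ServiceMonitor that scrapes Prometheus itself
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: prometheus   # assumed selector label
  endpoints:
    - port: web                   # assumed port name
      interval: 30s
      metricRelabelings:
        # Join the metric name and the "type" label with ";" and drop the
        # series only when both parts match. The regex is fully anchored,
        # so the type="float" series of the same metric is kept.
        - sourceLabels: [__name__, type]
          separator: ";"
          regex: "prometheus_tsdb_head_samples_appended_total;histogram"
          action: drop
```

Because the rule drops the series at scrape time, the alert expression no longer sees a type="histogram" series with a zero rate. If the manifests are generated (for example from jsonnet), the change would need to be made in the generator input rather than in the rendered YAML, otherwise it is likely to be overwritten on the next build.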