nats-io / prometheus-nats-exporter

A Prometheus exporter for NATS metrics
Apache License 2.0

nats_stream_total_messages grows, but no other metric follows it #244

Open hpdobrica opened 1 year ago

hpdobrica commented 1 year ago

Hello, I was tracking the nats_stream_total_messages metric for one stream. In our system it usually rises and falls together with the nats_consumer_num_pending metric.

Sometimes nats_stream_total_messages grows a lot (until it fills up the whole stream), but no other metric I could find (except nats_stream_total_bytes) shows similar growth.

I've tried summing all of the metrics I found that seem to show the number of messages in a stream:

but the sum of all these for one stream is below 1k, while nats_stream_total_messages is around 100k.
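A minimal sketch of that comparison follows. It assumes both gauges are fetched with instant queries against the Prometheus HTTP API (`/api/v1/query`) and that each series carries a `stream_name` label. The label name is an assumption, so check what your exporter version actually emits. It also assumes each stream and consumer appears in exactly one series; if several exporters report the same replicated stream, deduplicate first. The helper names are hypothetical.

```python
from collections import defaultdict

# Assumption: series carry a "stream_name" label; verify against your exporter.
STREAM_LABEL = "stream_name"


def vector_by_stream(response: dict) -> dict[str, float]:
    """Sum an instant-query vector from the Prometheus HTTP API per stream.

    `response` is the decoded JSON body of /api/v1/query, shaped like
    {"status": "success", "data": {"resultType": "vector", "result": [...]}}.
    Each result item has a label map under "metric" and a
    [unix_timestamp, "value_as_string"] pair under "value".
    """
    totals = defaultdict(float)
    for item in response["data"]["result"]:
        stream = item["metric"].get(STREAM_LABEL, "<unlabeled>")
        totals[stream] += float(item["value"][1])
    return dict(totals)


def unexplained_messages(stream_resp: dict, pending_resp: dict) -> dict[str, float]:
    """Stored messages per stream minus the summed consumer num_pending.

    stream_resp:  result of querying nats_stream_total_messages
    pending_resp: result of querying nats_consumer_num_pending
    """
    stored = vector_by_stream(stream_resp)
    pending = vector_by_stream(pending_resp)
    return {name: stored[name] - pending.get(name, 0.0) for name in stored}
```

A large positive gap is not necessarily a bug. num_pending counts messages a consumer has not yet been delivered, while nats_stream_total_messages counts everything still stored in the stream. Under limits-based retention, messages that every consumer has already processed stay in the stream until a MaxMsgs, MaxBytes or MaxAge limit removes them, so that is one possible explanation worth ruling out.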

Is there a metric I'm missing that could explain what these 100k messages are? In our system the spike seems to be observable only through nats_stream_total_messages and nats_stream_total_bytes.

Any ideas what I'm missing, and how I can track these messages?

Thanks a lot for your help!

somratdutta commented 2 months ago

Hey @hpdobrica, let me know if this issue still persists, and I will investigate further. Thanks.

0xAX commented 4 weeks ago

Hello, I am not sure whether this is related to this issue, but I also see a strange correlation between nats_stream_total_bytes, nats_stream_total_messages and nats_stream_last_seq. I am using the default Grafana dashboard (https://github.com/nats-io/prometheus-nats-exporter/blob/main/walkthrough/grafana-jetstream-dash-helm.json; the graphs are rearranged a bit, but the metrics are the same), and here is what I see:

[Screenshot: nats-metrics]

The load and the number of messages sent to NATS and to the following streams stay the same:

It is visible that messages keep arriving constantly, yet after 5-6 minutes nats_stream_total_bytes and nats_stream_total_messages for these streams go down. I would expect them to stay roughly similar to the beginning of the graphs, since the message rate (at least judging by nats_stream_last_seq) is the same.
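The comparison between nats_stream_last_seq and nats_stream_total_messages can be turned into a quick arithmetic check. A minimal sketch, assuming every stored message advances the stream's last sequence by exactly one: between two readings, appended = Δlast_seq, and because the stored count changes by (appended − removed), removed = Δlast_seq − Δtotal_messages. `StreamSnapshot` and `appended_and_removed` are hypothetical helpers, not part of the exporter.

```python
from dataclasses import dataclass


@dataclass
class StreamSnapshot:
    """Two gauges read at the same instant for one stream (hypothetical)."""
    last_seq: int        # value of nats_stream_last_seq
    total_messages: int  # value of nats_stream_total_messages


def appended_and_removed(before: StreamSnapshot, after: StreamSnapshot) -> tuple[int, int]:
    """Estimate messages appended to and removed from a stream between snapshots.

    Assumes each stored message advanced last_seq by exactly one, so
    appended = delta(last_seq). The stored count changes by
    (appended - removed), hence removed = appended - delta(total_messages).
    """
    appended = after.last_seq - before.last_seq
    removed = appended - (after.total_messages - before.total_messages)
    return appended, removed
```

If `removed` becomes large while the append rate stays flat, messages are leaving the stream rather than arriving more slowly. Stream retention settings (limits, or acknowledgements under work-queue or interest retention) would be a reasonable place to look first.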

NATS version: nats:2.10.21-alpine

Does anyone have an idea what it could be?

Jarema commented 4 weeks ago

We will take a look into this.