risingwavelabs / risingwave

SQL stream processing, analytics, and management. We decouple storage and compute to offer efficient joins, instant failover, dynamic scaling, speedy bootstrapping, and concurrent query serving.
https://www.risingwave.com/slack
Apache License 2.0

state_store_write_batch_tuple_counts metrics missed for 40s #8951

Closed: cyliu0 closed this issue 4 months ago

cyliu0 commented 1 year ago

Describe the bug

This is the first time we have hit this issue in the daily performance test, and it may not be easy to reproduce. I am filing it to record the bug in case we hit it again in the future.

The data from Prometheus shows a 40-second gap in this metric. The scrape interval is 5 seconds, so seven consecutive samples are missing. There are no samples between [1680452011.803,"8161"] and [1680452051.804,"8161"]:

[1680452001.803,"8161"],[1680452006.803,"8161"],[1680452011.803,"8161"],[1680452051.804,"8161"],[1680452056.804,"8161"]

The PromQL query is state_store_write_batch_tuple_counts{namespace="nexmark-bs-0-1-2-3-4-daily-20230402"}[24h].
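
If this shows up again, the gap can be found mechanically instead of by reading the raw samples. The following is a minimal sketch, assuming the samples have been exported as [timestamp, value] pairs like the ones quoted above. The helper name find_gaps and the 0.5-second tolerance are hypothetical choices for illustration, not part of any RisingWave or Prometheus tooling.

```python
import json

SCRAPE_INTERVAL_S = 5.0  # scrape interval stated in this report

# Samples copied from this report: [unix_timestamp, value] pairs.
RAW = (
    '[[1680452001.803,"8161"],[1680452006.803,"8161"],'
    '[1680452011.803,"8161"],[1680452051.804,"8161"],'
    '[1680452056.804,"8161"]]'
)


def find_gaps(samples, interval, tolerance=0.5):
    """Return (start, end, missing) for each pair of consecutive samples
    that are farther apart than interval + tolerance seconds.

    `missing` estimates how many scrapes were skipped in between.
    """
    gaps = []
    for (t0, _), (t1, _) in zip(samples, samples[1:]):
        delta = t1 - t0
        if delta > interval + tolerance:
            missing = round(delta / interval) - 1
            gaps.append((t0, t1, missing))
    return gaps


samples = json.loads(RAW)
gaps = find_gaps(samples, SCRAPE_INTERVAL_S)
for start, end, missing in gaps:
    print(f"gap of {end - start:.3f}s from {start} to {end}, "
          f"about {missing} samples missing")
```

The tolerance only absorbs the millisecond-level jitter visible in the timestamps (803 ms vs. 804 ms); a stricter or looser value may suit other scrape setups.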

To Reproduce

Run nexmark q0 and q1.

The buildkite pipeline job

Expected behavior

The interval between consecutive samples of this metric should be 5 seconds, matching the scrape interval.

Additional context

image

lmatz commented 1 year ago

Did it ever happen again?

cyliu0 commented 4 months ago

Closing since it has never happened again.