m3db / m3

M3 monorepo - Distributed TSDB, Aggregator and Query Engine, Prometheus Sidecar, Graphite Compatible, Metrics Platform
https://m3db.io/
Apache License 2.0

rate function sudden peak #2493

Open AmirAliSobhGol opened 4 years ago

AmirAliSobhGol commented 4 years ago

I have two Prometheus HA pairs created by the Prometheus Operator, both writing to m3coordinator. The following query returns an invalid result from m3coordinator, as shown in the last image.

rate(hadoop_datanode_byteswritten{instance=~"86.104.47.48:39995"}[1m])

m3db version: v0.15.6

prometheus-1 config:

global:
  scrape_interval: 30s
  scrape_timeout: 10s
  evaluation_interval: 30s
  external_labels:
    prometheus: prometheus-service/infra-kandoo
    prometheus_replica: prometheus-infra-kandoo-1
remote_write:
- url: http://m3db.kandoo.roo.cloud:7201/api/v1/prom/remote/write
  remote_timeout: 30s
  write_relabel_configs:
  - separator: ;
    regex: prometheus_replica
    replacement: $1
    action: labeldrop
  queue_config:
    capacity: 500
    max_shards: 1000
    min_shards: 1
    max_samples_per_send: 100
    batch_send_deadline: 5s
    min_backoff: 30ms
    max_backoff: 100ms

image

prometheus-0 config:

global:
  scrape_interval: 30s
  scrape_timeout: 10s
  evaluation_interval: 30s
  external_labels:
    prometheus: prometheus-service/infra-kandoo
    prometheus_replica: prometheus-infra-kandoo-0
remote_write:
- url: http://m3db.kandoo.roo.cloud:7201/api/v1/prom/remote/write
  remote_timeout: 30s
  write_relabel_configs:
  - separator: ;
    regex: prometheus_replica
    replacement: $1
    action: labeldrop
  queue_config:
    capacity: 500
    max_shards: 1000
    min_shards: 1
    max_samples_per_send: 100
    batch_send_deadline: 5s
    min_backoff: 30ms
    max_backoff: 100ms

image
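For readers comparing the two replica configs above: both apply a `labeldrop` on `prometheus_replica` before remote write. The sketch below is an illustration, not Prometheus code. It mimics that relabel step in plain Python to show that the two replicas end up with identical label sets for the queried series. The `labeldrop` helper is hypothetical, and the label values are copied from the configs and query in this issue. Whether this has anything to do with the peaks is not established in this thread.

```python
import re


def labeldrop(labels, regex):
    """Hypothetical helper: drop every label whose NAME fully matches regex.

    Prometheus anchors relabel regexes, so fullmatch is used here.
    """
    pattern = re.compile(regex)
    return {k: v for k, v in labels.items() if not pattern.fullmatch(k)}


# Labels shared by both replicas (values taken from the issue text).
base = {
    "__name__": "hadoop_datanode_byteswritten",
    "instance": "86.104.47.48:39995",
    "prometheus": "prometheus-service/infra-kandoo",
}

# External labels are attached first, then write_relabel_configs run.
replica_0 = {**base, "prometheus_replica": "prometheus-infra-kandoo-0"}
replica_1 = {**base, "prometheus_replica": "prometheus-infra-kandoo-1"}

after_0 = labeldrop(replica_0, "prometheus_replica")
after_1 = labeldrop(replica_1, "prometheus_replica")

# True when both replicas write to the same series identity.
same_series = after_0 == after_1
print(same_series)
```

With these configs, samples scraped independently by each replica are sent to m3coordinator under one series name and label set.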

local prometheus config:

global:
  scrape_interval: 15s
  scrape_timeout: 10s
  evaluation_interval: 1m
  external_labels:
    prometheus: kandoo-bare-metal
remote_read:
- url: http://m3db.kandoo.roo.cloud:7201/api/v1/prom/remote/read
  remote_timeout: 1m
  read_recent: true

image

Grafana connected directly to m3coordinator: image

Please let me know if you need any extra information.

AmirAliSobhGol commented 4 years ago

The local Prometheus also shows unrealistic peaks at random, whereas m3coordinator shows them consistently. image

AmirAliSobhGol commented 4 years ago

Also, the query

resets(hadoop_datanode_byteswritten[40h])

returns 0 for every series.
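To make the link between `resets()` and `rate()` concrete, here is a minimal Python sketch of the counter-reset handling, under simplifying assumptions: it ignores the window-boundary extrapolation that the real PromQL `rate()` performs, and the sample values are made up for illustration. `count_resets` and `simple_rate` are hypothetical names, not Prometheus or M3 APIs.

```python
def count_resets(values):
    """Count decreases between consecutive samples, as resets() does."""
    return sum(1 for prev, cur in zip(values, values[1:]) if cur < prev)


def simple_rate(samples):
    """Per-second increase over (timestamp_seconds, value) samples.

    Simplified: no extrapolation to the window boundaries.
    A decrease is treated as a counter reset, so the new value is
    counted as an increase from zero.
    """
    if len(samples) < 2:
        return None
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        return None
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += cur if cur < prev else cur - prev
    return increase / elapsed


# Made-up samples, 30s apart, matching the scrape_interval above.
monotonic = [(0, 100.0), (30, 160.0), (60, 220.0)]
with_dip = [(0, 100.0), (30, 160.0), (60, 150.0)]  # last value dips slightly

monotonic_resets = count_resets([v for _, v in monotonic])
dip_resets = count_resets([v for _, v in with_dip])
monotonic_rate = simple_rate(monotonic)
dip_rate = simple_rate(with_dip)
print(monotonic_resets, monotonic_rate)
print(dip_resets, dip_rate)
```

In this sketch, even a small dip in the stored values inflates the rate, but it also registers as a reset. A `resets()` result of 0 over the same data therefore suggests that the peaks do not come from this reset-handling path, at least as far as this simplified model goes.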