m3db / m3

M3 monorepo - Distributed TSDB, Aggregator and Query Engine, Prometheus Sidecar, Graphite Compatible, Metrics Platform
https://m3db.io/
Apache License 2.0

M3DB graphite query problem #2065

Open v-pap opened 4 years ago

v-pap commented 4 years ago

I have a problem with the Graphite integration in M3DB. When I query the Graphite endpoint like this, http://m3db:7201/api/v1/graphite/render?target=random.metric&from=-10min&format=json, I get datapoints with 5s resolution, as expected. When I run the same query with from=-20min, I get datapoints with 1m resolution instead of the 10s resolution that the original Graphite query would produce. Here is my m3dbnode.yml file:

coordinator:
  listenAddress:
    value: "0.0.0.0:7201"

  local:
    namespaces:

db:
  logging:
    level: info

  metrics:
    prometheus:
      handlerPath: /metrics
    sanitization: prometheus
    samplingRate: 1.0
    extended: detailed

  listenAddress: 0.0.0.0:9000
  clusterListenAddress: 0.0.0.0:9001
  httpNodeListenAddress: 0.0.0.0:9002
  httpClusterListenAddress: 0.0.0.0:9003
  debugListenAddress: 0.0.0.0:9004

  hostID:
    resolver: config
    value: m3db_local

  client:
    writeConsistencyLevel: majority
    readConsistencyLevel: unstrict_majority

  gcPercentage: 100

  writeNewSeriesAsync: true
  writeNewSeriesLimitPerSecond: 1048576
  writeNewSeriesBackoffDuration: 2ms

  bootstrap:
    bootstrappers:
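To check which resolution a given range actually returns, a small script can build the render URL from the report and infer the step from the returned timestamps. This is a minimal Python sketch: render_url, infer_resolution and fetch_resolution are hypothetical helpers, and it assumes the endpoint returns the standard Graphite JSON shape (a list of series, each with a "target" and "datapoints" given as [value, timestamp] pairs).

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL taken from the query in the report; adjust host/port as needed.
BASE = "http://m3db:7201/api/v1/graphite/render"


def render_url(target: str, frm: str) -> str:
    """Build a Graphite render URL for a target and a relative start time."""
    return BASE + "?" + urlencode({"target": target, "from": frm, "format": "json"})


def infer_resolution(payload: str) -> dict:
    """Return {target: step_in_seconds} from a Graphite-style JSON response.

    Assumption: the payload is a list of series with "target" and
    "datapoints" as [value, timestamp] pairs (null values keep timestamps).
    """
    steps = {}
    for series in json.loads(payload):
        timestamps = [ts for _, ts in series["datapoints"]]
        deltas = {b - a for a, b in zip(timestamps, timestamps[1:])}
        # Smallest gap between consecutive points approximates the resolution.
        steps[series["target"]] = min(deltas) if deltas else None
    return steps


def fetch_resolution(target: str, frm: str, timeout: float = 10.0) -> dict:
    """Query the render endpoint and return the inferred step per series."""
    with urlopen(render_url(target, frm), timeout=timeout) as resp:
        return infer_resolution(resp.read().decode("utf-8"))
```

Comparing fetch_resolution("random.metric", "-10min") with fetch_resolution("random.metric", "-20min") should give 5 and 10 seconds respectively if the 24h namespace is used for the longer range; a value of 60 means the 1m data is being returned instead.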

arnikola commented 4 years ago

This is because of your namespace settings here:

- namespace: metrics1
  type: aggregated
  retention: 15m
  resolution: 5s
- namespace: metrics2
  type: aggregated
  retention: 24h
  resolution: 10s

When your query range falls within the 15m window, the query hits metrics1, which gives you the most complete view of the dataset. As soon as the range goes beyond that (e.g. the 20 min case you're seeing), metrics1 has no datapoints for the older part of the range, so we use the data from metrics2, which has 10s resolution, to satisfy the query.
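The selection rule described above can be modeled as: among the aggregated namespaces whose retention covers the whole query range, pick the one with the finest resolution. The Python sketch below is a simplified model of that expected behavior, not M3's actual selection code. Namespace and select_namespace are hypothetical names, and because the reporter's third namespace (metrics3) is not shown in the thread, its entry uses made-up settings: 1m resolution to match what was observed, and an invented 30d retention.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional, Sequence


@dataclass(frozen=True)
class Namespace:
    """Hypothetical, simplified view of an aggregated namespace entry."""
    name: str
    retention: timedelta
    resolution: timedelta


def select_namespace(namespaces: Sequence[Namespace],
                     query_range: timedelta) -> Optional[Namespace]:
    """Pick the finest-resolution namespace whose retention covers the range.

    Simplified model of the behavior described in the thread; returns None
    when no namespace retains data that far back.
    """
    covering = [ns for ns in namespaces if ns.retention >= query_range]
    if not covering:
        return None
    return min(covering, key=lambda ns: ns.resolution)


NAMESPACES = [
    Namespace("metrics1", timedelta(minutes=15), timedelta(seconds=5)),
    Namespace("metrics2", timedelta(hours=24), timedelta(seconds=10)),
    # Hypothetical settings: only the 1m resolution is implied by the thread.
    Namespace("metrics3", timedelta(days=30), timedelta(minutes=1)),
]
```

Under this rule a 20-minute range resolves to metrics2 even though a coarser namespace with longer retention exists, so adding a third namespace should not change the answer for ranges that metrics2 already covers.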

v-pap commented 4 years ago

Thank you for your response. In case I wasn't clear: the behavior you describe is indeed what I expect. However, I get data from metrics3 instead of metrics2.

arnikola commented 4 years ago

Ah I see; are you able to verify that metrics are getting written to metrics2?

v-pap commented 4 years ago

According to the debug logs, the metrics are written successfully. When I remove metrics3 from the available namespaces, metrics2 gets "promoted" and works correctly. I think the problem only occurs when there are more than 2 aggregated namespaces in the same policies.

arnikola commented 4 years ago

I'll look into it; the namespace selection logic is really confusing, so I wouldn't be surprised if we got something wrong there.

v-pap commented 4 years ago

Is there any update on this issue?