M3DB graphite query problem

v-pap commented 4 years ago

I have a problem with the Graphite integration in M3DB. When I query the Graphite endpoint like this: http://m3db:7201/api/v1/graphite/render?target=random.metric&from=-10min&format=json I get, as expected, datapoints with 5s resolution. When I execute the same query with from=-20min, I get datapoints with 1m resolution instead of the 10s resolution that the original Graphite query would produce. Here is my m3dbnode.yml file:

coordinator:
  listenAddress:
    value: "0.0.0.0:7201"

  local:
    namespaces:
      - namespace: default
        type: unaggregated
        retention: 48h
      - namespace: metrics1
        type: aggregated
        retention: 15m
        resolution: 5s
      - namespace: metrics2
        type: aggregated
        retention: 24h
        resolution: 10s
      - namespace: metrics3
        type: aggregated
        retention: 168h
        resolution: 1m

  logging:
    level: info

  metrics:
    scope:
      prefix: "coordinator"
    prometheus:
      handlerPath: /metrics
      listenAddress: 0.0.0.0:7203 # until https://github.com/m3db/m3/issues/682 is resolved
    sanitization: prometheus
    samplingRate: 1.0
    extended: none

  limits:
    maxComputedDatapoints: 10000

  tagOptions:
    # Configuration setting for generating metric IDs from tags.
    idScheme: quoted

  carbon:
    ingester:
      debug: true
      listenAddress: "0.0.0.0:7204"
      rules:
        - pattern: .*
          aggregation:
            type: mean
          policies:
            - resolution: 5s
              retention: 15m
            - resolution: 10s
              retention: 24h
            - resolution: 1m
              retention: 168h

db:
  logging:
    level: info

  metrics:
    prometheus:
      handlerPath: /metrics
    sanitization: prometheus
    samplingRate: 1.0
    extended: detailed

  listenAddress: 0.0.0.0:9000
  clusterListenAddress: 0.0.0.0:9001
  httpNodeListenAddress: 0.0.0.0:9002
  httpClusterListenAddress: 0.0.0.0:9003
  debugListenAddress: 0.0.0.0:9004

  hostID:
    resolver: config
    value: m3db_local

  client:
    writeConsistencyLevel: majority
    readConsistencyLevel: unstrict_majority

  gcPercentage: 100

  writeNewSeriesAsync: true
  writeNewSeriesLimitPerSecond: 1048576
  writeNewSeriesBackoffDuration: 2ms

  bootstrap:
    bootstrappers:
      - filesystem
      - commitlog
      - peers
      - uninitialized_topology
    commitlog:
      returnUnfulfilledForCorruptCommitLogFiles: false

  cache:
    series:
      policy: lru
    postingsList:
      size: 262144

  commitlog:
    flushMaxBytes: 524288
    flushEvery: 1s
    queue:
      calculationType: fixed
      size: 2097152

  fs:
    filePathPrefix: /var/lib/m3db

  config:
    service:
      env: default_env
      zone: embedded
      service: m3db
      cacheDir: /var/lib/m3kv
      etcdClusters:
        - zone: embedded
          endpoints:
            - 127.0.0.1:2379
    seedNodes:
      initialCluster:
        - hostID: m3db_local
          endpoint: http://127.0.0.1:2380
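For anyone trying to reproduce this, one way to see which resolution a query actually returned is to measure the spacing between timestamps in the render response. The following is a minimal sketch, assuming the endpoint returns Graphite's usual render JSON shape (a list of series, each with `[value, timestamp]` pairs). The helper names `fetch_series` and `infer_resolution` and the sample payload are hypothetical, not part of M3.

```python
import json
from statistics import median
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_series(base_url, target, from_spec):
    """Query a Graphite-compatible render endpoint and return the parsed JSON.

    base_url is assumed to look like the URL in the report above,
    e.g. "http://m3db:7201/api/v1/graphite/render".
    """
    query = urlencode({"target": target, "from": from_spec, "format": "json"})
    with urlopen(f"{base_url}?{query}") as response:
        return json.load(response)


def infer_resolution(datapoints):
    """Return the median gap in seconds between consecutive timestamps."""
    timestamps = sorted(ts for _value, ts in datapoints)
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    if not gaps:
        raise ValueError("need at least two datapoints to infer a resolution")
    return median(gaps)


# Hypothetical response body; null values are kept because only timestamps matter.
sample = [
    {
        "target": "random.metric",
        "datapoints": [
            [1.0, 1600000000],
            [2.0, 1600000005],
            [None, 1600000010],
            [3.0, 1600000015],
        ],
    }
]
resolution = infer_resolution(sample[0]["datapoints"])
```

Running `infer_resolution` on the results of `fetch_series(..., "-10min")` and `fetch_series(..., "-20min")` would show the step of each response side by side.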

arnikola commented 4 years ago

This is because of your namespace settings here:

- namespace: metrics1
  type: aggregated
  retention: 15m
  resolution: 5s
- namespace: metrics2
  type: aggregated
  retention: 24h
  resolution: 10s

When you query within the 15m window, you hit the metrics1 namespace, which gives you the most complete view of the dataset. As soon as you go beyond that window (as in your 20 min case), metrics1 has no additional datapoints, so we use the data from metrics2, which has 10s resolution, to satisfy the query.
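The selection rule described above can be modeled roughly as follows. This is only a sketch of the expected behavior as stated in this thread (pick the finest-resolution aggregated namespace whose retention covers the query range), not M3's actual namespace selection code. The `Namespace` class and `expected_namespace` function are hypothetical names, and the values mirror the configuration posted above.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Namespace:
    name: str
    retention: timedelta
    resolution: timedelta


# Aggregated namespaces from the configuration in this issue.
NAMESPACES = [
    Namespace("metrics1", timedelta(minutes=15), timedelta(seconds=5)),
    Namespace("metrics2", timedelta(hours=24), timedelta(seconds=10)),
    Namespace("metrics3", timedelta(hours=168), timedelta(minutes=1)),
]


def expected_namespace(query_range, namespaces=NAMESPACES):
    """Pick the finest-resolution namespace whose retention covers query_range.

    If no namespace covers the whole range, fall back to the one with the
    longest retention (an assumption made for this sketch).
    """
    covering = [ns for ns in namespaces if ns.retention >= query_range]
    if covering:
        return min(covering, key=lambda ns: ns.resolution)
    return max(namespaces, key=lambda ns: ns.retention)


ten_min = expected_namespace(timedelta(minutes=10))
twenty_min = expected_namespace(timedelta(minutes=20))
```

Under this model a 10 minute query maps to metrics1 and a 20 minute query maps to metrics2, which is the outcome the explanation above predicts.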

v-pap commented 4 years ago

Thank you for your response. In case I wasn't clear: the behavior you describe is what I expect, but the query returns data from metrics3 instead of metrics2.

arnikola commented 4 years ago

Ah I see; are you able to verify that metrics are getting written to metrics2?

v-pap commented 4 years ago

According to the debug logs, the metrics are written successfully. When I remove metrics3 from the available namespaces, metrics2 gets "promoted" and works correctly. I think the problem only occurs when you have more than two aggregated namespaces in the same set of policies.

arnikola commented 4 years ago

I'll look into it; the namespace selection logic is really confusing, so I wouldn't be surprised if we got something wrong there.

v-pap commented 4 years ago

Do we have any update about this issue?

m3db / m3

M3DB graphite query problem #2065
