m3db / m3

M3 monorepo - Distributed TSDB, Aggregator and Query Engine, Prometheus Sidecar, Graphite Compatible, Metrics Platform
https://m3db.io/
Apache License 2.0

Write metrics using the HTTP JSON API only once, but m3coordinator/m3query can look up five minutes of metrics #3650

Open yuandongjian opened 3 years ago

yuandongjian commented 3 years ago

It looks like an extra 5 minutes of metrics is written automatically. I expect to see only one datapoint at write time.

curl -X POST http://localhost:7201/api/v1/json/write -d '{
  "tags": {
    "__name__": "third_avenue",
    "city": "new_york",
    "checkout": "1"
  },
  "timestamp": '\"$(date "+%s")\"',
  "value": 3347.26
}'
curl -X "POST" -G "http://localhost:7201/api/v1/query_range"  \
 -d "query=third_avenue"   \
 -d "start=$(date "+%s" -d "45 seconds ago")"     \
 -d "end=$( date +%s )"   \
 -d "step=1s" | jq .
{
  "status": "success",
  "data": {
    "resultType": "matrix",
    "result": [
      {
        "metric": {
          "__name__": "third_avenuev2",
          "checkout": "1",
          "city": "new_york"
        },
        "values": [
          [
            1628247163,
            "3347.26"
          ],
          [
            1628247164,
            "3347.26"
          ],
          [
            1628247165,
            "3347.26"
          ],
          [
            1628247166,
            "3347.26"
          ],
          [
            1628247167,
            "3347.26"
          ],
          [
            1628247168,
            "3347.26"
          ],
          [
            1628247169,
            "3347.26"
          ],
          [
            1628247170,
            "3347.26"
          ],
          [
            1628247171,
            "3347.26"
          ],
          [
            1628247172,
            "3347.26"
          ],
          [
            1628247173,
            "3347.26"
          ],
          [
            1628247174,
            "3347.26"
          ],
          [
            1628247175,
            "3347.26"
          ],
          [
            1628247176,
            "3347.26"
          ],
          [
            1628247177,
            "3347.26"
          ],
          [
            1628247178,
            "3347.26"
          ],
          [
            1628247179,
            "3347.26"
          ],
          [
            1628247180,
            "3347.26"
          ],
          [
            1628247181,
            "3347.26"
          ]
        ]
      }
    ]
  }
}
yuandongjian commented 3 years ago

Here is my config. Is my configuration the cause? m3db version: 1.1.0

m3coordinator.yml

listenAddress: 0.0.0.0:7201
logging:
  level: info
clusters:
  - namespaces:
      - namespace: default
        retention: 720h
        type: unaggregated
    client:
      config:
        service:
            env: default_env
            zone: embedded
            service: m3db
            etcdClusters:
                - zone: embedded
                  endpoints:
                     - x1:2379
                     - x2:2379
                     - x3:2379
      writeConsistencyLevel: majority
      readConsistencyLevel: unstrict_majority

m3dbnode.yml

db:
  logging:
    level: info

  metrics:
    prometheus:
      handlerPath: /metrics
      listenAddress: 0.0.0.0:7204
    sanitization: prometheus
    samplingRate: 1.0
    extended: detailed

  listenAddress: 0.0.0.0:9000
  clusterListenAddress: 0.0.0.0:9001
  httpNodeListenAddress: 0.0.0.0:9002
  httpClusterListenAddress: 0.0.0.0:9003
  debugListenAddress: 0.0.0.0:9004

  hostID:
    resolver: config
    value: m3db168

  client:
    writeConsistencyLevel: majority
    readConsistencyLevel: unstrict_majority
    writeTimeout: 10s
    fetchTimeout: 15s
    connectTimeout: 20s
    writeRetry:
        initialBackoff: 500ms
        backoffFactor: 3
        maxRetries: 2
        jitter: true
    fetchRetry:
        initialBackoff: 500ms
        backoffFactor: 2
        maxRetries: 3
        jitter: true
    backgroundHealthCheckFailLimit: 4
    backgroundHealthCheckFailThrottleFactor: 0.5

  writeNewSeriesAsync: true
  writeNewSeriesBackoffDuration: 2ms

  filesystem:
    filePathPrefix: /opt/work/m3db

  discovery:
    config:
        service:
            env: default_env
            zone: embedded
            service: m3db
            # etcd cluster configuration
            cacheDir: /opt/work/m3db/data
            etcdClusters:
                - zone: embedded
                  endpoints:
                     - x1:2379
                     - x2:2379
                     - x3:2379
BertHartm commented 3 years ago

This is intentional behavior in Prometheus (and so emulated here): data is not considered stale for 5 minutes, so the most recent datapoint is returned for each step in that window. https://prometheus.io/docs/prometheus/latest/querying/basics/#staleness
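
A hedged sketch of that window, reusing the write and query commands from the original report (third_avenue and the 45-second range are taken from the example above; the 5m figure is the default Prometheus lookback that M3 emulates):

# Immediately after the single write: every step in the range should
# repeat the last written datapoint, because it is still inside the
# 5m lookback window.
curl -X "POST" -G "http://localhost:7201/api/v1/query_range" \
 -d "query=third_avenue" \
 -d "start=$(date "+%s" -d "45 seconds ago")" \
 -d "end=$(date +%s)" \
 -d "step=1s" | jq '.data.result'

# Wait until every evaluation timestamp in the queried range is more
# than 5 minutes past the write (lookback plus range length); the
# series is then treated as stale and the result should be empty.
sleep 360
curl -X "POST" -G "http://localhost:7201/api/v1/query_range" \
 -d "query=third_avenue" \
 -d "start=$(date "+%s" -d "45 seconds ago")" \
 -d "end=$(date +%s)" \
 -d "step=1s" | jq '.data.result'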

yuandongjian commented 3 years ago

@BertHartm Thank you very much, I get it.

mengxifl commented 3 years ago

Same issue. We use Grafana to read data from M3. This picture shows data read from m3db; as you can see it is a straight line, which is not actually the case. [image] This picture shows the same data read via the Prometheus API; this is the data I really want. [image] Is there any parameter to make reads behave like the Prometheus API? @BertHartm

yuandongjian commented 3 years ago

@mengxifl Which company are you at? We also run m3 + grafana, so our tech stacks are very similar; let's get in touch and exchange notes.

BertHartm commented 3 years ago

@mengxifl It sounds like for some reason the staleness markers are getting dropped somewhere. It's hard for me to comment on why without knowing the exact setup and versions of your collection flow, but this might give you some insight into what happened: https://www.robustperception.io/staleness-and-promql

I believe m3 fully supports the staleness markers without any additional configuration.

llussixn commented 3 years ago

@BertHartm Is there any way to disable this staleness feature?

BertHartm commented 3 years ago

You can't disable it because that would break querying entirely, but you can adjust the lookback value in the M3 coordinator/m3query configuration (or Prometheus configuration if you're using remote read). You probably don't want to go shorter than something like twice your scrape interval, otherwise you'll go the other way and start missing data points.
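
For reference, a minimal sketch of that adjustment in the m3coordinator.yml shown earlier (a hedged example: it assumes the top-level lookbackDuration option of the m3query/m3coordinator configuration, and the 30s value assumes a roughly 15-second scrape interval, per the twice-the-scrape-interval guidance above):

# Top-level option alongside listenAddress, logging and clusters.
# The default is 5m; shortening it narrows the window during which a
# single datapoint keeps being returned at every query step.
lookbackDuration: 30s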

BertHartm commented 3 years ago

But the best thing to do is to investigate what's happening to your staleness markers, as that's the better fix for the issue.