m3db / m3

M3 monorepo - Distributed TSDB, Aggregator and Query Engine, Prometheus Sidecar, Graphite Compatible, Metrics Platform
https://m3db.io/
Apache License 2.0

M3-Limit-Max-Series and M3-Limit-Require-Exhaustive not behaving as documented #3161

Open bismarck opened 3 years ago

bismarck commented 3 years ago

Description

When the M3-Limit-Max-Series header is set, some queries return M3-Results-Limited: max_fetch_series_limit_applied even though they should not have hit the series limit. While testing this I also discovered that setting M3-Limit-Require-Exhaustive to true does not return an error when the series limit is exceeded.

Examples

M3-Limit-Max-Series

The query count(container_start_time_seconds) should touch only 769 time series, well under the 1000-series limit set in the request below.

$ curl -v 'http://localhost:7201/api/v1/query?query=count(container_start_time_seconds)' -H 'M3-Limit-Max-Series: 1000' | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 7201 (#0)
> GET /api/v1/query?query=count(container_start_time_seconds) HTTP/1.1
> Host: localhost:7201
> User-Agent: curl/7.64.1
> Accept: */*
> M3-Limit-Max-Series: 1000
> 
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0< HTTP/1.1 200 OK
< Access-Control-Allow-Headers: accept, content-type, authorization
< Access-Control-Allow-Methods: POST, GET, OPTIONS, PUT, DELETE
< Access-Control-Allow-Origin: *
< Content-Type: application/json
< M3-Engine: prometheus
< M3-Results-Limited: max_fetch_series_limit_applied
< Date: Wed, 03 Feb 2021 19:48:44 GMT
< Content-Length: 172
< 
{ [172 bytes data]
100   172  100   172    0     0    437      0 --:--:-- --:--:-- --:--:--   436
* Connection #0 to host localhost left intact
* Closing connection 0
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {},
        "value": [
          1612381724.309,
          "534"
        ]
      }
    ]
  },
  "warnings": [
    "m3db exceeded query limit: results not exhaustive"
  ]
}
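
For comparison, running the same query without the M3-Limit-Max-Series header should report the full count (769 here) and return no M3-Results-Limited response header, which makes the truncated 534 above easier to spot:

$ curl -v 'http://localhost:7201/api/v1/query?query=count(container_start_time_seconds)' | jq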

M3-Limit-Require-Exhaustive

$ curl -v 'http://localhost:7201/api/v1/query?query=count(container_start_time_seconds)' -H 'M3-Limit-Max-Series: 1' -H "M3-Limit-Require-Exhaustive: true" | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 7201 (#0)
> GET /api/v1/query?query=count(container_start_time_seconds) HTTP/1.1
> Host: localhost:7201
> User-Agent: curl/7.64.1
> Accept: */*
> M3-Limit-Max-Series: 1
> M3-Limit-Require-Exhaustive: true
> 
< HTTP/1.1 200 OK
< Access-Control-Allow-Headers: accept, content-type, authorization
< Access-Control-Allow-Methods: POST, GET, OPTIONS, PUT, DELETE
< Access-Control-Allow-Origin: *
< Content-Type: application/json
< M3-Engine: prometheus
< M3-Results-Limited: max_fetch_series_limit_applied
< Date: Wed, 03 Feb 2021 19:50:57 GMT
< Content-Length: 170
< 
{ [170 bytes data]
100   170  100   170    0     0    693      0 --:--:-- --:--:-- --:--:--   693
* Connection #0 to host localhost left intact
* Closing connection 0
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {},
        "value": [
          1612381857.409,
          "1"
        ]
      }
    ]
  },
  "warnings": [
    "m3db exceeded query limit: results not exhaustive"
  ]
}
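
Per the documented behavior of M3-Limit-Require-Exhaustive, exceeding the series limit should produce an error response rather than a 200 with a warning. A quick way to check is to print only the HTTP status code, which for the run above comes back as 200:

$ curl -s -o /dev/null -w '%{http_code}\n' \
    'http://localhost:7201/api/v1/query?query=count(container_start_time_seconds)' \
    -H 'M3-Limit-Max-Series: 1' \
    -H 'M3-Limit-Require-Exhaustive: true'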

Config

M3 Query

Version: coordinator_build_information{branch="HEAD",build_date="2020_11_17_23_23_36",build_version="v1_0_0",go_version="go1_13_15",revision="a3853ee56"}

Config:

listenAddress: "0.0.0.0:7201"
#metrics configuration
metrics:
  scope:
    prefix: "coordinator"
  prometheus:
    handlerPath: /metrics
    listenAddress: 0.0.0.0:7203 # until https://github.com/m3db/m3/issues/682 is resolved
  sanitization: prometheus
  samplingRate: 0.01
  extended: none
rpc:
  enabled: true
  listenAddress: "0.0.0.0:7202" 
clusters:
  - namespaces:
      - namespace: 21d
        retention: 504h
        type: unaggregated

    client:
      config:
        service:
          env: default_env
          zone: embedded
          service: m3db
          cacheDir: /tmp/m3kv
          etcdClusters:
            - zone: embedded
              endpoints:
                - 127.0.0.45:2379
                - 127.0.0.46:2379
                - 127.0.0.47:2379
      writeConsistencyLevel: majority
      readConsistencyLevel: unstrict_majority
      writeTimeout: 30s
      # fetchTimeout defines the fetch timeout for any given query.
      # The default is 30s and the max is 5m.
      fetchTimeout: 30s
      connectTimeout: 20s
      writeRetry:
        initialBackoff: 500ms
        backoffFactor: 3
        maxRetries: 2
        jitter: true
      fetchRetry:
        initialBackoff: 500ms
        backoffFactor: 2
        maxRetries: 3
        jitter: true
      backgroundHealthCheckFailLimit: 4
      backgroundHealthCheckFailThrottleFactor: 0.5
readWorkerPoolPolicy:
  grow: true
  size: 2
writeWorkerPoolPolicy:
  grow: true
  size: 2
tracing:
  backend: jaeger
  jaeger:
    serviceName: m3-query-21d-wus
    reporter:
      collectorEndpoint: http://m3-tracing.m3-query.svc:14268/api/traces
    tags:
      - key: cluster.profile
        value: prod
      - key: cluster.site
        value: wus
tagOptions:
  # See here for more information: http://m3db.github.io/m3/how_to/query/#id-generation
  idScheme: quoted
limits:
  perQuery:
    maxFetchedSeries: 50001
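
For completeness, the same constraint can also be applied in config rather than per request; a minimal sketch, assuming the coordinator's per-query limits block also accepts a requireExhaustive flag (the flag name is an assumption here, so check the docs for your version):

limits:
  perQuery:
    maxFetchedSeries: 50001
    # Assumed config-level counterpart of the M3-Limit-Require-Exhaustive header:
    # fail the query instead of returning partial results when a limit is hit.
    requireExhaustive: true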

M3DB

Version: build_information{branch="HEAD",build_date="2020_10_28_19_57_32",build_version="v0_15_17_hotfix_2",go_version="go1_13_8",revision="313c1e482"}

Config:

coordinator:
  listenAddress:
    type: "config"
    value: "0.0.0.0:7201"

  metrics:
    scope:
      prefix: "coordinator"
    prometheus:
      handlerPath: /metrics
      listenAddress: 0.0.0.0:7203 # until https://github.com/m3db/m3/issues/682 is resolved
    sanitization: prometheus
    samplingRate: 0.01
    extended: none

  limits:
    maxComputedDatapoints: 10000

  tagOptions:
    idScheme: quoted

db:
  logging:
    level: info

  metrics:
    prometheus:
      handlerPath: /metrics
    sanitization: prometheus
    samplingRate: 0.01
    extended: detailed

  limits:
    # If set, will enforce a maximum cap on time series blocks matched for
    # queries searching time series by dimensions.
    maxRecentlyQueriedSeriesBlocks:
      # Value sets the maximum time series blocks matched, use your block
      # settings to understand how many datapoints that may actually translate
      # to (e.g. 2 hour blocks for unaggregated data with 30s scrape interval
      # will translate to 240 datapoints per single time series block matched).
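      # As a rough illustration (an estimate, not from the docs): at 240
      # datapoints per block, a value of 900000 corresponds to roughly
      # 900000 * 240 = 216,000,000 datapoints matched per lookback window.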
      value: 900000
      # Lookback sets the time window that this limit is enforced over, every
      # lookback period the global count is reset to zero and when the limit
      # is reached it will reject any further time series blocks being matched
      # and read until the lookback period resets.
      lookback: 30s
    # MaxRecentlyQueriedSeriesDiskBytesRead sets the upper limit on time series
    # bytes read from disk within a given lookback period. Queries which are
    # issued while this max is surpassed encounter an error.
    maxRecentlyQueriedSeriesDiskBytesRead:
      value: 300000000
      lookback: 30s
    # If set then will limit the number of parallel write batch requests to the
    # database and return errors if hit.
    maxOutstandingWriteRequests: 0
    # If set then will limit the number of parallel read requests to the
    # database and return errors if hit.
    # Note since reads can be so variable in terms of how expensive they are
    # it is not always very useful to use this config to prevent resource
    # exhaustion from reads.
    maxOutstandingReadRequests: 0

  hostID:
    resolver: environment
    envVarName: M3DB_HOST_ID

  listenAddress: 0.0.0.0:9000
  clusterListenAddress: 0.0.0.0:9001
  httpNodeListenAddress: 0.0.0.0:9002
  httpClusterListenAddress: 0.0.0.0:9003
  debugListenAddress: 0.0.0.0:9004

  client:
    writeConsistencyLevel: majority
    readConsistencyLevel: unstrict_majority

  gcPercentage: 100

  writeNewSeriesAsync: true
  writeNewSeriesLimitPerSecond: 1048576
  writeNewSeriesBackoffDuration: 2ms

  bootstrap:
    bootstrappers:
        - filesystem
        - commitlog
        - peers
        - uninitialized_topology
    commitlog:
        returnUnfulfilledForCorruptCommitLogFiles: false

  cache:
    series:
      policy: lru

  commitlog:
    flushMaxBytes: 524288
    flushEvery: 1s
    queue:
        calculationType: fixed
        size: 8388608
    blockSize: 10m

  fs:
    filePathPrefix: /data/m3db

  config:
    service:
      env: sts2_env
      zone: embedded
      service: m3db
      cacheDir: /data/m3kv
      etcdClusters:
        - zone: embedded
          endpoints:
            - 127.0.0.45:2379
            - 127.0.0.47:2379
            - 127.0.0.46:2379

Additional Info

I ran sum by (result, exhaustive) (rate(dbindex_query[1m])) and all the *_require_exhaustive results were 0.

I also tried replicating the issue with the Prometheus Docker integration tests in the M3 repo. I modified the test suite to scrape an avalanche instance producing a single metric with 10,000 series. Unfortunately, I wasn't able to reproduce the issue we're seeing in our production cluster.
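
A sketch of how such a load can be generated with avalanche (flag names are taken from my reading of the prometheus-community/avalanche README and may differ by version, so treat them as an assumption):

$ avalanche --metric-count=1 --series-count=10000 --port=9001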

bismarck commented 3 years ago

I downgraded M3 Query to 0.15.17 (matching the M3DB version) and the M3-Limit-Require-Exhaustive header works correctly.

We also did some debugging with @robskillington and it looks like we're hitting the M3-Limit-Max-Series limit at the node while scanning the 6h block (correct me if I'm wrong). We may want to update the docs about where this limit can be hit.
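
If so, one possible explanation (an assumption on my part, not verified): if the node counts matched series per index block rather than after deduplication across blocks, a query window spanning two blocks would count roughly 769 * 2 = 1538 matches and trip the 1000-series limit, even though only 769 distinct series are returned.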