grafana / mimir

Grafana Mimir provides horizontally scalable, highly available, multi-tenant, long-term storage for Prometheus.
https://grafana.com/oss/mimir/
GNU Affero General Public License v3.0

Query Frontend nil pointer dereference on history import #2249

Open daxroc opened 2 years ago

daxroc commented 2 years ago

Describe the bug

After following the migration guide for importing historic blocks from a Thanos S3 bucket and running metaconvert, we cannot query the historical data. The query frontends report the following error (repeated 5 times because of retries):

level=error ts=2022-06-28T06:07:54.581487739Z caller=retry.go:78 user=demo msg="error processing request" try=4 err="unexpected error: runtime error: invalid memory address or nil pointer dereference"

We can view the metric names and labels in Grafana for the same period, but all queries for metric data fail.

Note: this Thanos deployment is configured with HA pairs identified by prometheus_replica labels. The blocks in question were all compacted and belong to only one of the two replicas.

The retention resolutions are not standard either: retentionResolutionRaw: 30d, retentionResolution5m: 60d, retentionResolution1h: 365d.

To Reproduce

1. Set up Thanos with HA replication.
2. Import the blocks of one Prometheus replica into the demo tenant.
3. Run the metaconvert tool.
4. Run a query for the imported range.

Version: grafana/mimir:2.1.0

Expected behavior

Queries return metrics for requested time range

Environment

Additional Context

Sample of meta.json imported (truncated)

{
    "compaction": {
        "level": 4,
        "parents": [
            {
                "maxTime": 1653696000000,
                "minTime": 1653523200000,
                "ulid": "01G44C3CCSGCBBK90RM6EYVZ70"
            },
            ...
            {
                "maxTime": 1654732800000,
                "minTime": 1654725600000,
                "ulid": "01G52ZKBMEQRWAN2KN0ETXECF7"
            }
        ],
        "sources": [
            "01G3Z4X0WDMHSY2RVD01VM0N4J",
            ...
        ]
    },
    "maxTime": 1654732800000,
    "minTime": 1653523200000,
    "stats": {
        "numChunks": 677561664,
        "numSamples": 76490486169,
        "numSeries": 70434827
    },
    "thanos": {
        "downsample": {
            "resolution": 0
        },
        "files": [
            {
                "rel_path": "chunks/000001",
                "size_bytes": 536870878
            },
            ...
            {
                "rel_path": "index",
                "size_bytes": 14354186206
            },
            {
                "rel_path": "meta.json"
            }
        ],
        "labels": {
            "prometheus_group": "demo",
            "prometheus_replica": "demo-prometheus-server-1"
        },
        "segment_files": [
           "000001",
           ...
           "000294"
        ],
        "source": "compactor"
    },
    "ulid": "01G53B05S7KE38K73JKK815CNH",
    "version": 1
}
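
When triaging an imported block, it can help to read the Thanos section of its meta.json programmatically. The following is a minimal sketch in Go, assuming a complete (non-truncated) meta.json; the `blockMeta` type and `describeBlock` function are hypothetical names introduced for illustration and mirror only the fields visible in the sample above. In Thanos, a downsample resolution of 0 denotes raw data, which matches the sample.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// blockMeta mirrors only the fields visible in the sample meta.json.
// The type name is hypothetical; it is not a Mimir or Thanos type.
type blockMeta struct {
	ULID   string `json:"ulid"`
	Thanos struct {
		Downsample struct {
			Resolution int64 `json:"resolution"`
		} `json:"downsample"`
		Labels map[string]string `json:"labels"`
		Source string            `json:"source"`
	} `json:"thanos"`
}

// describeBlock parses a complete meta.json document and summarises
// its Thanos section in a single line.
func describeBlock(raw []byte) (string, error) {
	var m blockMeta
	if err := json.Unmarshal(raw, &m); err != nil {
		return "", fmt.Errorf("parse meta.json: %w", err)
	}
	return fmt.Sprintf("block=%s resolution=%d labels=%d source=%s",
		m.ULID, m.Thanos.Downsample.Resolution, len(m.Thanos.Labels), m.Thanos.Source), nil
}

func main() {
	// A reduced version of the sample, keeping only the fields the sketch reads.
	sample := []byte(`{"ulid":"01G53B05S7KE38K73JKK815CNH","thanos":{"downsample":{"resolution":0},"labels":{"prometheus_group":"demo","prometheus_replica":"demo-prometheus-server-1"},"source":"compactor"}}`)
	summary, err := describeBlock(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(summary)
}
```

Running this over each imported block would show at a glance which blocks carry external labels or a non-zero resolution.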
pstibrany commented 2 years ago

Thanks for reporting the issue. I'd like to focus on "runtime error: invalid memory address or nil pointer dereference" first. That message comes from a component panicking, most likely the querier or perhaps the store-gateway. Could you please check the logs for this message and see whether a full stack trace is available? It would help to locate the place in the code that's failing.

PR https://github.com/grafana/mimir/pull/2122/files#diff-b20108416dc866c96c8f9439351c9f6613610a52bfcfd9a567cdb7e0c0274e66 clarifies how Thanos-supported features are handled in Mimir (it reflects changes in Mimir 2.2). But a panic is still unexpected.

daxroc commented 2 years ago
level=error ts=2022-06-28T09:40:23.850063046Z caller=engine.go:952 msg="runtime panic in parser" err="runtime error: invalid memory address or nil pointer dereference" stacktrace="goroutine 56094193 [running]:\ngithub.com/prometheus/prometheus/promql.(*evaluator).recover(0x4004101680, 0x4003e1c110, 0x4003e1c128)\n\t/__w/mimir/mimir/vendor/github.com/prometheus/prometheus/promql/engine.go:950 +0x290\npanic({0x16ad3a0, 0x2e5f300})\n\t/usr/local/go/src/runtime/panic.go:1038 +0x224\ngithub.com/grafana/mimir/pkg/querier.(*blockQuerierSeries).Iterator(0x400afaa9f0)\n\t/__w/mimir/mimir/pkg/querier/block.go:120 +0x250\ngithub.com/prometheus/prometheus/promql.(*evaluator).eval(0x4004101680, {0x1f8c390, 0x40166a2080})\n\t/__w/mimir/mimir/vendor/github.com/prometheus/prometheus/promql/engine.go:1549 +0x3d70\ngithub.com/prometheus/prometheus/promql.(*evaluator).Eval(0x4004101680, {0x1f8c390, 0x40166a2080})\n\t/__w/mimir/mimir/vendor/github.com/prometheus/prometheus/promql/engine.go:965 +0x90\ngithub.com/prometheus/prometheus/promql.(*Engine).execEvalStmt(0x4000ccf400, {0x1f8b6a8, 0x400b66de90}, 0x40006120e0, 0x4004f4d9a0)\n\t/__w/mimir/mimir/vendor/github.com/prometheus/prometheus/promql/engine.go:696 +0xb34\ngithub.com/prometheus/prometheus/promql.(*Engine).exec(0x4000ccf400, {0x1f8b6a8, 0x400b66de90}, 0x40006120e0)\n\t/__w/mimir/mimir/vendor/github.com/prometheus/prometheus/promql/engine.go:595 +0x594\ngithub.com/prometheus/prometheus/promql.(*query).Exec(0x40006120e0, {0x1f8b6a8, 0x400b66dd70})\n\t/__w/mimir/mimir/vendor/github.com/prometheus/prometheus/promql/engine.go:197 +0x190\ngithub.com/prometheus/prometheus/web/api/v1.(*API).queryRange(0x40003d8b40, 0x4002e60700)\n\t/__w/mimir/mimir/vendor/github.com/prometheus/prometheus/web/api/v1/api.go:494 +0xc78\ngithub.com/prometheus/prometheus/web/api/v1.(*API).Register.func2.1(0x4002e60700)\n\t/__w/mimir/mimir/vendor/github.com/prometheus/prometheus/web/api/v1/api.go:313 
+0x134\ngithub.com/prometheus/prometheus/web/api/v1.(*API).Register.func1.1({0x1f6b630, 0x4051c2c800}, 0x4002e60700)\n\t/__w/mimir/mimir/vendor/github.com/prometheus/prometheus/web/api/v1/api.go:288 +0x74\nnet/http.HandlerFunc.ServeHTTP(0x4000196708, {0x1f6b630, 0x4051c2c800}, 0x4002e60700)\n\t/usr/local/go/src/net/http/server.go:2047 +0x40\ngithub.com/prometheus/prometheus/util/httputil.CompressionHandler.ServeHTTP({{0x1f53ce0, 0x4000196708}}, {0x1f6fda0, 0x400b66dbc0}, 0x4002e60700)\n\t/__w/mimir/mimir/vendor/github.com/prometheus/prometheus/util/httputil/compression.go:90 +0x6c\ngithub.com/prometheus/common/route.(*Router).handle.func1({0x1f6fda0, 0x400b66dbc0}, 0x4002e60600, {0x0, 0x0, 0x0})\n\t/__w/mimir/mimir/vendor/github.com/prometheus/common/route/route.go:83 +0x2b8\ngithub.com/julienschmidt/httprouter.(*Router).ServeHTTP(0x4000400f00, {0x1f6fda0, 0x400b66dbc0}, 0x4002e60600)\n\t/__w/mimir/mimir/vendor/github.com/julienschmidt/httprouter/router.go:387 +0x960\ngithub.com/prometheus/common/route.(*Router).ServeHTTP(0x4000793c40, {0x1f6fda0, 0x400b66dbc0}, 0x4002e60600)\n\t/__w/mimir/mimir/vendor/github.com/prometheus/common/route/route.go:126 +0x44\ngithub.com/weaveworks/common/middleware.Instrument.Wrap.func1.2({0x1f6fda0, 0x400b66dbc0})\n\t/__w/mimir/mimir/vendor/github.com/weaveworks/common/middleware/instrument.go:70 +0x48\ngithub.com/felixge/httpsnoop.CaptureMetricsFn({0x1f6fda0, 0x400b66d950}, 0x400123d1f8)\n\t/__w/mimir/mimir/vendor/github.com/felixge/httpsnoop/capture_metrics.go:76 +0x1f0\ngithub.com/weaveworks/common/middleware.Instrument.Wrap.func1({0x1f6fda0, 0x400b66d950}, 0x4002e60600)\n\t/__w/mimir/mimir/vendor/github.com/weaveworks/common/middleware/instrument.go:69 +0x2a4\nnet/http.HandlerFunc.ServeHTTP(0x4004f4d8b0, {0x1f6fda0, 0x400b66d950}, 0x4002e60600)\n\t/usr/local/go/src/net/http/server.go:2047 +0x40\ngithub.com/gorilla/mux.(*Router).ServeHTTP(0x4000bcd5c0, {0x1f6fda0, 0x400b66d950}, 
0x4002e60400)\n\t/__w/mimir/mimir/vendor/github.com/gorilla/mux/mux.go:210 +0x1ec\ngithub.com/grafana/mimir/pkg/querier/stats.WallTimeMiddleware.Wrap.func1({0x1f6fda0, 0x400b66d950}, 0x4002e60400)\n\t/__w/mimir/mimir/pkg/querier/stats/time_middleware.go:30 +0xb0\nnet/http.HandlerFunc.ServeHTTP(0x4000196f78, {0x1f6fda0, 0x400b66d950}, 0x4002e60400)\n\t/usr/local/go/src/net/http/server.go:2047 +0x40\ngithub.com/weaveworks/common/middleware.glob..func1.1({0x1f6fda0, 0x400b66d950}, 0x4002e60300)\n\t/__w/mimir/mimir/vendor/github.com/weaveworks/common/middleware/http_auth.go:17 +0x278\nnet/http.HandlerFunc.ServeHTTP(0x4000196ff0, {0x1f6fda0, 0x400b66d950}, 0x4002e60300)\n\t/usr/local/go/src/net/http/server.go:2047 +0x40\ngithub.com/NYTimes/gziphandler.GzipHandlerWithOpts.func1.1({0x1f6fda0, 0x400b66d950}, 0x4002e60300)\n\t/__w/mimir/mimir/vendor/github.com/NYTimes/gziphandler/gzip.go:342 +0x2d8\nnet/http.HandlerFunc.ServeHTTP(0x40007aa3f0, {0x1f6fda0, 0x400b66d950}, 0x4002e60300)\n\t/usr/local/go/src/net/http/server.go:2047 +0x40\ngithub.com/gorilla/mux.(*Router).ServeHTTP(0x40001d9a40, {0x1f6fda0, 0x400b66d950}, 0x4002e60100)\n\t/__w/mimir/mimir/vendor/github.com/gorilla/mux/mux.go:210 +0x1ec\ngithub.com/weaveworks/common/middleware.Instrument.Wrap.func1.2({0x1f6fda0, 0x400b66d950})\n\t/__w/mimir/mimir/vendor/github.com/weaveworks/common/middleware/instrument.go:70 +0x48\ngithub.com/felixge/httpsnoop.CaptureMetricsFn({0x1f6c320, 0x4004f4d810}, 0x400123d948)\n\t/__w/mimir/mimir/vendor/github.com/felixge/httpsnoop/capture_metrics.go:76 +0x1f0\ngithub.com/weaveworks/common/middleware.Instrument.Wrap.func1({0x1f6c320, 0x4004f4d810}, 0x4002e60100)\n\t/__w/mimir/mimir/vendor/github.com/weaveworks/common/middleware/instrument.go:69 +0x2a4\nnet/http.HandlerFunc.ServeHTTP(0x4000181ae0, {0x1f6c320, 0x4004f4d810}, 0x4002e60100)\n\t/usr/local/go/src/net/http/server.go:2047 +0x40\ngithub.com/weaveworks/common/middleware.Log.Wrap.func1({0x1f70070, 0x4051c2c7a0}, 
0x4002e60100)\n\t/__w/mimir/mimir/vendor/github.com/weaveworks/common/middleware/logging.go:53 +0x260\nnet/http.HandlerFunc.ServeHTTP(0x4000674f40, {0x1f70070, 0x4051c2c7a0}, 0x4002e60100)\n\t/usr/local/go/src/net/http/server.go:2047 +0x40\ngithub.com/opentracing-contrib/go-stdlib/nethttp.MiddlewareFunc.func5({0x1f6e4b0, 0x4018664600}, 0x4002e60000)\n\t/__w/mimir/mimir/vendor/github.com/opentracing-contrib/go-stdlib/nethttp/server.go:154 +0x658\nnet/http.HandlerFunc.ServeHTTP(0x4000674f80, {0x1f6e4b0, 0x4018664600}, 0x4002e60000)\n\t/usr/local/go/src/net/http/server.go:2047 +0x40\ngithub.com/weaveworks/common/httpgrpc/server.Server.Handle({{0x1f53ce0, 0x4000674f80}}, {0x1f8b6a8, 0x400b66d6e0}, 0x4004f4d6d0)\n\t/__w/mimir/mimir/vendor/github.com/weaveworks/common/httpgrpc/server/server.go:61 +0x40c\ngithub.com/grafana/mimir/pkg/querier/worker.(*frontendProcessor).runRequest(0x40009b5740, {0x1f8b600, 0x4000bb0ac0}, 0x4004f4d6d0, 0x1, 0x4051c2c740)\n\t/__w/mimir/mimir/pkg/querier/worker/frontend_processor.go:145 +0xd8\ncreated by github.com/grafana/mimir/pkg/querier/worker.(*frontendProcessor).process\n\t/__w/mimir/mimir/pkg/querier/worker/frontend_processor.go:118 +0x178\n"

That's from one of the queriers, I missed these earlier

pracucci commented 2 years ago

That's from one of the queriers, I missed these earlier

Re-formatting the stack trace for better readability:


github.com/prometheus/prometheus/promql.(*evaluator).recover(0x4004101680, 0x4003e1c110, 0x4003e1c128)
    /__w/mimir/mimir/vendor/github.com/prometheus/prometheus/promql/engine.go:950 +0x290
panic({0x16ad3a0, 0x2e5f300})
    /usr/local/go/src/runtime/panic.go:1038 +0x224
github.com/grafana/mimir/pkg/querier.(*blockQuerierSeries).Iterator(0x400afaa9f0)
    /__w/mimir/mimir/pkg/querier/block.go:120 +0x250
github.com/prometheus/prometheus/promql.(*evaluator).eval(0x4004101680, {0x1f8c390, 0x40166a2080})
    /__w/mimir/mimir/vendor/github.com/prometheus/prometheus/promql/engine.go:1549 +0x3d70
github.com/prometheus/prometheus/promql.(*evaluator).Eval(0x4004101680, {0x1f8c390, 0x40166a2080})
    /__w/mimir/mimir/vendor/github.com/prometheus/prometheus/promql/engine.go:965 +0x90
github.com/prometheus/prometheus/promql.(*Engine).execEvalStmt(0x4000ccf400, {0x1f8b6a8, 0x400b66de90}, 0x40006120e0, 0x4004f4d9a0)
    /__w/mimir/mimir/vendor/github.com/prometheus/prometheus/promql/engine.go:696 +0xb34
github.com/prometheus/prometheus/promql.(*Engine).exec(0x4000ccf400, {0x1f8b6a8, 0x400b66de90}, 0x40006120e0)
    /__w/mimir/mimir/vendor/github.com/prometheus/prometheus/promql/engine.go:595 +0x594
github.com/prometheus/prometheus/promql.(*query).Exec(0x40006120e0, {0x1f8b6a8, 0x400b66dd70})
    /__w/mimir/mimir/vendor/github.com/prometheus/prometheus/promql/engine.go:197 +0x190
github.com/prometheus/prometheus/web/api/v1.(*API).queryRange(0x40003d8b40, 0x4002e60700)
    /__w/mimir/mimir/vendor/github.com/prometheus/prometheus/web/api/v1/api.go:494 +0xc78
github.com/prometheus/prometheus/web/api/v1.(*API).Register.func2.1(0x4002e60700)
    /__w/mimir/mimir/vendor/github.com/prometheus/prometheus/web/api/v1/api.go:313 +0x134
github.com/prometheus/prometheus/web/api/v1.(*API).Register.func1.1({0x1f6b630, 0x4051c2c800}, 0x4002e60700)
    /__w/mimir/mimir/vendor/github.com/prometheus/prometheus/web/api/v1/api.go:288 +0x74
net/http.HandlerFunc.ServeHTTP(0x4000196708, {0x1f6b630, 0x4051c2c800}, 0x4002e60700)
    /usr/local/go/src/net/http/server.go:2047 +0x40
github.com/prometheus/prometheus/util/httputil.CompressionHandler.ServeHTTP({{0x1f53ce0, 0x4000196708}}, {0x1f6fda0, 0x400b66dbc0}, 0x4002e60700)
    /__w/mimir/mimir/vendor/github.com/prometheus/prometheus/util/httputil/compression.go:90 +0x6c
github.com/prometheus/common/route.(*Router).handle.func1({0x1f6fda0, 0x400b66dbc0}, 0x4002e60600, {0x0, 0x0, 0x0})
    /__w/mimir/mimir/vendor/github.com/prometheus/common/route/route.go:83 +0x2b8
github.com/julienschmidt/httprouter.(*Router).ServeHTTP(0x4000400f00, {0x1f6fda0, 0x400b66dbc0}, 0x4002e60600)
    /__w/mimir/mimir/vendor/github.com/julienschmidt/httprouter/router.go:387 +0x960
github.com/prometheus/common/route.(*Router).ServeHTTP(0x4000793c40, {0x1f6fda0, 0x400b66dbc0}, 0x4002e60600)
    /__w/mimir/mimir/vendor/github.com/prometheus/common/route/route.go:126 +0x44
github.com/weaveworks/common/middleware.Instrument.Wrap.func1.2({0x1f6fda0, 0x400b66dbc0})
    /__w/mimir/mimir/vendor/github.com/weaveworks/common/middleware/instrument.go:70 +0x48
github.com/felixge/httpsnoop.CaptureMetricsFn({0x1f6fda0, 0x400b66d950}, 0x400123d1f8)
    /__w/mimir/mimir/vendor/github.com/felixge/httpsnoop/capture_metrics.go:76 +0x1f0
github.com/weaveworks/common/middleware.Instrument.Wrap.func1({0x1f6fda0, 0x400b66d950}, 0x4002e60600)
    /__w/mimir/mimir/vendor/github.com/weaveworks/common/middleware/instrument.go:69 +0x2a4
net/http.HandlerFunc.ServeHTTP(0x4004f4d8b0, {0x1f6fda0, 0x400b66d950}, 0x4002e60600)
    /usr/local/go/src/net/http/server.go:2047 +0x40
github.com/gorilla/mux.(*Router).ServeHTTP(0x4000bcd5c0, {0x1f6fda0, 0x400b66d950}, 0x4002e60400)
    /__w/mimir/mimir/vendor/github.com/gorilla/mux/mux.go:210 +0x1ec
github.com/grafana/mimir/pkg/querier/stats.WallTimeMiddleware.Wrap.func1({0x1f6fda0, 0x400b66d950}, 0x4002e60400)
    /__w/mimir/mimir/pkg/querier/stats/time_middleware.go:30 +0xb0
net/http.HandlerFunc.ServeHTTP(0x4000196f78, {0x1f6fda0, 0x400b66d950}, 0x4002e60400)
    /usr/local/go/src/net/http/server.go:2047 +0x40
github.com/weaveworks/common/middleware.glob..func1.1({0x1f6fda0, 0x400b66d950}, 0x4002e60300)
    /__w/mimir/mimir/vendor/github.com/weaveworks/common/middleware/http_auth.go:17 +0x278
net/http.HandlerFunc.ServeHTTP(0x4000196ff0, {0x1f6fda0, 0x400b66d950}, 0x4002e60300)
    /usr/local/go/src/net/http/server.go:2047 +0x40
github.com/NYTimes/gziphandler.GzipHandlerWithOpts.func1.1({0x1f6fda0, 0x400b66d950}, 0x4002e60300)
    /__w/mimir/mimir/vendor/github.com/NYTimes/gziphandler/gzip.go:342 +0x2d8
net/http.HandlerFunc.ServeHTTP(0x40007aa3f0, {0x1f6fda0, 0x400b66d950}, 0x4002e60300)
    /usr/local/go/src/net/http/server.go:2047 +0x40
github.com/gorilla/mux.(*Router).ServeHTTP(0x40001d9a40, {0x1f6fda0, 0x400b66d950}, 0x4002e60100)
    /__w/mimir/mimir/vendor/github.com/gorilla/mux/mux.go:210 +0x1ec
github.com/weaveworks/common/middleware.Instrument.Wrap.func1.2({0x1f6fda0, 0x400b66d950})
    /__w/mimir/mimir/vendor/github.com/weaveworks/common/middleware/instrument.go:70 +0x48
github.com/felixge/httpsnoop.CaptureMetricsFn({0x1f6c320, 0x4004f4d810}, 0x400123d948)
    /__w/mimir/mimir/vendor/github.com/felixge/httpsnoop/capture_metrics.go:76 +0x1f0
github.com/weaveworks/common/middleware.Instrument.Wrap.func1({0x1f6c320, 0x4004f4d810}, 0x4002e60100)
    /__w/mimir/mimir/vendor/github.com/weaveworks/common/middleware/instrument.go:69 +0x2a4
net/http.HandlerFunc.ServeHTTP(0x4000181ae0, {0x1f6c320, 0x4004f4d810}, 0x4002e60100)
    /usr/local/go/src/net/http/server.go:2047 +0x40
github.com/weaveworks/common/middleware.Log.Wrap.func1({0x1f70070, 0x4051c2c7a0}, 0x4002e60100)
    /__w/mimir/mimir/vendor/github.com/weaveworks/common/middleware/logging.go:53 +0x260
net/http.HandlerFunc.ServeHTTP(0x4000674f40, {0x1f70070, 0x4051c2c7a0}, 0x4002e60100)
    /usr/local/go/src/net/http/server.go:2047 +0x40
github.com/opentracing-contrib/go-stdlib/nethttp.MiddlewareFunc.func5({0x1f6e4b0, 0x4018664600}, 0x4002e60000)
    /__w/mimir/mimir/vendor/github.com/opentracing-contrib/go-stdlib/nethttp/server.go:154 +0x658
net/http.HandlerFunc.ServeHTTP(0x4000674f80, {0x1f6e4b0, 0x4018664600}, 0x4002e60000)
    /usr/local/go/src/net/http/server.go:2047 +0x40
github.com/weaveworks/common/httpgrpc/server.Server.Handle({{0x1f53ce0, 0x4000674f80}}, {0x1f8b6a8, 0x400b66d6e0}, 0x4004f4d6d0)
    /__w/mimir/mimir/vendor/github.com/weaveworks/common/httpgrpc/server/server.go:61 +0x40c
github.com/grafana/mimir/pkg/querier/worker.(*frontendProcessor).runRequest(0x40009b5740, {0x1f8b600, 0x4000bb0ac0}, 0x4004f4d6d0, 0x1, 0x4051c2c740)
    /__w/mimir/mimir/pkg/querier/worker/frontend_processor.go:145 +0xd8
created by github.com/grafana/mimir/pkg/querier/worker.(*frontendProcessor).process
    /__w/mimir/mimir/pkg/querier/worker/frontend_processor.go:118 +0x178
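
The re-formatting above can also be done mechanically. This is a minimal sketch in Go, assuming the stacktrace value has already been extracted from the log line and still contains the literal two-character sequences `\n` and `\t`; `unescapeStacktrace` is a hypothetical helper name, not part of Mimir.

```go
package main

import (
	"fmt"
	"strings"
)

// unescapeStacktrace turns the literal two-character sequences \n and \t,
// as they appear inside a quoted log value, into real newlines and tabs.
func unescapeStacktrace(s string) string {
	return strings.NewReplacer(`\n`, "\n", `\t`, "\t").Replace(s)
}

func main() {
	// One frame copied from the querier log line, still escaped.
	escaped := `panic({0x16ad3a0, 0x2e5f300})\n\t/usr/local/go/src/runtime/panic.go:1038 +0x224\n`
	fmt.Print(unescapeStacktrace(escaped))
}
```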
pracucci commented 2 years ago
github.com/grafana/mimir/pkg/querier.(*blockQuerierSeries).Iterator(0x400afaa9f0)
    /__w/mimir/mimir/pkg/querier/block.go:120 +0x250

At that line we have: https://github.com/grafana/mimir/blob/7e61f6debb315e38613cd28c2d3b38cabdf162b7/pkg/querier/block.go#L120

Is c.Raw nil?
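
To illustrate why a nil `Raw` pointer would produce this panic, and what a guard could look like, here is a minimal sketch in Go. `Chunk`, `AggrChunk`, `rawData`, and `errNoRawChunk` are hypothetical stand-ins written for this example; this is not the code at block.go line 120, and whether `c.Raw` is actually nil in this report is still an open question.

```go
package main

import (
	"errors"
	"fmt"
)

// Chunk and AggrChunk are hypothetical stand-ins for a chunk container
// whose raw payload is held behind a pointer.
type Chunk struct{ Data []byte }

type AggrChunk struct {
	Raw *Chunk // nil when the container carries no raw chunk
}

var errNoRawChunk = errors.New("chunk has no raw data")

// rawData returns the raw payload, or an error instead of panicking
// when the Raw pointer is nil.
func rawData(c AggrChunk) ([]byte, error) {
	if c.Raw == nil {
		return nil, errNoRawChunk
	}
	return c.Raw.Data, nil
}

func main() {
	chunks := []AggrChunk{
		{Raw: &Chunk{Data: []byte{1, 2, 3}}},
		{}, // Raw is nil: accessing c.Raw.Data directly would panic here
	}
	for i, c := range chunks {
		data, err := rawData(c)
		if err != nil {
			fmt.Printf("chunk %d: %v\n", i, err)
			continue
		}
		fmt.Printf("chunk %d: %d bytes\n", i, len(data))
	}
}
```

Returning an error rather than dereferencing blindly would let the query fail with a descriptive message instead of a runtime panic.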