Closed: maulikjs closed this issue 4 years ago
Is there some kind of stack trace printed with the error, perhaps in the Thanos Query logs or somewhere else? Where is this printed in general? These don't look like our log format. Do you have a way of reproducing it? :thinking:
It's probably impossible to tell where the (possible) error is without a stack trace.
Hey @GiedriusS, I was running queries against the API when I got these responses. I also went to the UI manually, ran the query for that date, and got the same output in an error box. Let me see if I can replicate it and get a screenshot and the pod logs.
level=info ts=2020-05-22T19:13:37.505197904Z caller=frontend.go:166 msg="slow query" org_id=fake url="http://telemeter-lts.datahub.redhat.com/api/v1/query?query=round%28avg_over_time%28sum%28sum+by+%28_id%29+%28max+without+%28prometheus%2Creceive%2Cinstance%29+%28+cluster%3Acapacity_cpu_cores%3Asum%29%29+%2B+on+%28_id%29+group_left%28_blah%29+%280+%2A+%28id_version_ebs_account_internal%3Acluster_subscribed%7B%7D+%3D%3D+1%29%29%29%5B1d%3A30m%5D%29%29&time=1587945600.0 " time_taken=6.143809981s
level=debug ts=2020-05-22T19:13:37.505330807Z caller=logging.go:44 traceID=18f9107c5c008580 msg="GET /api/v1/query?query=round%28avg_over_time%28sum%28sum+by+%28_id%29+%28max+without+%28prometheus%2Creceive%2Cinstance%29+%28+cluster%3Acapacity_cpu_cores%3Asum%29%29+%2B+on+%28_id%29+group_left%28_blah%29+%280+%2A+%28id_version_ebs_account_internal%3Acluster_subscribed%7B%7D+%3D%3D+1%29%29%29%5B1d%3A30m%5D%29%29&time=1587945600.0 (422) 6.143992196s"
level=debug ts=2020-05-22T19:13:39.081287152Z caller=logging.go:44 traceID=1ff8e57bf643c559 msg="GET /api/v1/query?query=round%28avg_over_time%28sum%28sum+by+%28_id%29+%28max+without+%28prometheus%2Creceive%2Cinstance%29+%28+cluster%3Acapacity_cpu_cores%3Asum%29%29+%2B+on+%28_id%29+group_left%28_blah%29+%280+%2A+%28id_version_ebs_account_internal%3Acluster_subscribed%7B%7D+%3D%3D+1%29%29%29%5B1d%3A30m%5D%29%29&time=1588118400.0 (422) 4.621643791s"
level=debug ts=2020-05-22T19:13:44.465299862Z caller=logging.go:44 traceID=43ff1398f57ec32c msg="GET /api/v1/query?query=round%28avg_over_time%28sum%28sum+by+%28_id%29+%28max+without+%28prometheus%2Creceive%2Cinstance%29+%28+cluster%3Acapacity_cpu_cores%3Asum%29%29+%2B+on+%28_id%29+group_left%28_blah%29+%280+%2A+%28id_version_ebs_account_internal%3Acluster_subscribed%7B%7D+%3D%3D+1%29%29%29%5B1d%3A30m%5D%29%29&time=1588636800.0 (422) 2.583028696s"
These are the logs from the cortex-query-frontend for those queries.
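For readability, the query= parameter in those log lines can be URL-decoded to recover the underlying PromQL. A small helper using Go's standard net/url package (this is just a convenience for reading the logs, not part of Thanos):

```go
package main

import (
	"fmt"
	"net/url"
)

// decodeQuery unescapes a query= parameter as it appears in the frontend
// logs: "+" decodes to a space, %28/%29 to parentheses, %3A to ":", etc.
func decodeQuery(raw string) string {
	q, err := url.QueryUnescape(raw)
	if err != nil {
		return "<decode error: " + err.Error() + ">"
	}
	return q
}

func main() {
	// The encoded query from the slow-query log line above.
	raw := "round%28avg_over_time%28sum%28sum+by+%28_id%29+%28max+without+%28prometheus%2Creceive%2Cinstance%29+%28+cluster%3Acapacity_cpu_cores%3Asum%29%29+%2B+on+%28_id%29+group_left%28_blah%29+%280+%2A+%28id_version_ebs_account_internal%3Acluster_subscribed%7B%7D+%3D%3D+1%29%29%29%5B1d%3A30m%5D%29%29"
	fmt.Println(decodeQuery(raw))
	// → round(avg_over_time(sum(sum by (_id) (max without (prometheus,receive,instance) ( cluster:capacity_cpu_cores:sum)) + on (_id) group_left(_blah) (0 * (id_version_ebs_account_internal:cluster_subscribed{} == 1)))[1d:30m]))
}
```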
And this is the log from the associated thanos-query pod:
level=error ts=2020-05-22T19:48:11.719475257Z caller=engine.go:806 msg="runtime panic in parser" err="runtime error: index out of range [-1]" stacktrace="goroutine 2719879 [running]:\ngithub.com/prometheus/prometheus/promql.(*evaluator).recover(0xc0098f2da0, 0xc0098f2b98)\n\t/go/pkg/mod/github.com/prometheus/prometheus@v1.8.2-0.20200110114423-1e64d757f711/promql/engine.go:804 +0xd8\npanic(0x1d695e0, 0xc0096dc360)\n\t/usr/local/go/src/runtime/panic.go:679 +0x1b2\ngithub.com/thanos-io/thanos/pkg/query.(*dedupSeriesSet).peekLset(0xc00584e060, 0x22ce0e0, 0xc0044dc1e0, 0x33b8480)\n\t/go/src/github.com/thanos-io/thanos/pkg/query/iter.go:372 +0x171\ngithub.com/thanos-io/thanos/pkg/query.(*dedupSeriesSet).next(0xc00584e060, 0xc009926400)\n\t/go/src/github.com/thanos-io/thanos/pkg/query/iter.go:388 +0x8a\ngithub.com/thanos-io/thanos/pkg/query.(*dedupSeriesSet).Next(0xc00584e060, 0x22d48e0)\n\t/go/src/github.com/thanos-io/thanos/pkg/query/iter.go:359 +0xaf\ngithub.com/prometheus/prometheus/promql.expandSeriesSet(0x2306760, 0xc0098e8d20, 0x22f1f20, 0xc00584e060, 0x16ef655, 0xc0098e8d20, 0x0, 0x0, 0x0)\n\t/go/pkg/mod/github.com/prometheus/prometheus@v1.8.2-0.20200110114423-1e64d757f711/promql/engine.go:758 +0x5b\ngithub.com/prometheus/prometheus/promql.checkForSeriesSetExpansion(0x2306760, 0xc0098e8d20, 0x22f1a20, 0xc0023056e0)\n\t/go/pkg/mod/github.com/prometheus/prometheus@v1.8.2-0.20200110114423-1e64d757f711/promql/engine.go:747 +0x159\ngithub.com/prometheus/prometheus/promql.(*evaluator).eval(0xc0098f1588, 0x22f1a20, 0xc0023056e0, 0xc0098e8ed0, 0x0)\n\t/go/pkg/mod/github.com/prometheus/prometheus@v1.8.2-0.20200110114423-1e64d757f711/promql/engine.go:1262 +0x1d7b\ngithub.com/prometheus/prometheus/promql.(*evaluator).rangeEval(0xc0098f1588, 0xc0098ee420, 0xc0098ee538, 0x2, 0x2, 0x0, 0x0, 0x0)\n\t/go/pkg/mod/github.com/prometheus/prometheus@v1.8.2-0.20200110114423-1e64d757f711/promql/engine.go:888 
+0x1408\ngithub.com/prometheus/prometheus/promql.(*evaluator).eval(0xc0098f1588, 0x22f17e0, 0xc000d68a50, 0xc0098e8e70, 0x0)\n\t/go/pkg/mod/github.com/prometheus/prometheus@v1.8.2-0.20200110114423-1e64d757f711/promql/engine.go:1029 +0x3396\ngithub.com/prometheus/prometheus/promql.(*evaluator).rangeEval(0xc0098f1588, 0xc0098ef170, 0xc0098ef288, 0x2, 0x2, 0x0, 0x0, 0x0)\n\t/go/pkg/mod/github.com/prometheus/prometheus@v1.8.2-0.20200110114423-1e64d757f711/promql/engine.go:888 +0x1408\ngithub.com/prometheus/prometheus/promql.(*evaluator).eval(0xc0098f1588, 0x22f17e0, 0xc000d68a00, 0xc0098e8e10, 0x0)\n\t/go/pkg/mod/github.com/prometheus/prometheus@v1.8.2-0.20200110114423-1e64d757f711/promql/engine.go:1029 +0x3396\ngithub.com/prometheus/prometheus/promql.(*evaluator).rangeEval(0xc0098f1588, 0xc0098efe60, 0xc0098eff18, 0x2, 0x2, 0x0, 0x0, 0x0)\n\t/go/pkg/mod/github.com/prometheus/prometheus@v1.8.2-0.20200110114423-1e64d757f711/promql/engine.go:888 +0x1408\ngithub.com/prometheus/prometheus/promql.(*evaluator).eval(0xc0098f1588, 0x22f1820, 0xc000e64200, 0xc0098e8db0, 0x0)\n\t/go/pkg/mod/github.com/prometheus/prometheus@v1.8.2-0.20200110114423-1e64d757f711/promql/engine.go:1240 +0x26d8\ngithub.com/prometheus/prometheus/promql.(*evaluator).rangeEval(0xc0098f1588, 0xc0098f0c10, 0xc0098f0d28, 0x2, 0x2, 0x0, 0x445ebe, 0xc000ac8480)\n\t/go/pkg/mod/github.com/prometheus/prometheus@v1.8.2-0.20200110114423-1e64d757f711/promql/engine.go:888 +0x1408\ngithub.com/prometheus/prometheus/promql.(*evaluator).eval(0xc0002f3588, 0x22f17e0, 0xc000d68aa0, 0x15, 0x0)\n\t/go/pkg/mod/github.com/prometheus/prometheus@v1.8.2-0.20200110114423-1e64d757f711/promql/engine.go:1029 +0x3396\ngithub.com/prometheus/prometheus/promql.(*evaluator).eval(0xc0098f2da0, 0x22f19a0, 0xc000acd440, 0xc000ac8580, 0xc000ac8501)\n\t/go/pkg/mod/github.com/prometheus/prometheus@v1.8.2-0.20200110114423-1e64d757f711/promql/engine.go:1323 +0x1cf8\ngithub.com/prometheus/prometheus/promql.(*evaluator).evalSubquery(0xc0002f4da0, 
0xc000acd440, 0x1ee2a73)\n\t/go/pkg/mod/github.com/prometheus/prometheus@v1.8.2-0.20200110114423-1e64d757f711/promql/engine.go:1001 +0x59\ngithub.com/prometheus/prometheus/promql.(*evaluator).eval(0xc0098f2da0, 0x22f1860, 0xc001d6a820, 0xc0098ea6a0, 0xc004da81c0)\n\t/go/pkg/mod/github.com/prometheus/prometheus@v1.8.2-0.20200110114423-1e64d757f711/promql/engine.go:1064 +0x1969\ngithub.com/prometheus/prometheus/promql.(*evaluator).rangeEval(0xc0098f2da0, 0xc0098f2730, 0xc000b24560, 0x1, 0x1, 0x0, 0xc000d68230, 0xc0002f45b8)\n\t/go/pkg/mod/github.com/prometheus/prometheus@v1.8.2-0.20200110114423-1e64d757f711/promql/engine.go:888 +0x1408\ngithub.com/prometheus/prometheus/promql.(*evaluator).eval(0xc0002f4da0, 0x22f1860, 0xc001d6a860, 0xf, 0xc000d68640)\n\t/go/pkg/mod/github.com/prometheus/prometheus@v1.8.2-0.20200110114423-1e64d757f711/promql/engine.go:1070 +0x18cc\ngithub.com/prometheus/prometheus/promql.(*evaluator).Eval(0xc0002f4da0, 0x22f1860, 0xc001d6a860, 0x0, 0x0, 0x0, 0x0)\n\t/go/pkg/mod/github.com/prometheus/prometheus@v1.8.2-0.20200110114423-1e64d757f711/promql/engine.go:815 +0x99\ngithub.com/prometheus/prometheus/promql.(*Engine).execEvalStmt(0xc0003625a0, 0x2306760, 0xc000acd5c0, 0xc0023057a0, 0xc000d68af0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)\n\t/go/pkg/mod/github.com/prometheus/prometheus@v1.8.2-0.20200110114423-1e64d757f711/promql/engine.go:531 +0x43d\ngithub.com/prometheus/prometheus/promql.(*Engine).exec(0xc0003625a0, 0x2306760, 0xc000acd5c0, 0xc0023057a0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)\n\t/go/pkg/mod/github.com/prometheus/prometheus@v1.8.2-0.20200110114423-1e64d757f711/promql/engine.go:485 +0x669\ngithub.com/prometheus/prometheus/promql.(*query).Exec(0xc0023057a0, 0x2306760, 0xc000acd350, 0x0)\n\t/go/pkg/mod/github.com/prometheus/prometheus@v1.8.2-0.20200110114423-1e64d757f711/promql/engine.go:199 +0xb4\ngithub.com/thanos-io/thanos/pkg/query/api.(*API).query(0xc0003632c0, 0xc00126e800, 0x0, 0x0, 0x0, 0x0, 0x0, 
0x0)\n\t/go/src/github.com/thanos-io/thanos/pkg/query/api/v1.go:304 +0x4f1\ngithub.com/thanos-io/thanos/pkg/query/api.(*API).Register.func1.1(0x22fe820, 0xc0003d72d0, 0xc00126e800)\n\t/go/src/github.com/thanos-io/thanos/pkg/query/api/v1.go:145 +0x55\nnet/http.HandlerFunc.ServeHTTP(0xc000b858a0, 0x22fe820, 0xc0003d72d0, 0xc00126e800)\n\t/usr/local/go/src/net/http/server.go:2007 +0x44\ngithub.com/NYTimes/gziphandler.GzipHandlerWithOpts.func1.1(0x7fd6b3f406c0, 0xc000d68960, 0xc00126e800)\n\t/go/pkg/mod/github.com/!n!y!times/gziphandler@v1.1.1/gzip.go:336 +0x23f\nnet/http.HandlerFunc.ServeHTTP(0xc000a78270, 0x7fd6b3f406c0, 0xc000d68960, 0xc00126e800)\n\t/usr/local/go/src/net/http/server.go:2007 +0x44\ngithub.com/thanos-io/thanos/pkg/tracing.HTTPMiddleware.func1(0x7fd6b3f406c0, 0xc000d68960, 0xc00126e700)\n\t/go/src/github.com/thanos-io/thanos/pkg/tracing/http.go:46 +0x549\nnet/http.HandlerFunc.ServeHTTP(0xc0003f7d10, 0x7fd6b3f406c0, 0xc000d68960, 0xc00126e700)\n\t/usr/local/go/src/net/http/server.go:2007 +0x44\ngithub.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerResponseSize.func1(0x7fd6b3f406c0, 0xc000d68910, 0xc00126e700)\n\t/go/pkg/mod/github.com/prometheus/client_golang@v1.5.0/prometheus/promhttp/instrument_server.go:196 +0xe9\nnet/http.HandlerFunc.ServeHTTP(0xc000a785a0, 0x7fd6b3f406c0, 0xc000d68910, 0xc00126e700)\n\t/usr/local/go/src/net/http/server.go:2007 +0x44\ngithub.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerCounter.func1(0x7fd6b3f406c0, 0xc000d688c0, 0xc00126e700)\n\t/go/pkg/mod/github.com/prometheus/client_golang@v1.5.0/prometheus/promhttp/instrument_server.go:100 +0xda\nnet/http.HandlerFunc.ServeHTTP(0xc000a78780, 0x7fd6b3f406c0, 0xc000d688c0, 0xc00126e700)\n\t/usr/local/go/src/net/http/server.go:2007 +0x44\ngithub.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerRequestSize.func1(0x7fd6b3f406c0, 0xc000d68870, 
0xc00126e700)\n\t/go/pkg/mod/github.com/prometheus/client_golang@v1.5.0/prometheus/promhttp/instrument_server.go:163 +0xe9\nnet/http.HandlerFunc.ServeHTTP(0xc000a789f0, 0x7fd6b3f406c0, 0xc000d68870, 0xc00126e700)\n\t/usr/local/go/src/net/http/server.go:2007 +0x44\ngithub.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerDuration.func1(0x22fe420, 0xc004f2a000, 0xc00126e700)\n\t/go/pkg/mod/github.com/prometheus/client_golang@v1.5.0/prometheus/promhttp/instrument_server.go:68 +0x11c\ngithub.com/prometheus/common/route.(*Router).handle.func1(0x22fe420, 0xc004f2a000, 0xc00126e600, 0x0, 0x0, 0x0)\n\t/go/pkg/mod/github.com/prometheus/common@v0.9.1/route/route.go:83 +0x296\ngithub.com/julienschmidt/httprouter.(*Router).ServeHTTP(0xc000362660, 0x22fe420, 0xc004f2a000, 0xc00126e600)\n\t/go/pkg/mod/github.com/julienschmidt/httprouter@v1.3.0/router.go:387 +0xa3d\ngithub.com/prometheus/common/route.(*Router).ServeHTTP(0xc000c28960, 0x22fe420, 0xc004f2a000, 0xc00126e600)\n\t/go/pkg/mod/github.com/prometheus/common@v0.9.1/route/route.go:121 +0x4c\nnet/http.(*ServeMux).ServeHTTP(0xc000236940, 0x22fe420, 0xc004f2a000, 0xc00126e600)\n\t/usr/local/go/src/net/http/server.go:2387 +0x1bd\nnet/http.serverHandler.ServeHTTP(0xc0001d40e0, 0x22fe420, 0xc004f2a000, 0xc00126e600)\n\t/usr/local/go/src/net/http/server.go:2802 +0xa4\nnet/http.(*conn).serve(0xc00016c3c0, 0x23066a0, 0xc000e640c0)\n\t/usr/local/go/src/net/http/server.go:1890 +0x875\ncreated by net/http.(*Server).Serve\n\t/usr/local/go/src/net/http/server.go:2927 +0x38e\n"
I couldn't find any error logs on the thanos-store pods.
Hope this helps :)
That's definitely very useful! Would it be possible for you to check this with v0.13.0-rc.0 to see if it is still reproducible?
> That's definitely very useful! Would it be possible for you to check this with v0.13.0-rc.0 to see if it is still reproducible?

I also encountered the index out of range [-1] error, but I tried reducing the number of --query.replica-label parameters on the thanos-query.
I did the same as ipyker - I added a new --query.replica-label=foo, restarted the Thanos Query, and this started occurring. Luckily, when I remove --query.replica-label=foo it goes away. Is it a problem if we pick a replica label which does not exist on all our metrics? The label foo is only on some metrics, not all.
That is super helpful feedback, we will take a closer look tomorrow, thanks! (:
On Sun, 31 May 2020 at 02:12, Sevag Hanssian notifications@github.com wrote:
I did the same as ipyker - removed a previous --query.replica-label, restarted the Thanos Query, and this started occurring.
Hey @bwplotka, do we know what's going on yet?
Would setting up a query pod with none of the --query.replica-label=foo flags potentially solve our problem?
On it, now
Ok, got it. The problem is that Querier just has problems with more than 2 replica labels. Fixing.
Thanks for spotting this and quickly describing the issue :heart:
That makes sense - I have 2 replica labels, and encountered this when adding a 3rd. Thanks for the quick turnaround.
Silly missed edge case fixed here: https://github.com/thanos-io/thanos/pull/2728
Will this make it into 0.13?
Not really, but it is available on master right now (we build images for master). We run the current master in production now, so I would recommend upgrading at least the querier if you are interested in this fix. Also, the next release is pretty soon (3 weeks?)
TL;DR: the bug was a bit more complex. It was panicking for all requests that return series with the same number of labels as, or fewer labels than, the number of replica labels (:
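That condition can be sketched in a few lines. This is an illustrative model only, not the actual Thanos dedup code: suppose the dedup step assumes replica labels always sort last in a series' label set and slices them off wholesale. A series carrying no more labels than there are configured replica labels then produces a negative slice bound, and Go panics at runtime:

```go
package main

import "fmt"

// Label is a minimal stand-in for a Prometheus label pair.
type Label struct{ Name, Value string }

// naiveStrip models (hypothetically) a dedup step that slices the trailing
// replica labels off a sorted label set. When the series has fewer labels
// than there are replica labels, the bound goes negative and Go panics.
func naiveStrip(lset []Label, replicaLabels int) []Label {
	return lset[:len(lset)-replicaLabels]
}

// clampedStrip never cuts below zero, so a series missing some replica
// labels simply keeps whatever labels it has -- the spirit of the fix.
func clampedStrip(lset []Label, replicaLabels int) []Label {
	cut := len(lset) - replicaLabels
	if cut < 0 {
		cut = 0
	}
	return lset[:cut]
}

func main() {
	series := []Label{{"_id", "abc"}, {"replica", "0"}} // only 2 labels
	// Three replica labels configured, as in the reports above.
	fmt.Println(len(clampedStrip(series, 3))) // 0 labels left, no panic

	defer func() { fmt.Println("recovered:", recover()) }()
	naiveStrip(series, 3) // panics: slice bounds out of range
}
```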
On Wed, 10 Jun 2020 at 15:40, Sevag Hanssian notifications@github.com wrote:
Will this make it into 0.13?
Thanos Version used: v0.12.2
Object Storage Provider: Ceph S3
What happened: When I run queries against historic dates, for certain dates we get a response
unexpected error: runtime error: index out of range [-1]
{'field': 'subscribed_cores', 'date': datetime.date(2019, 12, 27), 'runtime': 1.5100421905517578, 'text': '{"status":"error","errorType":"execution","error":"unexpected error: runtime error: index out of range [-1]"}\n', 'value': None, 'status': 'error', 'error': 'unexpected error: runtime error: index out of range [-1]'}
Other than that, we sometimes get an empty response instead of the error, but never the actual value.
{'field': 'subscribed_cores', 'date': datetime.date(2019, 12, 27), 'runtime': 1.227376699447632, 'text': '{"status":"success","data":{"resultType":"vector","result":[]}}\n', 'value': None, 'status': 'success', 'error': None}
What you expected to happen: Get the value for that date.
CC: @andmatt