influxdata / influxdb

Scalable datastore for metrics, events, and real-time analytics
https://influxdata.com
Apache License 2.0

[2.0.2] a panic has occurred - on 1.x query/write endpoints #20121

Closed nsteinmetz closed 4 years ago

nsteinmetz commented 4 years ago

Steps to reproduce:

  1. Upgrade data from 1.8.3 to 2.0.1 successfully
  2. Upgrade the deb package to 2.0.2
  3. See a huge number of log entries with panic messages

Expected behavior:

InfluxDB 1.x endpoints should continue to work as expected

Actual behavior:

Nov 20 18:03:00 crnt-d10-monitoring influxd[23929]: ts=2020-11-20T17:03:00.868085Z lvl=error msg="a panic has occurred" log_id=0Qaea9yG000 handler=panic error="/write?db=comptaonline: runtime error: invalid memory address or nil pointer dereference" stacktrace="goroutine 1910 [running]:\nruntime/debug.Stack(0xc003cc1500, 0xc004fd3d02, 0x272dd42)\n\t/usr/local/go/src/runtime/debug/stack.go:24 +0x9f\ngithub.com/influxdata/influxdb/v2/http/legacy.baseHandler.panic(0x39be520, 0x3970ea8, 0x39f2820, 0xc00886d380, 0xc009947800, 0x236c820, 0x4b53dc0)\n\t/go/src/github.com/influxdata/influxdb/http/legacy/router.go:67 +0x217\ngithub.com/influxdata/httprouter.(*Router).recv(0xc003ac3200, 0x39f2820, 0xc00886d380, 0xc009947800)\n\t/go/pkg/mod/github.com/influxdata/httprouter@v1.3.1-0.20191122104820-ee83e2772f69/router.go:361 +0x79\npanic(0x236c820, 0x4b53dc0)\n\t/usr/local/go/src/runtime/panic.go:969 +0x1b9\ngithub.com/influxdata/influxdb/v2/dbrp.(*Service).FindMany.func2.1(0xc002e58a4e, 0x10, 0x10, 0x7facee94b39f, 0xa3, 0xa3, 0xa3, 0x0, 0x0)\n\t/go/src/github.com/influxdata/influxdb/dbrp/service.go:269 +0x1ab\ngithub.com/influxdata/influxdb/v2/dbrp.(*Service).FindMany.func3(0x39f1760, 0xc00886d640, 0xee904f, 0xc001346400)\n\t/go/src/github.com/influxdata/influxdb/dbrp/service.go:307 +0x535\ngithub.com/influxdata/influxdb/v2/bolt.(*KVStore).View.func1(0xc003ccf180, 0x0, 0xc003ccf180)\n\t/go/src/github.com/influxdata/influxdb/bolt/kv.go:155 +0x97\ngo.etcd.io/bbolt.(*DB).View(0xc001346400, 0xc0051cef38, 0x0, 0x0)\n\t/go/pkg/mod/go.etcd.io/bbolt@v1.3.5/db.go:725 +0x96\ngithub.com/influxdata/influxdb/v2/bolt.(*KVStore).View(0xc0001a1c00, 0x3a07c60, 0xc005133f80, 0xc00524e2a0, 0x0, 0x0)\n\t/go/src/github.com/influxdata/influxdb/bolt/kv.go:154 +0x115\ngithub.com/influxdata/influxdb/v2/dbrp.(*Service).FindMany(0xc00135f380, 0x3a07c60, 0xc005133ef0, 0x0, 0xc0051f6418, 0x0, 0xc0051c2b50, 0x0, 0xc0051f6440, 0x0, ...)\n\t/go/src/github.com/influxdata/influxdb/dbrp/service.go:277 
+0x1a2\ngithub.com/influxdata/influxdb/v2/dbrp.AuthorizedService.FindMany(0x3a102a0, 0xc00135f380, 0x3a07c60, 0xc005133ef0, 0x0, 0xc0051f6418, 0x0, 0xc0051c2b50, 0x0, 0xc0051f6440, ...)\n\t/go/src/github.com/influxdata/influxdb/dbrp/middleware_auth.go:32 +0xb1\ngithub.com/influxdata/influxdb/v2/http/legacy.(*WriteHandler).findMapping(0xc00124b500, 0x3a07c60, 0xc005133ef0, 0x75aa697789630c77, 0xc00b04c2df, 0xc, 0x0, 0x0, 0x2702c84, 0xc00b04c2df, ...)\n\t/go/src/github.com/influxdata/influxdb/http/legacy/write_handler.go:204 +0x188\ngithub.com/influxdata/influxdb/v2/http/legacy.(*WriteHandler).findBucket(0xc00124b500, 0x3a07c60, 0xc005133ef0, 0x75aa697789630c77, 0xc00b04c2df, 0xc, 0x0, 0x0, 0x13ddfb5, 0xc0009e7e70, ...)\n\t/go/src/github.com/influxdata/influxdb/http/legacy/write_handler.go:159 +0x85\ngithub.com/influxdata/influxdb/v2/http/legacy.(*WriteHandler).handleWrite(0xc00124b500, 0x39f2820, 0xc00886d380, 0xc009947900)\n\t/go/src/github.com/influxdata/influxdb/http/legacy/write_handler.go:125 +0x29c\nnet/http.HandlerFunc.ServeHTTP(0xc01e2044f0, 0x39f2820, 0xc00886d380, 0xc009947900)\n\t/usr/local/go/src/net/http/server.go:2042 +0x44\ngithub.com/influxdata/httprouter.(*Router).Handler.func1(0x39f2820, 0xc00886d380, 0xc009947900, 0x0, 0x0, 0x0)\n\t/go/pkg/mod/github.com/influxdata/httprouter@v1.3.1-0.20191122104820-ee83e2772f69/router.go:325 +0x1e7\ngithub.com/influxdata/httprouter.(*Router).ServeHTTP(0xc003ac3200, 0x39f2820, 0xc00886d380, 0xc009947800)\n\t/go/pkg/mod/github.com/influxdata/httprouter@v1.3.1-0.20191122104820-ee83e2772f69/router.go:453 +0xa9b\ngithub.com/influxdata/influxdb/v2/http/legacy.(*WriteHandler).ServeHTTP(...)\n\t/go/src/github.com/influxdata/influxdb/http/legacy/write_handler.go:96\ngithub.com/influxdata/influxdb/v2/http/legacy.(*Handler).ServeHTTP(0xc0037cbdd0, 0x39f2820, 0xc00886d380, 0xc009947800)\n\t/go/src/github.com/influxdata/influxdb/http/legacy/backend.go:64 
+0x157\ngithub.com/influxdata/influxdb/v2/http/legacy.(*Influx1xAuthenticationHandler).ServeHTTP(0xc003b4c210, 0x39f2820, 0xc00886d380, 0xc009947700)\n\t/go/src/github.com/influxdata/influxdb/http/legacy/influx1x_authentication_handler.go:70 +0x475\ngithub.com/influxdata/influxdb/v2/http.(*PlatformHandler).ServeHTTP(0xc003b4c240, 0x39f2820, 0xc00886d380, 0xc009947700)\n\t/go/src/github.com/influxdata/influxdb/http/platform_handler.go:59 +0x7e\ngithub.com/go-chi/chi.(*Mux).Mount.func1(0x39f2820, 0xc00886d380, 0xc009947700)\n\t/go/pkg/mod/github.com/go-chi/chi@v4.1.0+incompatible/mux.go:298 +0x122\nnet/http.HandlerFunc.ServeHTTP(0xc00cdf1640, 0x39f2820, 0xc00886d380, 0xc009947700)\n\t/usr/local/go/src/net/http/server.go:2042 +0x44\ngithub.com/influxdata/influxdb/v2/kit/transport/http.Metrics.func1.1(0x39faae0, 0xc003cceee0, 0xc009947700)\n\t/go/src/github.com/influxdata/influxdb/kit/transport/http/middleware.go:57 +0x191\nnet/http.HandlerFunc.ServeHTTP(0xc003b52600, 0x39faae0, 0xc003cceee0, 0xc009947700)\n\t/usr/local/go/src/net/http/server.go:2042 +0x44\ngithub.com/influxdata/influxdb/v2/kit/transport/http.Trace.func1.1(0x39faae0, 0xc003cceee0, 0xc009947700)\n\t/go/src/github.com/influxdata/influxdb/kit/transport/http/middleware.go:97 +0x38f\nnet/http.HandlerFunc.ServeHTTP(0xc003b4c750, 0x39faae0, 0xc003cceee0, 0xc009947600)\n\t/usr/local/go/src/net/http/server.go:2042 +0x44\ngithub.com/go-chi/chi.(*ChainHandler).ServeHTTP(0xc003b52640, 0x39faae0, 0xc003cceee0, 0xc009947600)\n\t/go/pkg/mod/github.com/go-chi/chi@v4.1.0+incompatible/chain.go:31 +0x52\ngithub.com/go-chi/chi.(*Mux).routeHTTP(0xc0039076e0, 0x39faae0, 0xc003cceee0, 0xc009947600)\n\t/go/pkg/mod/github.com/go-chi/chi@v4.1.0+incompatible/mux.go:431 +0x28b\nnet/http.HandlerFunc.ServeHTTP(0xc01e204690, 0x39faae0, 0xc003cceee0, 0xc009947600)\n\t/usr/local/go/src/net/http/server.go:2042 +0x44\ngithub.com/go-chi/chi.(*Mux).ServeHTTP(0xc0039076e0, 0x39faae0, 0xc003cceee0, 
0xc009947500)\n\t/go/pkg/mod/github.com/go-chi/chi@v4.1.0+incompatible/mux.go:86 +0x2d1\ngithub.com/influxdata/influxdb/v2/http.(*Handler).ServeHTTP(0xc003acbfc0, 0x39faae0, 0xc003cceee0, 0xc009947500)\n\t/go/src/github.com/influxdata/influxdb/http/handler.go:143 +0x55\nnet/http.serverHandler.ServeHTTP(0xc000956540, 0x39faae0, 0xc003cceee0, 0xc009947500)\n\t/usr/local/go/src/net/http/server.go:2843 +0xa3\nnet/http.(*conn).serve(0xc004e1d2c0, 0x3a07ba0, 0xc0052a0100)\n\t/usr/local/go/src/net/http/server.go:1925 +0x8ad\ncreated by net/http.(*Server).Serve\n\t/usr/local/go/src/net/http/server.go:2969 +0x36c\n"

and for query:

Nov 20 18:03:01 crnt-d10-monitoring influxd[23929]: ts=2020-11-20T17:03:01.176757Z lvl=error msg="SELECT mean(free) FROM crntgitlab..disk WHERE (path = '/' AND host = 'crnt-d10-gitlab') AND time > now() - 5m GROUP BY time(200ms) [panic:runtime error: invalid memory address or nil pointer dereference] goroutine 1934 [running]:\nruntime/debug.Stack(0xc0053d65e0, 0x1, 0x1)\n\t/usr/local/go/src/runtime/debug/stack.go:24 +0x9f\ngithub.com/influxdata/influxdb/v2/influxql/query.(*Executor).recover(0xc0295afe00, 0xc008cf8ee0, 0xc00a4d16e0)\n\t/go/src/github.com/influxdata/influxdb/influxql/query/executor.go:345 +0xad\npanic(0x236c820, 0x4b53dc0)\n\t/usr/local/go/src/runtime/panic.go:969 +0x1b9\ngithub.com/influxdata/influxdb/v2/dbrp.(*Service).FindMany.func2.1(0xc0086aa232, 0x10, 0x10, 0x7facee94b18f, 0xa1, 0xa1, 0x1, 0x0, 0x0)\n\t/go/src/github.com/influxdata/influxdb/dbrp/service.go:269 +0x1ab\ngithub.com/influxdata/influxdb/v2/kv.indexWalk(0x3a07c60, 0xc0054286c0, 0x39f1720, 0xc005385320, 0x3a1b5e0, 0xc011f3dfe8, 0xc00524eb40, 0x39f1720, 0xc005385320)\n\t/go/src/github.com/influxdata/influxdb/kv/index.go:225 +0x2da\ngithub.com/influxdata/influxdb/v2/kv.(*Index).Walk(0xc00ce55400, 0x3a07c60, 0xc0054286c0, 0x39f1760, 0xc008cf9060, 0xc005250860, 0x1a, 0x1a, 0xc00524eb40, 0x0, ...)\n\t/go/src/github.com/influxdata/influxdb/kv/index.go:198 +0x218\ngithub.com/influxdata/influxdb/v2/dbrp.(*Service).FindMany.func3(0x39f1760, 0xc008cf9060, 0xee904f, 0xc001346400)\n\t/go/src/github.com/influxdata/influxdb/dbrp/service.go:311 +0x19f\ngithub.com/influxdata/influxdb/v2/bolt.(*KVStore).View.func1(0xc003ccfea0, 0x0, 0xc003ccfea0)\n\t/go/src/github.com/influxdata/influxdb/bolt/kv.go:155 +0x97\ngo.etcd.io/bbolt.(*DB).View(0xc001346400, 0xc02967f6c8, 0x0, 0x0)\n\t/go/pkg/mod/go.etcd.io/bbolt@v1.3.5/db.go:725 +0x96\ngithub.com/influxdata/influxdb/v2/bolt.(*KVStore).View(0xc0001a1c00, 0x3a07c60, 0xc005428720, 0xc00524eae0, 0x0, 0x0)\n\t/go/src/github.com/influxdata/influxdb/bolt/kv.go:154 
+0x115\ngithub.com/influxdata/influxdb/v2/dbrp.(*Service).FindMany(0xc00135f380, 0x3a07c60, 0xc0054286c0, 0x0, 0xc001ab5518, 0x0, 0xc00524c410, 0x0, 0x0, 0x0, ...)\n\t/go/src/github.com/influxdata/influxdb/dbrp/service.go:277 +0x1a2\ngithub.com/influxdata/influxdb/v2/dbrp.AuthorizedService.FindMany(0x3a102a0, 0xc00135f380, 0x3a07c60, 0xc0054286c0, 0x0, 0xc001ab5518, 0x0, 0xc00524c410, 0x0, 0x0, ...)\n\t/go/src/github.com/influxdata/influxdb/dbrp/middleware_auth.go:32 +0xb1\ngithub.com/influxdata/influxdb/v2/v1/coordinator.(*StatementExecutor).normalizeMeasurement(0xc0295bc9c0, 0x3a07c60, 0xc0054286c0, 0xc00524c410, 0xc001649b3e, 0xa, 0x0, 0x0, 0xc001ab5500, 0x1, ...)\n\t/go/src/github.com/influxdata/influxdb/v1/coordinator/statement_executor.go:722 +0x11e\ngithub.com/influxdata/influxdb/v2/v1/coordinator.(*StatementExecutor).NormalizeStatement.func1(0x39e3f20, 0xc00524c410)\n\t/go/src/github.com/influxdata/influxdb/v1/coordinator/statement_executor.go:692 +0x21f\ngithub.com/influxdata/influxql.walkFuncVisitor.Visit(0xc00524ea80, 0x39e3f20, 0xc00524c410, 0x40b85f, 0xc00003c000)\n\t/go/pkg/mod/github.com/influxdata/influxql@v0.0.0-20180925231337-1cbfca8e56b6/ast.go:3953 +0x3a\ngithub.com/influxdata/influxql.Walk(0x39be880, 0xc00524ea80, 0x39e3f20, 0xc00524c410)\n\t/go/pkg/mod/github.com/influxdata/influxql@v0.0.0-20180925231337-1cbfca8e56b6/ast.go:3825 +0x75\ngithub.com/influxdata/influxql.Walk(0x39be880, 0xc00524ea80, 0x39e7f20, 0xc008cf8fe0)\n\t/go/pkg/mod/github.com/influxdata/influxql@v0.0.0-20180925231337-1cbfca8e56b6/ast.go:3928 +0x1a5\ngithub.com/influxdata/influxql.Walk(0x39be880, 0xc00524ea80, 0x39e4020, 0xc003aa1d00)\n\t/go/pkg/mod/github.com/influxdata/influxql@v0.0.0-20180925231337-1cbfca8e56b6/ast.go:3879 +0x60f\ngithub.com/influxdata/influxql.WalkFunc(...)\n\t/go/pkg/mod/github.com/influxdata/influxql@v0.0.0-20180925231337-1cbfca8e56b6/ast.go:3948\ngithub.com/influxdata/influxdb/v2/v1/coordinator.(*StatementExecutor).NormalizeStatement(0xc0295bc9c0, 
0x3a07c60, 0xc0054286c0, 0x3a0aba0, 0xc003aa1d00, 0xc001649b3e, 0xa, 0x0, 0x0, 0xc001ab5500, ...)\n\t/go/src/github.com/influxdata/influxdb/v1/coordinator/statement_executor.go:657 +0x16e\ngithub.com/influxdata/influxdb/v2/influxql/query.(*Executor).executeQuery(0xc0295afe00, 0x3a07c60, 0xc0054286c0, 0xc008cf8ee0, 0x75aa697789630c77, 0xc001649b3e, 0xa, 0x0, 0x0, 0x3a0b860, ...)\n\t/go/src/github.com/influxdata/influxdb/influxql/query/executor.go:275 +0x3a3\ncreated by github.com/influxdata/influxdb/v2/influxql/query.(*Executor).ExecuteQuery\n\t/go/src/github.com/influxdata/influxdb/influxql/query/executor.go:196 +0x110\n" log_id=0Qae_Ci0000 service=query

Environment info:

Config:

# /etc/default/influxdb 
INFLUXD_CONFIG_PATH=/etc/influxdb/config.toml
# /etc/influxdb/config.toml 
bolt-path = "/srv/influx/influxdb2/influxd.bolt"
engine-path = "/srv/influx/influxdb2/engine"
http-bind-address = "127.0.0.1:8086"
storage-series-id-set-cache-size = 100

nsteinmetz commented 4 years ago

One odd thing I see in the logs for the query:

FROM crntgitlab..disk

Why two dots?

nsteinmetz commented 4 years ago

It seems the autogen retention policy is missing.

In 2.0.1, I can see:

FROM crntbackup.autogen.disk
or
FROM comptaonline.\"default\".disk

timhallinflux commented 4 years ago

Queries like FROM crntgitlab..disk are syntactically correct. The empty middle segment means: use the default retention policy (RP) rather than typing it out. Similarly, the default RP is used for writes if the writer does NOT specify one (i.e. the writer only specifies the DB).
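To make the db..measurement shorthand concrete, here is a minimal shell sketch that splits a dotted source name into its three parts. The function name split_source is hypothetical and not part of InfluxDB. The sketch assumes plain, unquoted identifiers that contain no dots themselves, so it does not handle quoted forms such as the "default" retention policy seen in the 2.0.1 logs.

```shell
#!/bin/sh
# split_source: hypothetical helper, not part of InfluxDB.
# Splits "db.rp.measurement" into its parts; an empty rp means
# "use the database's default retention policy".
# Assumes unquoted identifiers that contain no dots themselves.
split_source() {
    name=$1
    db=${name%%.*}          # text before the first dot
    rest=${name#*.}         # text after the first dot
    rp=${rest%%.*}          # text between the first and second dot
    measurement=${rest#*.}  # text after the second dot
    if [ -z "$rp" ]; then
        rp="<default RP>"
    fi
    printf 'db=%s rp=%s measurement=%s\n' "$db" "$rp" "$measurement"
}

split_source "crntgitlab..disk"         # prints: db=crntgitlab rp=<default RP> measurement=disk
split_source "crntbackup.autogen.disk"  # prints: db=crntbackup rp=autogen measurement=disk
```

The point of the sketch is only that the two dots are not a typo: the retention policy slot exists but is left empty, so the server has to look up the default DBRP mapping for the database.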

docmerlin commented 4 years ago

@nsteinmetz can you run, from 2.0.2:

influx v1 dbrp list --db=comptaonline --default=true

If that doesn't return anything, can you please run for me:

influx v1 dbrp list --db=comptaonline

timhallinflux commented 4 years ago

Looks like the upgrade step did not create the appropriate DBRP mappings.

nsteinmetz commented 4 years ago

As discussed on Slack:

With the InfluxDB server still running 2.0.1, but using the 2.0.2 binaries for the commands below, list all DBRP mappings:

First, get your list of databases with influx bucket list

for db in <list of db from above - limiting to 1.x db>; do
    echo "$db"
    influx v1 dbrp list --db="$db" --default=true
done

ID                  Database    Bucket ID    Retention Policy    Default    Organization ID
06a59835c0505000    <db>                     default             true
[...]

You will notice that the bucket ID is empty (null).
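To spot the affected mappings without reading the listing by eye, a small filter can help. This is a minimal sketch: empty_bucket_ids is a hypothetical helper, and it assumes the rows have first been put into tab-separated form with the columns in the order shown above. The real CLI output may be laid out differently, so adapt the field separator as needed.

```shell
#!/bin/sh
# empty_bucket_ids: hypothetical helper, not part of the influx CLI.
# Reads tab-separated rows (ID, Database, Bucket ID, Retention Policy,
# Default, Organization ID) on stdin and prints the ID of every row
# whose Bucket ID column is empty. The tab-separated layout is an
# assumption; adapt the field separator to your actual output.
empty_bucket_ids() {
    # NR > 1 skips the header line; $3 is the Bucket ID column.
    awk -F '\t' 'NR > 1 && $3 == "" { print $1 }'
}

# Example with a sample row modeled on the listing above:
printf 'ID\tDatabase\tBucket ID\tRetention Policy\tDefault\tOrganization ID\n06a59835c0505000\tcomptaonline\t\tdefault\ttrue\t\n' | empty_bucket_ids
```

The IDs it prints are the ones to feed into the delete step.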

Delete these DBRP mappings:

for id in <list of dbrp ids from above>; do echo "${id}"; influx v1 dbrp delete --id="${id}"; done

Then stop the InfluxDB server running 2.0.1 and start it with the 2.0.2 binaries.

You can now create new DBRP mappings; the db and bucket-id values used below are the ones from influx bucket list:

influx v1 dbrp create --bucket-id=<bucket-id> --db=<db> --rp=autogen --default=true
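With many databases, it can be safer to generate the create commands first and review them before running anything. The following is a minimal dry-run sketch: print_dbrp_create is a hypothetical helper, the input format (one "db bucket-id" pair per line) is an assumption, and the bucket ID in the example is made up. It only uses the flags shown in the command above.

```shell
#!/bin/sh
# print_dbrp_create: hypothetical dry-run helper.
# Reads "<db> <bucket-id>" pairs (whitespace-separated, one per line)
# on stdin and prints, without running it, the create command shown
# above for each pair. rp=autogen is taken from that command; change
# it if your default retention policy has another name.
print_dbrp_create() {
    while read -r db bucket_id; do
        # Skip blank or incomplete lines.
        [ -n "$db" ] && [ -n "$bucket_id" ] || continue
        printf 'influx v1 dbrp create --bucket-id=%s --db=%s --rp=autogen --default=true\n' "$bucket_id" "$db"
    done
}

# Example with a made-up bucket ID:
printf 'comptaonline 0000000000000001\n' | print_dbrp_create
```

Once the printed commands look right, they can be pasted into a shell or piped to sh.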

If you run the command below again, you will see a bucket ID for each database.

for db in <list of db from above - limiting to 1.x db>; do
    echo "$db"
    influx v1 dbrp list --db="$db" --default=true
done

Since then, I no longer get panic messages.

Note:

Otherwise you will get this error as output; I guess it's due to object constraints in the doc and the bucket ID being null.

Error: Attempted to unmarshal error as JSON but failed: "unexpected end of JSON input":.

Thanks @timhallinflux @docmerlin for your kind and prompt support!

docmerlin commented 4 years ago

OK, the problem with the empty bucket IDs was fixed in 2.0.2.

docmerlin commented 4 years ago

The problem was that the influxd upgrade command was creating DBRPs with empty bucket IDs. This was fixed in 2.0.2.

If you get caught by this problem, https://github.com/influxdata/influxdb/issues/20121#issuecomment-731403423 describes how to get yourself into a better state.

As this appears to be resolved, I am closing this issue.