influxdata / influxdb

Scalable datastore for metrics, events, and real-time analytics
https://influxdata.com
Apache License 2.0

Influx 2 backup fails with panic #21372

Open savujevi opened 3 years ago

savujevi commented 3 years ago

Steps to reproduce: List the minimal actions needed to reproduce the behavior.

  1. Run influxd in a Docker container on OpenShift.
  2. Have a large amount of data in the instance (the problem does not occur with a small dataset).
  3. Run `influx backup /some-path`.

Expected behavior: Backup should succeed

Actual behavior:

    h-4.2$ influx backup /backup/test
    2021-05-05T09:28:27.556116Z info Backing up KV store {"log_id": "0TvyZKdl000", "path": "/backup/test/20210505T092827Z.bolt"}
    panic: invalid freelist page: 0, page type is unknown<00>

    goroutine 1 [running]:
    go.etcd.io/bbolt.(*freelist).read(0xc000433d00, 0x7fea4a079000)
        /home/circleci/go/pkg/mod/go.etcd.io/bbolt@v1.3.5/freelist.go:266 +0x30b
    go.etcd.io/bbolt.(*DB).loadFreelist.func1()
        /home/circleci/go/pkg/mod/go.etcd.io/bbolt@v1.3.5/db.go:316 +0xd4
    sync.(*Once).doSlow(0xc0001ceb68, 0xc00035f468)
        /usr/local/go/src/sync/once.go:66 +0xee
    sync.(*Once).Do(...)
        /usr/local/go/src/sync/once.go:57
    go.etcd.io/bbolt.(*DB).loadFreelist(0xc0001cea00)
        /home/circleci/go/pkg/mod/go.etcd.io/bbolt@v1.3.5/db.go:309 +0x6c
    go.etcd.io/bbolt.Open(0xc00011b9b0, 0x22, 0x180, 0xc00035f5b8, 0x40f7ba, 0x30, 0x19c3c00)
        /home/circleci/go/pkg/mod/go.etcd.io/bbolt@v1.3.5/db.go:286 +0x38f
    github.com/influxdata/influxdb/v2/bolt.(*KVStore).openDB(0xc0003dfd40, 0xc0004b2960, 0x1af8040)
        /home/circleci/go/src/github.com/influxdata/influxdb/bolt/kv.go:93 +0x8b
    github.com/influxdata/influxdb/v2/bolt.(*KVStore).Open(0xc0003dfd40, 0x1b1ea00, 0xc00011c000, 0x0, 0x0)
        /home/circleci/go/src/github.com/influxdata/influxdb/bolt/kv.go:84 +0x34f
    github.com/influxdata/influxdb/v2/backup.RunBackup(0x1b1ea00, 0xc00011c000, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x7ffc1e5ac479, 0xc, ...)
        /home/circleci/go/src/github.com/influxdata/influxdb/backup/backup.go:69 +0x4bb
    main.(*cmdBackupBuilder).backupRunE(0xc000401540, 0xc0003fb8c0, 0xc0004a4740, 0x1, 0x1, 0x0, 0x0)
        /home/circleci/go/src/github.com/influxdata/influxdb/cmd/influx/backup.go:98 +0x33d
    main.checkSetupRunEMiddleware.func1.1(0xc0003fb8c0, 0xc0004a4740, 0x1, 0x1, 0x0, 0x0)
        /home/circleci/go/src/github.com/influxdata/influxdb/cmd/influx/main.go:466 +0x76
    github.com/spf13/cobra.(*Command).execute(0xc0003fb8c0, 0xc0004a4720, 0x1, 0x1, 0xc0003fb8c0, 0xc0004a4720)
        /home/circleci/go/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:842 +0x47c
    github.com/spf13/cobra.(*Command).ExecuteC(0xc000395600, 0x0, 0x0, 0xc000395600)
        /home/circleci/go/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:950 +0x375
    github.com/spf13/cobra.(*Command).Execute(...)
        /home/circleci/go/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:887
    main.main()
        /home/circleci/go/src/github.com/influxdata/influxdb/cmd/influx/main.go:43 +0x58

Environment info:

  * OpenShift 3.11
  * InfluxDB 2.0.5 (git: 741389781e) build_date: 2021-04-27T17:57:07Z
  * Linux 3.10.0-1160.24.1.el7.x86_64 x86_64 (rhel 7)

Config: Copy any non-default config values here or attach the full config as a gist or file.

Logs: Include snippet of errors in log.

Performance: Generate profiles with the following commands for bugs related to performance, locking, out of memory (OOM), etc.

# Commands should be run when the bug is actively happening.
# Note: This command will run for ~30 seconds.
curl -o profiles.tar.gz "http://localhost:8086/debug/pprof/all?cpu=30s"
iostat -xd 1 30 > iostat.txt
# Attach the `profiles.tar.gz` and `iostat.txt` output files.

lesam commented 3 years ago

This looks related to the bbolt corruption reported in https://github.com/etcd-io/bbolt/issues/135.

@savujevi how much data is 'a lot'? Can you provide the /backup/test/20210505T092827Z.bolt file that appears to be corrupted? Can you give any more details about the structure of your data and especially metadata (how many buckets, how the data is created, how many tasks, how many users... generally what this database is doing) that might help investigation or help us to reproduce it?

lesam commented 3 years ago

Also, the actual server-side BoltDB file would be extremely valuable. See https://docs.influxdata.com/influxdb/v2.0/reference/config-options/#bolt-path for where to find it.