influxdata / influxdb

Scalable datastore for metrics, events, and real-time analytics
https://influxdata.com
Apache License 2.0

influx_inspect buildtsi -compact-series-file -> unexpected fault address 0xa4ba5000 #17866

Open mt-mrx opened 4 years ago

mt-mrx commented 4 years ago

Hi,

I upgraded my influxdb:1.7.10 Docker container to 1.8.0 and migrated to the tsi1 index (INFLUXDB_DATA_INDEX_VERSION=tsi1), which was successful. I'm not sure whether the upgrade has anything to do with the influx_inspect crash, so I'm including my upgrade steps as well.

Afterwards I wanted to run "influx_inspect buildtsi -compact-series-file ...", but it crashes with "fatal error: fault". When I start the container with those data files, it keeps restarting and never comes up.

Environment info: My Docker environment runs on a Raspberry Pi 4, with the data stored on an external SSD.

root@andromeda:~ # uname -srm
Linux 4.19.97-v7l+ armv7l

When the influxdb container is running, there is enough free memory available:

root@andromeda:~ # free -m
              total        used        free      shared  buff/cache   available
Mem:           3906        1055          95          34        2755        2695
Swap:          8291         229        8062

Steps to reproduce:

Successful upgrade steps (1.7.10 to 1.8.0)

I read https://docs.influxdata.com/influxdb/v1.8/administration/upgrading/ and decided to upgrade my container from 1.7.10 to 1.8.0 and also switch to the tsi1 index.

My starting docker-compose definition was:

    influxdb:
        container_name: mnetcontrol-influxdb
        hostname: mnetcontrol-influxdb
        restart: "always"
        expose:
            - "8086"
        image: influxdb:1.7.10
        volumes:
            - ./influxdb/var/lib/influxdb:/var/lib/influxdb
        environment:
            TZ: "Europe/Berlin"
            INFLUXDB_HTTP_AUTH_ENABLED: "true"
            INFLUXDB_ADMIN_USER: "admin"
            INFLUXDB_ADMIN_PASSWORD: "${INFLUXDB_ADMIN_PASSWORD}"
        networks:
            - mnetcontrol

I had no index files, as described in 3c) of the upgrade instructions, so my understanding was that I didn't need to prepare the data in any way.

root@mnetcontrol-influxdb:/# find /var/lib/influxdb/ | grep -i index
root@mnetcontrol-influxdb:/#
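A slightly more targeted version of this check is sketched below, for anyone repeating these steps. It assumes the standard 1.x on-disk layout `<datadir>/<database>/<retention policy>/<shard id>/`, where a tsi1 shard keeps its index in an `index` subdirectory and each database's series file lives in `<datadir>/<database>/_series`. The function name `list_shard_index_status` is hypothetical, not part of InfluxDB.

```shell
# Hypothetical helper: report, for every shard under a data directory,
# whether it already has an on-disk tsi1 "index" directory.
# Assumes the layout <datadir>/<db>/<rp>/<shard_id>/.
list_shard_index_status() {
  local datadir="$1" shard
  for shard in "$datadir"/*/*/*/; do
    [ -d "$shard" ] || continue                       # glob matched nothing
    shard="${shard%/}"
    case "$shard" in */_series/*) continue ;; esac    # series file partitions are not shards
    if [ -d "$shard/index" ]; then
      echo "tsi1  $shard"
    else
      echo "none  $shard"
    fi
  done
}
```

Run as `list_shard_index_status /var/lib/influxdb/data`; before a tsi1 conversion every shard would be expected to show `none`.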

My config file and container environment variables contained no index-version setting, as described in 3a):

root@mnetcontrol-influxdb:/# cat /etc/influxdb/influxdb.conf
[meta]
  dir = "/var/lib/influxdb/meta"

[data]
  dir = "/var/lib/influxdb/data"
  engine = "tsm1"
  wal-dir = "/var/lib/influxdb/wal"
root@mnetcontrol-influxdb:/#

I then stopped the container and backed up the data files:

root@andromeda:/opt/docker/mnetcontrol # docker container stop mnetcontrol-influxdb
root@andromeda:/opt/docker/mnetcontrol # cp -a influxdb backups/influxdb_2020-04-24_1.7.10

Started a temporary container to build the index:

root@andromeda:/opt/docker/mnetcontrol # docker run --rm \
  -it \
  -e INFLUXDB_DATA_INDEX_VERSION=tsi1 \
  --entrypoint /bin/bash \
  -v /opt/docker/mnetcontrol/influxdb/var/lib/influxdb:/var/lib/influxdb \
  -p 8086 \
  influxdb:1.8.0
Unable to find image 'influxdb:1.8.0' locally
1.8.0: Pulling from library/influxdb
4168e46f368a: Pull complete
...

Inside the container I ran "influx_inspect buildtsi ...", which completed and reported only "info" messages; as far as I can tell, it reported no errors.

root@159f1033c05c:/# influx_inspect buildtsi -waldir /var/lib/influxdb/wal -datadir /var/lib/influxdb/data

Modified the docker-compose definition from above with the following values:

root@andromeda:/opt/docker/mnetcontrol # grep -A10 influxdb docker-compose.yml
...
    influxdb:
        #image: influxdb:1.7.10 # since 2020-04-04
        image: influxdb:1.8.0 # since 2020-04-24
        environment:
            # since version 1.8.0 index tsi1 is recommended
            INFLUXDB_DATA_INDEX_VERSION: "tsi1"
...

Started the container:

root@andromeda:/opt/docker/mnetcontrol# docker-compose up -d influxdb

Afterwards I checked my Grafana dashboards, and the data from last year looked fine for all sensors.

Failed steps to compact the series file in 1.8.0

Stopped the 1.8.0 container:

root@andromeda:/opt/docker/mnetcontrol # docker container stop mnetcontrol-influxdb

Created a temporary container with influxdb and access to the data files:

root@andromeda:/opt/docker/mnetcontrol # docker run --rm \
  -it \
  -e INFLUXDB_DATA_INDEX_VERSION=tsi1 \
  --entrypoint /bin/bash \
  -v /opt/docker/mnetcontrol/influxdb/var/lib/influxdb:/var/lib/influxdb \
  -p 8086 \
  influxdb:1.8.0

Ran the compact-series command as described in https://docs.influxdata.com/influxdb/v1.8/administration/compact-series-file/ with the result below:

root@134ce7be7115:/# influx_inspect buildtsi -compact-series-file -waldir /var/lib/influxdb/wal -datadir /var/lib/influxdb/data
You are currently running as root. This will build your
index files with root ownership and will be inaccessible
if you run influxd as a non-root user. You should run
buildtsi as the same user you are running influxd.
Are you sure you want to continue? (y/N): y
processing partition for "/var/lib/influxdb/data/_internal/_series/01"
processing partition for "/var/lib/influxdb/data/_internal/_series/00"
processing partition for "/var/lib/influxdb/data/_internal/_series/02"
processing partition for "/var/lib/influxdb/data/_internal/_series/03"
processing segment "/var/lib/influxdb/data/_internal/_series/01/0000" 0
processing segment "/var/lib/influxdb/data/_internal/_series/00/0000" 0
processing segment "/var/lib/influxdb/data/_internal/_series/02/0000" 0
processing segment "/var/lib/influxdb/data/_internal/_series/03/0000" 0
renaming new segment "/var/lib/influxdb/data/_internal/_series/00/0000.tmp" to "/var/lib/influxdb/data/_internal/_series/00/0000"
removing index file /var/lib/influxdb/data/_internal/_series/00/index
processing partition for "/var/lib/influxdb/data/_internal/_series/04"
renaming new segment "/var/lib/influxdb/data/_internal/_series/01/0000.tmp" to "/var/lib/influxdb/data/_internal/_series/01/0000"
renaming new segment "/var/lib/influxdb/data/_internal/_series/03/0000.tmp" to "/var/lib/influxdb/data/_internal/_series/03/0000"
removing index file /var/lib/influxdb/data/_internal/_series/01/index
processing partition for "/var/lib/influxdb/data/_internal/_series/05"
removing index file /var/lib/influxdb/data/_internal/_series/03/index
processing partition for "/var/lib/influxdb/data/_internal/_series/06"
renaming new segment "/var/lib/influxdb/data/_internal/_series/02/0000.tmp" to "/var/lib/influxdb/data/_internal/_series/02/0000"
removing index file /var/lib/influxdb/data/_internal/_series/02/index
processing partition for "/var/lib/influxdb/data/_internal/_series/07"
processing segment "/var/lib/influxdb/data/_internal/_series/04/0000" 0
processing segment "/var/lib/influxdb/data/_internal/_series/06/0000" 0
processing segment "/var/lib/influxdb/data/_internal/_series/05/0000" 0
renaming new segment "/var/lib/influxdb/data/_internal/_series/06/0000.tmp" to "/var/lib/influxdb/data/_internal/_series/06/0000"
processing segment "/var/lib/influxdb/data/_internal/_series/07/0000" 0
removing index file /var/lib/influxdb/data/_internal/_series/06/index
renaming new segment "/var/lib/influxdb/data/_internal/_series/05/0000.tmp" to "/var/lib/influxdb/data/_internal/_series/05/0000"
renaming new segment "/var/lib/influxdb/data/_internal/_series/04/0000.tmp" to "/var/lib/influxdb/data/_internal/_series/04/0000"
removing index file /var/lib/influxdb/data/_internal/_series/05/index
removing index file /var/lib/influxdb/data/_internal/_series/04/index
renaming new segment "/var/lib/influxdb/data/_internal/_series/07/0000.tmp" to "/var/lib/influxdb/data/_internal/_series/07/0000"
removing index file /var/lib/influxdb/data/_internal/_series/07/index
unexpected fault address 0xa4ba5000
fatal error: fault
[signal SIGBUS: bus error code=0x2 addr=0xa4ba5000 pc=0x48814c]

goroutine 1 [running]:
runtime.throw(0x6ae788, 0x5)
    /usr/local/go/src/runtime/panic.go:774 +0x5c fp=0x108aa84 sp=0x108aa70 pc=0x3f948
runtime.sigpanic()
    /usr/local/go/src/runtime/signal_unix.go:391 +0x378 fp=0x108aa9c sp=0x108aa84 pc=0x55220
github.com/influxdata/influxdb/tsdb.ReadSeriesEntry(0xa4ba5000, 0x3f7000, 0x3f7000, 0x8f12, 0x0, 0xa4ba4f1b, 0xe5, 0x3f70e5, 0x0, 0x0, ...)
    /go/src/github.com/influxdata/influxdb/tsdb/series_segment.go:417 +0x20 fp=0x108aad4 sp=0x108aaa0 pc=0x48814c
github.com/influxdata/influxdb/tsdb.(*SeriesSegment).ForEachEntry(0x10742a0, 0x108ab28, 0x2, 0x0)
    /go/src/github.com/influxdata/influxdb/tsdb/series_segment.go:244 +0x64 fp=0x108ab0c sp=0x108aad4 pc=0x4877c8
github.com/influxdata/influxdb/tsdb.(*SeriesSegment).MaxSeriesID(0x10742a0, 0x0, 0x0)
    /go/src/github.com/influxdata/influxdb/tsdb/series_segment.go:232 +0x50 fp=0x108ab30 sp=0x108ab0c pc=0x48773c
github.com/influxdata/influxdb/tsdb.(*SeriesPartition).openSegments(0x1076070, 0x0, 0x0)
    /go/src/github.com/influxdata/influxdb/tsdb/series_partition.go:124 +0x2a4 fp=0x108aba4 sp=0x108ab30 pc=0x4838b4
github.com/influxdata/influxdb/tsdb.(*SeriesPartition).Open.func1(0x1076070, 0x2b, 0x1ff)
    /go/src/github.com/influxdata/influxdb/tsdb/series_partition.go:78 +0x1c fp=0x108abd8 sp=0x108aba4 pc=0x48c3cc
github.com/influxdata/influxdb/tsdb.(*SeriesPartition).Open(0x1076070, 0x1074240, 0x1)
    /go/src/github.com/influxdata/influxdb/tsdb/series_partition.go:95 +0x90 fp=0x108ac00 sp=0x108abd8 pc=0x483568
github.com/influxdata/influxdb/tsdb.(*SeriesFile).Open(0x1238380, 0x0, 0x0)
    /go/src/github.com/influxdata/influxdb/tsdb/series_file.go:91 +0x260 fp=0x108adb8 sp=0x108ac00 pc=0x47ecf8
github.com/influxdata/influxdb/cmd/influx_inspect/buildtsi.(*Command).compactDatabaseSeriesFile(0x1026d20, 0x10f6097, 0x9, 0x10f60a0, 0x20, 0x0, 0x0)
    /go/src/github.com/influxdata/influxdb/cmd/influx_inspect/buildtsi/buildtsi.go:168 +0x1ec fp=0x108ae38 sp=0x108adb8 pc=0x587840
github.com/influxdata/influxdb/cmd/influx_inspect/buildtsi.(*Command).run(0x1026d20, 0xbeab1eff, 0x16, 0xbeab1ee0, 0x15, 0x0, 0x7)
    /go/src/github.com/influxdata/influxdb/cmd/influx_inspect/buildtsi/buildtsi.go:120 +0x1d8 fp=0x108aec8 sp=0x108ae38 pc=0x5871a8
github.com/influxdata/influxdb/cmd/influx_inspect/buildtsi.(*Command).Run(0x1026d20, 0x1024150, 0x5, 0x6, 0x8, 0x1024150)
    /go/src/github.com/influxdata/influxdb/cmd/influx_inspect/buildtsi/buildtsi.go:81 +0x4b0 fp=0x108aef8 sp=0x108aec8 pc=0x586edc
main.(*Main).Run(0x108af88, 0x1024150, 0x5, 0x6, 0x0, 0x14f01)
    /go/src/github.com/influxdata/influxdb/cmd/influx_inspect/main.go:93 +0xc40 fp=0x108af58 sp=0x108aef8 pc=0x5ae560
main.main()
    /go/src/github.com/influxdata/influxdb/cmd/influx_inspect/main.go:28 +0x110 fp=0x108afa4 sp=0x108af58 pc=0x5ad84c
runtime.main()
    /usr/local/go/src/runtime/proc.go:203 +0x208 fp=0x108afe4 sp=0x108afa4 pc=0x41c28
runtime.goexit()
    /usr/local/go/src/runtime/asm_arm.s:868 +0x4 fp=0x108afe4 sp=0x108afe4 pc=0x6dd38

sofixa commented 4 years ago

FYI, I had a similar issue, but it was caused by a corrupted database (after the system spent a few hours at 100% disk usage):

influx_inspect buildtsi reports exactly which database is corrupt; it is the last one logged before the fault:

2020-09-15T09:10:55.745251Z     info    Rebuilding database     {"log_id": "0PGGLcyW000", "name": "gitlab"}
unexpected fault address 0x7f635f6a9071
fatal error: fault
[signal SIGBUS: bus error code=0x2 addr=0x7f635f6a9071 pc=0x90ac55]

goroutine 1 [running]:
runtime.throw(0xbc40cc, 0x5)
        /usr/local/go/src/runtime/panic.go:774 +0x72 fp=0xc000113468 sp=0xc000113438 pc=0x42f8a2
runtime.sigpanic()
        /usr/local/go/src/runtime/signal_unix.go:391 +0x455 fp=0xc000113498 sp=0xc000113468 pc=0x443e85
github.com/influxdata/influxdb/tsdb.ReadSeriesEntry(0x7f635f6a9071, 0x155f8f, 0x155f8f, 0x7f635f6a8fdf, 0x92, 0x156021, 0x0, 0x0, 0x9b)
        /go/src/github.com/influxdata/influxdb/tsdb/series_segment.go:417 +0x35 fp=0xc0001134f8 sp=0xc000113498 pc=0x90ac55
github.com/influxdata/influxdb/tsdb.(*SeriesSegment).ForEachEntry(0xc0002c00a0, 0xc000113588, 0x1, 0x0)
        /go/src/github.com/influxdata/influxdb/tsdb/series_segment.go:244 +0x8b fp=0xc000113560 sp=0xc0001134f8 pc=0x90a11b
github.com/influxdata/influxdb/tsdb.(*SeriesSegment).MaxSeriesID(0xc0002c00a0, 0x5)
        /go/src/github.com/influxdata/influxdb/tsdb/series_segment.go:232 +0x5c fp=0xc0001135a8 sp=0xc000113560 pc=0x90a06c
github.com/influxdata/influxdb/tsdb.(*SeriesPartition).openSegments(0xc0003b2210, 0x0, 0xc0001136f0)
        /go/src/github.com/influxdata/influxdb/tsdb/series_partition.go:124 +0x150 fp=0xc000113690 sp=0xc0001135a8 pc=0x905b90
github.com/influxdata/influxdb/tsdb.(*SeriesPartition).Open.func1(0xc0003b2210, 0x26, 0x1ff)
        /go/src/github.com/influxdata/influxdb/tsdb/series_partition.go:78 +0x2f fp=0xc000113700 sp=0xc000113690 pc=0x9101af
github.com/influxdata/influxdb/tsdb.(*SeriesPartition).Open(0xc0003b2210, 0xc000316380, 0x1)
        /go/src/github.com/influxdata/influxdb/tsdb/series_partition.go:95 +0xa7 fp=0xc000113758 sp=0xc000113700 pc=0x905987
github.com/influxdata/influxdb/tsdb.(*SeriesFile).Open(0xc0002c0000, 0x0, 0x0)
        /go/src/github.com/influxdata/influxdb/tsdb/series_file.go:91 +0x367 fp=0xc000113a78 sp=0xc000113758 pc=0x9006b7
github.com/influxdata/influxdb/cmd/influx_inspect/buildtsi.(*Command).processDatabase(0xc000116d80, 0xc00003a8b5, 0x6, 0xc0003101a0, 0x1b, 0xc0003101c0, 0x1a, 0x0, 0x0)
        /go/src/github.com/influxdata/influxdb/cmd/influx_inspect/buildtsi/buildtsi.go:257 +0x250 fp=0xc000113c68 sp=0xc000113a78 pc=0xa29a50
github.com/influxdata/influxdb/cmd/influx_inspect/buildtsi.(*Command).run(0xc000116d80, 0x7ffd3bb9a83b, 0x14, 0x7ffd3bb9a858, 0x13, 0x0, 0x7)
        /go/src/github.com/influxdata/influxdb/cmd/influx_inspect/buildtsi/buildtsi.go:126 +0x3bb fp=0xc000113d90 sp=0xc000113c68 pc=0xa280db
github.com/influxdata/influxdb/cmd/influx_inspect/buildtsi.(*Command).Run(0xc000116d80, 0xc000032200, 0x4, 0x4, 0x8, 0xc000032200)
        /go/src/github.com/influxdata/influxdb/cmd/influx_inspect/buildtsi/buildtsi.go:81 +0x55c fp=0xc000113df8 sp=0xc000113d90 pc=0xa27c8c
main.(*Main).Run(0xc000113f18, 0xc000032200, 0x4, 0x4, 0x0, 0x4)
        /go/src/github.com/influxdata/influxdb/cmd/influx_inspect/main.go:93 +0xdc5 fp=0xc000113ec0 sp=0xc000113df8 pc=0xa56385
main.main()
        /go/src/github.com/influxdata/influxdb/cmd/influx_inspect/main.go:28 +0x151 fp=0xc000113f60 sp=0xc000113ec0 pc=0xa554f1
runtime.main()
        /usr/local/go/src/runtime/proc.go:203 +0x21e fp=0xc000113fe0 sp=0xc000113f60 pc=0x43123e
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1357 +0x1 fp=0xc000113fe8 sp=0xc000113fe0 pc=0x45bd51
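Picking that database out of a long buildtsi run can be automated. A minimal sketch, assuming the run's stdout and stderr were saved together to a file and that the log lines look like the "Rebuilding database" line above; the function name `last_rebuilt_database` is hypothetical. This applies to a plain buildtsi run like this one; the -compact-series-file run in the original report prints "processing partition" lines instead.

```shell
# Hypothetical helper: read captured buildtsi output on stdin and print the
# "name" field of the LAST "Rebuilding database" log line, i.e. the database
# that was being processed when the output stopped. Prints nothing if no
# such line exists.
last_rebuilt_database() {
  sed -n '/Rebuilding database/s/.*"name": "\([^"]*\)".*/\1/p' | tail -n 1
}
```

For example, append `2>&1 | tee buildtsi.log` to the buildtsi command, then run `last_rebuilt_database < buildtsi.log`.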

fl4p commented 1 year ago

I had a similar problem with a huge database that consumed all free disk space. After freeing up space, both influxd and influx_inspect crashed with the "unexpected fault address" panic. I deleted the _series folders of the corrupted databases, then ran influx_inspect buildtsi to rebuild the index, and now it works again. 🎉
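The recovery described above can be sketched as follows, under stated assumptions: influxd (or its container) is stopped, the layout is `<datadir>/<db>/_series` plus `<datadir>/<db>/<rp>/<shard>/index`, and the hypothetical helper `set_aside_series_and_index` moves directories into a backup location instead of deleting them, so the step can be undone. Unlike the comment above, it also sets aside the shards' tsi1 `index` directories. My understanding is that buildtsi skips any shard that already has an `index` directory, and an index built against the old series file would not match the series IDs of a regenerated one.

```shell
# Hypothetical helper: set aside the series file and the shards' tsi1 index
# directories of ONE database so buildtsi can regenerate both.
# Assumes influxd is stopped and the 1.x layout
#   <datadir>/<db>/_series   and   <datadir>/<db>/<rp>/<shard>/index
# Directories are moved into <backup>/<db>/ rather than deleted.
set_aside_series_and_index() {
  local datadir="$1" db="$2" backup="$3" dir rel
  if [ ! -d "$datadir/$db" ]; then
    echo "no such database directory: $datadir/$db" >&2
    return 1
  fi
  if [ -e "$backup/$db" ]; then
    echo "backup target already exists: $backup/$db" >&2
    return 1
  fi
  mkdir -p "$backup/$db" || return 1

  # The series file is stored once per database.
  if [ -d "$datadir/$db/_series" ]; then
    mv "$datadir/$db/_series" "$backup/$db/_series" || return 1
  fi

  # Each shard's tsi1 index is a directory named "index" inside the shard.
  for dir in "$datadir/$db"/*/*/index; do
    [ -d "$dir" ] || continue
    rel="${dir#"$datadir/$db/"}"          # e.g. autogen/12/index
    mkdir -p "$backup/$db/$(dirname "$rel")" || return 1
    mv "$dir" "$backup/$db/$rel" || return 1
  done
}
```

After `set_aside_series_and_index /var/lib/influxdb/data <db> /path/to/backup` (where `<db>` is a placeholder for the corrupted database), rebuild with the same `influx_inspect buildtsi -waldir /var/lib/influxdb/wal -datadir /var/lib/influxdb/data` command used earlier in this thread, ideally as the user influxd runs as (see the root warning buildtsi prints).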