influxdata / influxdb

Scalable datastore for metrics, events, and real-time analytics
https://influxdata.com
Apache License 2.0

InfluxDB getting restarted frequently on v1.7.9 with segmentation fault #24707

Closed Anmol-Porwal18 closed 2 months ago

Anmol-Porwal18 commented 7 months ago

InfluxDB restarts frequently because of a segmentation fault.

Environment info:

Logs:

influxd[27351]: ts=2024-02-27T03:20:08.429986Z lvl=info msg="Full compaction complete" log_id=0n_OznJG000 index=tsi tsi1_partition=8 trace_id=0nb94CVG000 op_name=tsi1_compact_to_level tsi1_level=2 path=/var/lib/influxdb/data/telegraf/measurements/44101/index/7/L2-00000011.tsi elapsed=880ms bytes=9201143 kb_per_sec=10210
influxd[27351]: ts=2024-02-27T03:20:08.430010Z lvl=info msg="Removing index file" log_id=0n_OznJG000 index=tsi tsi1_partition=8 trace_id=0nb94CVG000 op_name=tsi1_compact_to_level tsi1_level=2 path=/var/lib/influxdb/data/telegraf/measurements/44101/index/7/L1-00000009.tsi
influxd[27351]: ts=2024-02-27T03:20:08.431343Z lvl=info msg="Removing index file" log_id=0n_OznJG000 index=tsi tsi1_partition=8 trace_id=0nb94CVG000 op_name=tsi1_compact_to_level tsi1_level=2 path=/var/lib/influxdb/data/telegraf/measurements/44101/index/7/L1-00000006.tsi
influxd[27351]: ts=2024-02-27T03:20:08.432624Z lvl=info msg="TSI level compaction (end)" log_id=0n_OznJG000 index=tsi tsi1_partition=8 trace_id=0nb94CVG000 op_name=tsi1_compact_to_level tsi1_level=2 op_event=end op_elapsed=882.691ms
influxd[27351]: ts=2024-02-27T03:20:08.433600Z lvl=info msg="Executing query" log_id=0n_OznJG000 service=query query="SELECT sum(value) FROM measurement WHERE <conditions> GROUP BY <column>"
influxd[27351]: ts=2024-02-27T03:20:08.438136Z lvl=info msg="Snapshot for path written" log_id=0n_OznJG000 engine=tsm1 trace_id=0nb947Sl000 op_name=tsm1_cache_snapshot path=/var/lib/influxdb/data/telegraf/measurements/44101 duration=2178.316ms
influxd[27351]: ts=2024-02-27T03:20:08.438168Z lvl=info msg="Cache snapshot (end)" log_id=0n_OznJG000 engine=tsm1 trace_id=0nb947Sl000 op_name=tsm1_cache_snapshot op_event=end op_elapsed=2178.346ms
influxd[27351]: unexpected fault address 0x7f60b5dab63d
influxd[27351]: fatal error: fault
influxd[27351]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x7f60b5dab63d pc=0x9d3f7e]
influxd[27351]: goroutine 5078661668 [running]:
influxd[27351]: runtime.throw(0x151e28d, 0x5)
influxd[27351]: #011/usr/local/go/src/runtime/panic.go:617 +0x72 fp=0xc029af9a70 sp=0xc029af9a40 pc=0x42f482
influxd[27351]: runtime.sigpanic()
influxd[27351]: #011/usr/local/go/src/runtime/signal_unix.go:397 +0x401 fp=0xc029af9aa0 sp=0xc029af9a70 pc=0x444731
influxd[27351]: github.com/influxdata/influxdb/vendor/github.com/influxdata/roaring.(*shortIterator).next(0xc99bca3720, 0x56)
influxd[27351]: #011/go/src/github.com/influxdata/influxdb/vendor/github.com/influxdata/roaring/shortiterator.go:18 +0x1e fp=0xc029af9ab0 sp=0xc029af9aa0 pc=0x9d3f7e
influxd[27351]: github.com/influxdata/influxdb/vendor/github.com/influxdata/roaring.(*intIterator).Next(0xc9eb58ac60, 0x414a01)
influxd[27351]: #011/go/src/github.com/influxdata/influxdb/vendor/github.com/influxdata/roaring/roaring.go:239 +0x34 fp=0xc029af9ad8 sp=0xc029af9ab0 pc=0x9b0ac4
influxd[27351]: github.com/influxdata/influxdb/tsdb.(*seriesIDSetIterator).Next(0xc359829da0, 0xc99bca3701, 0x2c000000000000ff, 0x0, 0x2c1af81b11cf2772, 0xc289867bc0)

Anmol-Porwal18 commented 6 months ago

@hiltontj @bnpfeife do you have any suggestions for fixing the above error?

hiltontj commented 6 months ago

Hello @Anmol-Porwal18 - would it be possible for you to upgrade to the latest version, i.e., v1.8.10, and see if that resolves the issue?

Otherwise, we may need more information to help troubleshoot the issue.

Anmol-Porwal18 commented 6 months ago

@hiltontj Upgrading to v1.8.10 would be a huge task for us. Could the crashes be caused by the large amount of data? We see the error more frequently in a region that holds more data and less frequently in a region that holds less.

What details would you require from my side for troubleshooting the issue?

davidby-influx commented 6 months ago

This is probably caused by file corruption, but this version is very old. If it is a bug, it may already have been fixed in a later version; if it has not been fixed yet, any fix would only land in a later version, not in this one. We are not releasing updates to 1.7.X.

So, let us hope that it is environmental, if you do not wish to upgrade.

The first thing I would do is shut down the database, delete all the series files, and restart the database, which will rebuild the series files. Take Step 1, but do not proceed to the later steps.

A full stack trace of the goroutine that crashed might also help to debug this. The log above is truncated.
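
The "delete all the series files" step above can be sketched as a small script. This is only an illustration under stated assumptions: it assumes the 1.x layout in which each database keeps its series file in a `_series` directory directly under the data directory (for example `/var/lib/influxdb/data/telegraf/_series`), and the function names are hypothetical, not part of any InfluxDB tooling. It does not stop or start `influxd`; that has to be done separately, before and after.

```python
import os
import shutil


def find_series_dirs(data_root):
    """Return the per-database `_series` directories found under data_root.

    Assumes the layout <data_root>/<database>/_series (an assumption about
    the 1.x on-disk layout, not something this script verifies).
    """
    found = []
    for db_name in sorted(os.listdir(data_root)):
        candidate = os.path.join(data_root, db_name, "_series")
        if os.path.isdir(candidate):
            found.append(candidate)
    return found


def remove_series_dirs(data_root, dry_run=True):
    """Delete every `_series` directory under data_root.

    With dry_run=True (the default) nothing is deleted; the function only
    returns the directories it would remove. Run it only while influxd is
    stopped.
    """
    targets = find_series_dirs(data_root)
    for path in targets:
        if not dry_run:
            shutil.rmtree(path)
    return targets
```

Calling `remove_series_dirs("/var/lib/influxdb/data")` first with the default `dry_run=True` shows what would be removed; taking a backup of the data directory before the real run is a sensible precaution.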

davidby-influx commented 6 months ago

If you have a huge amount of data, you should consider our on-premise, high-availability, clustered product, InfluxDB Enterprise. It comes with paid support, many, many improvements (it is currently at 1.11.5, with 4.5 years of bug fixes and optimizations since 1.7.9), and it can handle much larger loads with horizontal scaling. Let me know if you'd like to explore that option.

Satish2007 commented 6 months ago

I am observing an issue with my InfluxDB 1.8.10.

Issue: InfluxDB restarts automatically at random times.

Error in Logfile:

Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: unexpected fault address 0x7b35af7b5000
Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: fatal error: fault
Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x7b35af7b5000 pc=0x12a1fd0]
Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: goroutine 11953909117 [running]:
Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: runtime.throw(0x169e54a, 0x5)
Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/usr/local/go/src/runtime/panic.go:774 +0x72 fp=0xc2e2d8cc18 sp=0xc2e2d8cbe8 pc=0x431272
Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: runtime.sigpanic()
… … …
Mar 28 06:52:09 influx influxd-systemd-start.sh[14374]: #011/usr/local/go/src/net/http/server.go:2928 +0x384
Mar 28 06:52:10 influx systemd[1]: influxdbsud.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Mar 28 06:52:10 influx systemd[1]: influxdbsud.service: Failed with result 'exit-code'.
Mar 28 06:52:10 influx systemd[1]: influxdbsud.service: Consumed 1month 6d 13h 29min 42.403s CPU time.
Mar 28 06:52:10 influx systemd[1]: influxdbsud.service: Scheduled restart job, restart counter is at 1.
Mar 28 06:52:10 influx systemd[1]: Stopped InfluxDB is an open-source, distributed, time series database.
Mar 28 06:52:10 influx systemd[1]: influxdbsud.service: Consumed 1month 6d 13h 29min 42.403s CPU time.
Mar 28 06:52:10 influx systemd[1]: Starting InfluxDB is an open-source, distributed, time series database...
Mar 28 06:52:10 influx influxd-systemd-start.sh[116443]:
Mar 28 06:52:10 influx influxd-systemd-start.sh[116440]: InfluxDB started
Mar 28 06:52:10 influx systemd[1]: Started InfluxDB is an open-source, distributed, time series database.
Mar 28 06:52:10 influx influxd-systemd-start.sh[116441]: ts=2024-03-28T06:52:10.842072Z lvl=info msg="InfluxDB starting" log_id=0oCy7trW000 version=1.8.10 branch=1.8 commit=688e697c51fd

System Environment:

Operating System: Ubuntu 22.04 on Azure
InfluxDB Version: 1.8.10
CPU: 32vCPU
RAM: 128GB
Database Size: 80GB
Disk Type: SSD
cache-max-memory: 32GB
Index Type: TSM

Any help/suggestions are appreciated in advance.

davidby-influx commented 6 months ago

@Satish2007 - A longer stack trace from the panic would be the only way to know what is happening. I need everything printed for the first goroutine.

Satish2007 commented 6 months ago

Thank you @davidby-influx. The stack trace is huge, and I am not sure whether a 100MB attachment is allowed on GitHub.

Otherwise, I will upload it to Google Drive and share the link.

Thank you once again.

davidby-influx commented 6 months ago

I just need the first goroutine, not the whole thing.
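
One way to cut a large crash dump down to the first goroutine is sketched below. It assumes the log has one entry per line (as `journalctl` or a syslog file normally provides), that the dump starts at a line containing one of the markers listed in `CRASH_MARKERS`, and that every goroutine starts with a header of the form `goroutine N [state]:`. The function name and the marker list are illustrative choices, not part of Go or InfluxDB.

```python
import re

# Header printed at the start of each goroutine in a Go crash dump,
# e.g. "goroutine 11953909117 [running]:".
GOROUTINE_HEADER = re.compile(r"goroutine \d+ \[[^\]]*\]:")

# Lines that typically open a Go crash dump (assumed list, extend as needed).
CRASH_MARKERS = ("unexpected fault address", "fatal error:", "panic:")


def first_goroutine(log_text):
    """Return the log lines from the crash message through the first goroutine.

    Collection starts at the first line containing a crash marker and stops
    just before the second goroutine header. Any syslog prefix on each line
    is kept as-is.
    """
    kept = []
    collecting = False
    headers_seen = 0
    for line in log_text.splitlines():
        if not collecting and any(marker in line for marker in CRASH_MARKERS):
            collecting = True
        if not collecting:
            continue
        if GOROUTINE_HEADER.search(line):
            headers_seen += 1
            if headers_seen == 2:
                break
        kept.append(line)
    return kept
```

Reading the saved log with `open(path).read()` and joining the returned lines with `"\n"` gives a snippet small enough to paste into a comment.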

Satish2007 commented 6 months ago

Hi @davidby-influx , thanks.

Please find the requested log trace. Let me know if you need any other details:

Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: ts=2024-03-28T06:52:02.133322Z lvl=info msg="Compacting file" log_id=0o1ULPl0000 engine=tsm1 tsm1_level=1 tsm1_strategy=level trace_id=0oCy7MqG000 op_name=tsm1_compact_group db_shard_id=43986 tsm1_index=7 tsm1_file=/data/influxdb-data-sud/data/_internal/monitor/43986/000001688-000000001.tsm
Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: [httpd] 10.4.17.112 - root [28/Mar/2024:06:52:02 +0000] "POST /query?db=Datastore2329&epoch=n&p=%5BREDACTED%5D&u=root HTTP/1.1 {'q': ' select "30102100",quality from "30100000" where time >= '2024-01-20T19:00:01.000Z' and time <= '2024-01-22T19:00:01.000Z' ;'}" 200 11513 "-" "-" ae46c6cc-eccf-11ee-88d6-0022485c7852 1178
Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: unexpected fault address 0x7b35af7b5000
Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: fatal error: fault
Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x7b35af7b5000 pc=0x12a1fd0]
Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: goroutine 11953909117 [running]:
Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: runtime.throw(0x169e54a, 0x5)
Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/usr/local/go/src/runtime/panic.go:774 +0x72 fp=0xc2e2d8cc18 sp=0xc2e2d8cbe8 pc=0x431272
Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: runtime.sigpanic()
Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/usr/local/go/src/runtime/signal_unix.go:401 +0x3de fp=0xc2e2d8cc48 sp=0xc2e2d8cc18 pc=0x4468ae
Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: encoding/binary.bigEndian.Uint32(...)
Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/usr/local/go/src/encoding/binary/binary.go:112
Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: github.com/influxdata/influxdb/tsdb/engine/tsm1.(indirectIndex).Key(0xc26731d5f0, 0x0, 0xc38d465ab0, 0x0, 0x0, 0x0, 0x12c1e00, 0x0, 0x0, 0x0)
Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/reader.go:913 +0x150 fp=0xc2e2d8cd08 sp=0xc2e2d8cc48 pc=0x12a1fd0
Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: github.com/influxdata/influxdb/tsdb/engine/tsm1.(TSMReader).Key(...)
Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/reader.go:315
Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: github.com/influxdata/influxdb/tsdb/engine/tsm1.(BlockIterator).Next(0xc38d465a80, 0x0)
Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/reader.go:176 +0xd9 fp=0xc2e2d8cd70 sp=0xc2e2d8cd08 pc=0x129e249
Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: github.com/influxdata/influxdb/tsdb/engine/tsm1.(tsmBatchKeyIterator).Next(0xc1e1b0a3c0, 0xc3f30e5140)
Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/compact.go:1730 +0x616 fp=0xc2e2d8cfa8 sp=0xc2e2d8cd70 pc=0x124e646
Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: github.com/influxdata/influxdb/tsdb/engine/tsm1.(Compactor).write(0xc344552fc0, 0xc54b418190, 0x50, 0x25aa640, 0xc1e1b0a3c0, 0xc485a61201, 0x0, 0x0)
Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/compact.go:1141 +0x1dd fp=0xc2e2d8d100 sp=0xc2e2d8cfa8 pc=0x124cbdd
Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: github.com/influxdata/influxdb/tsdb/engine/tsm1.(Compactor).writeNewFiles(0xc344552fc0, 0x698, 0x2, 0xc38d465280, 0x8, 0x8, 0x25aa640, 0xc1e1b0a3c0, 0x1, 0x25aa640, ...)
Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/compact.go:1045 +0x18f fp=0xc2e2d8d1b8 sp=0xc2e2d8d100 pc=0x124c59f
Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: github.com/influxdata/influxdb/tsdb/engine/tsm1.(Compactor).compact(0xc344552fc0, 0xc38d465200, 0xc38d465280, 0x8, 0x8, 0x0, 0x0, 0x0, 0x0, 0x0)
Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/compact.go:953 +0x40d fp=0xc2e2d8d2b8 sp=0xc2e2d8d1b8 pc=0x124b62d
Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: github.com/influxdata/influxdb/tsdb/engine/tsm1.(Compactor).CompactFull(0xc344552fc0, 0xc38d465280, 0x8, 0x8, 0x0, 0x0, 0x0, 0x0, 0x0)
Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/compact.go:971 +0x17e fp=0xc2e2d8d3b0 sp=0xc2e2d8d2b8 pc=0x124baee
Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: github.com/influxdata/influxdb/tsdb/engine/tsm1.(compactionStrategy).compactGroup(0xc418a50460)
Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/engine.go:2289 +0x1c72 fp=0xc2e2d8de90 sp=0xc2e2d8d3b0 pc=0x1275052
Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: github.com/influxdata/influxdb/tsdb/engine/tsm1.(compactionStrategy).Apply(0xc418a50460)
Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/engine.go:2266 +0x4d fp=0xc2e2d8ded8 sp=0xc2e2d8de90 pc=0x127338d
Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: github.com/influxdata/influxdb/tsdb/engine/tsm1.(Engine).compactHiPriorityLevel.func1(0xc2b08aa310, 0xc32e3d2b40, 0x1, 0xc418a50460)
Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/engine.go:2183 +0x12e fp=0xc2e2d8dfc0 sp=0xc2e2d8ded8 pc=0x12c1e3e
Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: runtime.goexit()
Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/usr/local/go/src/runtime/asm_amd64.s:1357 +0x1 fp=0xc2e2d8dfc8 sp=0xc2e2d8dfc0 pc=0x460f61
Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: created by github.com/influxdata/influxdb/tsdb/engine/tsm1.(Engine).compactHiPriorityLevel
Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/engine.go:2178 +0x123
Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: goroutine 1 [chan receive, 12840 minutes]:
Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: main.(Main).Run(0xc000589f20, 0xc00003c190, 0x2, 0x2, 0xc000589f20, 0x43f0ca)
Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/go/src/github.com/influxdata/influxdb/cmd/influxd/main.go:90 +0x2c9
Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: main.main()
Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/go/src/github.com/influxdata/influxdb/cmd/influxd/main.go:45 +0x13d
Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: goroutine 6 [syscall, 12841 minutes]:
Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: os/signal.signal_recv(0x0)
Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/usr/local/go/src/runtime/sigqueue.go:147 +0x9c
Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: os/signal.loop()
Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/usr/local/go/src/os/signal/signal_unix.go:23 +0x22
Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: created by os/signal.init.0
Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/usr/local/go/src/os/signal/signal_unix.go:29 +0x41
Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: goroutine 36 [select]:
Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: github.com/influxdata/influxdb/vendor/go.opencensus.io/stats/view.(worker).start(0xc00022caf0)
Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/go/src/github.com/influxdata/influxdb/vendor/go.opencensus.io/stats/view/worker.go:154 +0x100
Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: created by github.com/influxdata/influxdb/vendor/go.opencensus.io/stats/view.init.0
Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/go/src/github.com/influxdata/influxdb/vendor/go.opencensus.io/stats/view/worker.go:32 +0x57
Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: goroutine 69 [IO wait, 12841 minutes]:
Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: internal/poll.runtime_pollWait(0x7b42e4741f90, 0x72, 0x0)
Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/usr/local/go/src/runtime/netpoll.go:184 +0x55
Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: internal/poll.(pollDesc).wait(0xc0015db798, 0x72, 0x0, 0x0, 0x16a0d8c)
Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/usr/local/go/src/internal/poll/fd_poll_runtime.go:87 +0x45
Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: internal/poll.(pollDesc).waitRead(...)
Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/usr/local/go/src/internal/poll/fd_poll_runtime.go:92 Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: internal/poll.(FD).Accept(0xc0015db780, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0) Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/usr/local/go/src/internal/poll/fd_unix.go:384 +0x1f8 Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: net.(netFD).accept(0xc0015db780, 0x0, 0x0, 0x0) Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/usr/local/go/src/net/fd_unix.go:238 +0x42 Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: net.(TCPListener).accept(0xc001752de0, 0x0, 0x0, 0x0) Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/usr/local/go/src/net/tcpsock_posix.go:139 +0x32 Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: net.(TCPListener).Accept(0xc001752de0, 0x0, 0x0, 0x0, 0x0) Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/usr/local/go/src/net/tcpsock.go:261 +0x47 Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: github.com/influxdata/influxdb/tcp.(Mux).Serve(0xc0018ad1a0, 0x259d300, 0xc001752de0, 0xc001752de0, 0x0) Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/go/src/github.com/influxdata/influxdb/tcp/mux.go:75 +0x92 Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: created by github.com/influxdata/influxdb/cmd/influxd/run.(Server).Open Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/go/src/github.com/influxdata/influxdb/cmd/influxd/run/server.go:395 +0x280 Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: goroutine 11942012981 [IO wait]: Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: internal/poll.runtime_pollWait(0x7b3c69eb2908, 0x72, 0xffffffffffffffff) Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/usr/local/go/src/runtime/netpoll.go:184 +0x55 Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: internal/poll.(pollDesc).wait(0xc43af7d498, 0x72, 0x1000, 0x1000, 
0xffffffffffffffff) Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/usr/local/go/src/internal/poll/fd_poll_runtime.go:87 +0x45 Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: internal/poll.(pollDesc).waitRead(...) Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/usr/local/go/src/internal/poll/fd_poll_runtime.go:92 Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: internal/poll.(FD).Read(0xc43af7d480, 0xc45d1bd000, 0x1000, 0x1000, 0x0, 0x0, 0x0) Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/usr/local/go/src/internal/poll/fd_unix.go:169 +0x1cf Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: net.(netFD).Read(0xc43af7d480, 0xc45d1bd000, 0x1000, 0x1000, 0xc4b6ba19e8, 0x4d3c3d, 0xc43af7d480) Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/usr/local/go/src/net/fd_unix.go:202 +0x4f Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: net.(conn).Read(0xc174374170, 0xc45d1bd000, 0x1000, 0x1000, 0x0, 0x0, 0x0) Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/usr/local/go/src/net/net.go:184 +0x68 Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: net/http.(connReader).Read(0xc3d5e41680, 0xc45d1bd000, 0x1000, 0x1000, 0x0, 0x0, 0x0) Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/usr/local/go/src/net/http/server.go:785 +0xf4 Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: bufio.(Reader).fill(0xc45a7a2780) Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/usr/local/go/src/bufio/bufio.go:100 +0x103 Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: bufio.(Reader).Peek(0xc45a7a2780, 0x4, 0x0, 0x0, 0x0, 0x0, 0xc4b6ba1ad0) Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/usr/local/go/src/bufio/bufio.go:138 +0x4f Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: net/http.(conn).readRequest(0xc3b3fe2640, 0x25a5380, 0xc3d5edcac0, 0x0, 0x0, 0x0) Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: 
#011/usr/local/go/src/net/http/server.go:962 +0xb3b Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: net/http.(conn).serve(0xc3b3fe2640, 0x25a5380, 0xc3d5edcac0) Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/usr/local/go/src/net/http/server.go:1817 +0x6d4 Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: created by net/http.(Server).Serve Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/usr/local/go/src/net/http/server.go:2928 +0x384 Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: goroutine 11953311239 [IO wait]: Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: internal/poll.runtime_pollWait(0x7b3c5df65c10, 0x72, 0xffffffffffffffff) Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/usr/local/go/src/runtime/netpoll.go:184 +0x55 Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: internal/poll.(pollDesc).wait(0xc38d1cfd98, 0x72, 0x1000, 0x1000, 0xffffffffffffffff) Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/usr/local/go/src/internal/poll/fd_poll_runtime.go:87 +0x45 Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: internal/poll.(pollDesc).waitRead(...) 
Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/usr/local/go/src/internal/poll/fd_poll_runtime.go:92 Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: internal/poll.(FD).Read(0xc38d1cfd80, 0xc537e58000, 0x1000, 0x1000, 0x0, 0x0, 0x0) Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/usr/local/go/src/internal/poll/fd_unix.go:169 +0x1cf Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: net.(netFD).Read(0xc38d1cfd80, 0xc537e58000, 0x1000, 0x1000, 0xc589e4d9e8, 0x4d3c3d, 0xc38d1cfd80) Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/usr/local/go/src/net/fd_unix.go:202 +0x4f Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: net.(conn).Read(0xc307bd9928, 0xc537e58000, 0x1000, 0x1000, 0x0, 0x0, 0x0) Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/usr/local/go/src/net/net.go:184 +0x68 Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: net/http.(connReader).Read(0xc2b9a717a0, 0xc537e58000, 0x1000, 0x1000, 0x0, 0x0, 0x0) Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/usr/local/go/src/net/http/server.go:785 +0xf4 Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: bufio.(Reader).fill(0xc0f5bb6000) Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/usr/local/go/src/bufio/bufio.go:100 +0x103 Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: bufio.(Reader).Peek(0xc0f5bb6000, 0x4, 0x0, 0x0, 0x0, 0x0, 0xc589e4dad0) Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/usr/local/go/src/bufio/bufio.go:138 +0x4f Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: net/http.(conn).readRequest(0xc538ecb7c0, 0x25a5380, 0xc2305be3c0, 0x0, 0x0, 0x0) Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/usr/local/go/src/net/http/server.go:962 +0xb3b Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: net/http.(conn).serve(0xc538ecb7c0, 0x25a5380, 0xc2305be3c0) Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/usr/local/go/src/net/http/server.go:1817 
+0x6d4 Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: created by net/http.(Server).Serve Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/usr/local/go/src/net/http/server.go:2928 +0x384 Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: goroutine 11950825298 [IO wait]: Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: internal/poll.runtime_pollWait(0x7b3ca959fb78, 0x72, 0xffffffffffffffff) Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/usr/local/go/src/runtime/netpoll.go:184 +0x55 Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: internal/poll.(pollDesc).wait(0xc4a1b8ca98, 0x72, 0x1000, 0x1000, 0xffffffffffffffff) Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/usr/local/go/src/internal/poll/fd_poll_runtime.go:87 +0x45 Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: internal/poll.(pollDesc).waitRead(...) Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/usr/local/go/src/internal/poll/fd_poll_runtime.go:92 Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: internal/poll.(FD).Read(0xc4a1b8ca80, 0xc50682d000, 0x1000, 0x1000, 0x0, 0x0, 0x0) Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/usr/local/go/src/internal/poll/fd_unix.go:169 +0x1cf Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: net.(netFD).Read(0xc4a1b8ca80, 0xc50682d000, 0x1000, 0x1000, 0xc05959f9e8, 0x4d3c3d, 0xc4a1b8ca80) Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/usr/local/go/src/net/fd_unix.go:202 +0x4f Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: net.(conn).Read(0xc169af4f98, 0xc50682d000, 0x1000, 0x1000, 0x0, 0x0, 0x0) Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/usr/local/go/src/net/net.go:184 +0x68 Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: net/http.(connReader).Read(0xc277ae5620, 0xc50682d000, 0x1000, 0x1000, 0x0, 0x0, 0x0) Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/usr/local/go/src/net/http/server.go:785 +0xf4 Mar 28 
06:52:02 influx influxd-systemd-start.sh[14374]: bufio.(Reader).fill(0xc4fe4f7e60) Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/usr/local/go/src/bufio/bufio.go:100 +0x103 Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: bufio.(Reader).Peek(0xc4fe4f7e60, 0x4, 0x0, 0x0, 0x0, 0x0, 0xc05959fad0) Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/usr/local/go/src/bufio/bufio.go:138 +0x4f Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: net/http.(conn).readRequest(0xc3971ab2c0, 0x25a5380, 0xc3791dbec0, 0x0, 0x0, 0x0) Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/usr/local/go/src/net/http/server.go:962 +0xb3b Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: net/http.(conn).serve(0xc3971ab2c0, 0x25a5380, 0xc3791dbec0) Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/usr/local/go/src/net/http/server.go:1817 +0x6d4 Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: created by net/http.(Server).Serve Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/usr/local/go/src/net/http/server.go:2928 +0x384 Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: goroutine 11951134932 [IO wait]: Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: internal/poll.runtime_pollWait(0x7b3c54e40c18, 0x72, 0xffffffffffffffff) Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/usr/local/go/src/runtime/netpoll.go:184 +0x55 Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: internal/poll.(pollDesc).wait(0xc1b5366c98, 0x72, 0x1000, 0x1000, 0xffffffffffffffff) Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/usr/local/go/src/internal/poll/fd_poll_runtime.go:87 +0x45 Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: internal/poll.(pollDesc).waitRead(...) 
Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/usr/local/go/src/internal/poll/fd_poll_runtime.go:92 Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: internal/poll.(FD).Read(0xc1b5366c80, 0xc437551000, 0x1000, 0x1000, 0x0, 0x0, 0x0) Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/usr/local/go/src/internal/poll/fd_unix.go:169 +0x1cf Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: net.(netFD).Read(0xc1b5366c80, 0xc437551000, 0x1000, 0x1000, 0xc4a97679e8, 0x4d3c3d, 0xc1b5366c80) Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/usr/local/go/src/net/fd_unix.go:202 +0x4f Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: net.(conn).Read(0xc473209700, 0xc437551000, 0x1000, 0x1000, 0x0, 0x0, 0x0) Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/usr/local/go/src/net/net.go:184 +0x68 Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: net/http.(connReader).Read(0xc285be8090, 0xc437551000, 0x1000, 0x1000, 0x0, 0x0, 0x0) Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/usr/local/go/src/net/http/server.go:785 +0xf4 Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: bufio.(Reader).fill(0xc434a77320) Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: #011/usr/local/go/src/bufio/bufio.go:100 +0x103 Mar 28 06:52:02 influx influxd-systemd-start.sh[14374]: bufio.(*Reader).Peek(0xc434a77320, 0x4, 0x0, 0x0, 0x0, 0x0, 0xc4a9767ad0)

davidby-influx commented 6 months ago

@Satish2007 - This is fixed in 2.7.5, by one of these PRs. https://github.com/influxdata/influxdb/pull/24521 https://github.com/influxdata/influxdb/pull/24599

Satish2007 commented 6 months ago

Thank you very much @davidby-influx for the update. Upgrading to 2.x would require application changes, and we plan to upgrade to v3 when it becomes available.

Is there a temporary workaround to avoid the InfluxDB crash? The upgrade will take some time, so for now I just want to apply a temporary workaround.

Regards Satish

davidby-influx commented 6 months ago

@satish2007 - Sometimes deleting the _series directories and restarting InfluxDB can help; they are automatically regenerated if missing on startup.

The fix is also present in the 1.11.5 tag, but you will have to build that from source yourself; we do not have binaries available for it.

Satish2007 commented 6 months ago

Hi @davidby-influx , thank you for the update.

Since my index type is TSM, I assume the _series data is regenerated automatically on every restart. Does deleting _series also apply to TSM, and not just to TSI? My understanding may be wrong, though.

If yes, I would like to try the first option first, then the second one.

Regards Satish

Satish2007 commented 5 months ago

Hi @davidby-influx,

Awaiting your expert advice. Could you comment on the query above?

Thank you.

davidby-influx commented 5 months ago

@Satish2007 - TSM is not an index type, it is the format of the data files.

The two index types are In-Mem and TSI.

Which index type you use is controlled by this setting. There are steps you have to take to convert an instance between indices.

Removing the _series directories will cause the files there to regenerate on restart regardless of which index type you are using.
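
To check which index type an instance is configured for, a rough sketch like the following can read it from the configuration file. The link behind "this setting" did not survive in this copy of the thread, so this assumes it refers to `index-version` in the `[data]` section of `influxdb.conf`, with values `inmem` and `tsi1` and `inmem` as the 1.x default when the key is absent or commented out. The function name is hypothetical, and the parser is deliberately minimal: it is not a full TOML parser, and it ignores any environment-variable override.

```python
def read_index_version(config_text, default="inmem"):
    """Return the index-version value from the [data] section of a config.

    Minimal line-based reader (an assumption-laden sketch, not a TOML
    parser): comments start at '#', section headers look like [name] or
    [[name]], and the default is returned when the key is not set.
    """
    section = None
    for raw_line in config_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line.strip("[]").strip()
            continue
        if section == "data" and "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            if key == "index-version":
                return value.strip('"')
    return default
```

For example, `read_index_version(open("/etc/influxdb/influxdb.conf").read())` would report the configured index type, assuming the configuration file lives at that path.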

Satish2007 commented 5 months ago

Thank you @davidby-influx. As of now I am not getting the error. I don't know how it resolved itself. I will keep you updated on this.