zehuaiWANG opened 4 years ago
@zehuaiWANG thanks for the issue. Could you upgrade to the latest 1.7.10 and see if the issue is still there?
@russorat Thank you for helping me. On the Prometheus side I see warnings like:

```
msg="Error sending samples to remote storage" count=100 err="server returned HTTP status 500 Internal Server Error: {\"error\":\"engine: error syncing wal\"}"
```

and when I look at InfluxDB, it also has some warnings:
I am using an SSD and it doesn't seem to have high I/O usage:

```
$ iostat -xzt 1
03/03/20 15:24:19
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          22.83    0.00    4.46    6.31    0.00   66.40

Device:  rrqm/s  wrqm/s      r/s     w/s     rsec/s     wsec/s avgrq-sz avgqu-sz  await r_await w_await  svctm  %util
nvme0n1    0.00    0.00 12939.36 4435.38  169009.21  129051.84    17.15     3.41   0.19    0.17    0.26   0.02  26.49
sda        0.25   10.65    19.66    3.87     560.52     115.64    28.74     0.00   0.13    0.11    0.19   0.07   0.16

03/03/20 15:24:20
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          10.25    0.00    1.45    0.12    0.00   88.17

Device:  rrqm/s  wrqm/s      r/s     w/s     rsec/s     wsec/s avgrq-sz avgqu-sz  await r_await w_await  svctm  %util
nvme0n1    0.00    0.00     0.00 2638.00       0.00  108352.00    41.07     1.78   0.68    0.00    0.68   0.02   5.60
sda        0.00    6.00     0.00    4.00       0.00      80.00    20.00     0.00   0.00    0.00    0.00   0.00   0.00

03/03/20 15:24:21
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          12.42    0.00    0.79    0.38    0.00   86.41

Device:  rrqm/s  wrqm/s      r/s     w/s     rsec/s     wsec/s avgrq-sz avgqu-sz  await r_await w_await  svctm  %util
nvme0n1    0.00    0.00     0.00 3005.00       0.00  192544.00    64.07     4.77   1.59    0.00    1.59   0.03   8.00
sda        0.00    4.00     0.00    2.00       0.00      48.00    24.00     0.00   0.00    0.00    0.00   0.00   0.00
```
I don't know why this happens. Could you help me? Thanks a lot.
@zehuaiWANG thanks for the info. Digging through other issues that mention `error syncing wal`, it sounds like the first place to check is whether your disks are saturated around the same time you are getting this error: https://github.com/influxdata/influxdb/issues/9544. From the info above, the timestamps don't line up, so that might be something to check.
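One quick way to line those up is to pull the timestamped `%util` column for the data disk out of an `iostat -xzt` capture and compare it against the times of the `error syncing wal` log entries. A minimal sketch (the device name `nvme0n1` is taken from the output above; `sample.txt` stands in for a real capture):

```shell
# Build a small stand-in capture; in practice, redirect live output:
#   iostat -xzt 1 > sample.txt
cat > sample.txt <<'EOF'
03/03/20 15:24:19
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await r_await w_await svctm %util
nvme0n1 0.00 0.00 12939.36 4435.38 169009.21 129051.84 17.15 3.41 0.19 0.17 0.26 0.02 26.49
EOF

# Print "timestamp %util" per sample for the data disk; spikes here that
# coincide with "error syncing wal" entries point at disk saturation.
awk '
  /^[0-9][0-9]\/[0-9][0-9]\/[0-9][0-9]/ { ts = $0 }       # remember the timestamp line
  $1 == "nvme0n1"                       { print ts, $NF } # %util is the last field
' sample.txt
```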
The other thing to check is the `wal-fsync-delay` value in your influx config: https://github.com/influxdata/influxdb/issues/8758
hi~ @russorat, thank you for your reply. I have read the two issues above. I checked the I/O situation and utilization was only about 15%, and I am using an SSD. I read the description of the `wal-fsync-delay` value, but it only explains what to set for non-SSD disks. What would be an appropriate value if I use an SSD?
```
# The amount of time that a write will wait before fsyncing. A duration
# greater than 0 can be used to batch up multiple fsync calls. This is useful for slower
# disks or when WAL write contention is seen. A value of 0s fsyncs every write to the WAL.
# Values in the range of 0-100ms are recommended for non-SSD disks.
# wal-fsync-delay = "0s"
```
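Going by that comment, a non-zero delay is only recommended for non-SSD disks, so on an SSD the default of `"0s"` is usually left in place. The setting lives in the `[data]` section of the config file; a minimal sketch (the values are illustrative, not a tuned recommendation):

```toml
[data]
  # SSD: fsync every WAL write (the default).
  wal-fsync-delay = "0s"

  # Spinning disks or heavy WAL write contention: batch fsyncs instead, e.g.
  # wal-fsync-delay = "100ms"
```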
I have a similar situation with the 1.7.10-alpine Docker image.
Same here with 1.7.10 docker.
Hi~ I have a problem and wonder if anyone here could help me. I am using InfluxDB 1.7.4.
I am using the tsi1 index:

```
$ cat influxd.log | grep -oP 'index_version=(inmem|tsi1)' | sort | uniq -c
  20208 index_version=tsi1
```
It doesn't seem to have high memory usage:

```
top - 20:25:14 up 9 days, 12:10,  2 users,  load average: 3.95, 3.87, 4.12
Tasks: 320 total,   1 running, 319 sleeping,   0 stopped,   0 zombie
Cpu(s): 27.5%us,  4.1%sy,  0.0%ni, 67.8%id,  0.3%wa,  0.0%hi,  0.3%si,  0.0%st
Mem:  131387368k total, 129151756k used,  2235612k free,   244604k buffers
Swap:        0k total,        0k used,        0k free, 47405776k cached

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+   COMMAND
24719 influxdb  20   0  763g   98g  22g S 763.0 78.6  621:26.17 influxd
```
but InfluxDB OOMs again and again. I found this panic in the log:
```
fatal error: runtime: out of memory

runtime stack:
runtime.throw(0x130f93b, 0x16)
	/usr/local/go/src/runtime/panic.go:608 +0x72
runtime.sysMap(0xdd00000000, 0x8000000, 0x231ef38)
	/usr/local/go/src/runtime/mem_linux.go:156 +0xc7
runtime.(*mheap).sysAlloc(0x2305180, 0x8000000, 0x0, 0x7f9f27ffecc0)
	/usr/local/go/src/runtime/malloc.go:619 +0x1c7
runtime.(*mheap).grow(0x2305180, 0x2020, 0x0)
	/usr/local/go/src/runtime/mheap.go:920 +0x42
runtime.(*mheap).allocSpanLocked(0x2305180, 0x2020, 0x231ef48, 0x20373f00000000)
	/usr/local/go/src/runtime/mheap.go:848 +0x337
runtime.(*mheap).alloc_m(0x2305180, 0x2020, 0x410101, 0x7f52a483ef00)
	/usr/local/go/src/runtime/mheap.go:692 +0x119
runtime.(*mheap).alloc.func1()
	/usr/local/go/src/runtime/mheap.go:759 +0x4c
runtime.(*mheap).alloc(0x2305180, 0x2020, 0x7f6ff7010101, 0x128)
	/usr/local/go/src/runtime/mheap.go:758 +0x8a
runtime.largeAlloc(0x403f000, 0x7f9f27ff0101, 0x459c4a)
	/usr/local/go/src/runtime/malloc.go:1019 +0x97
runtime.mallocgc.func1()
	/usr/local/go/src/runtime/malloc.go:914 +0x46
runtime.systemstack(0x0)
	/usr/local/go/src/runtime/asm_amd64.s:351 +0x66
runtime.mstart()
	/usr/local/go/src/runtime/proc.go:1229

goroutine 100938261 [running]:
runtime.systemstack_switch()
	/usr/local/go/src/runtime/asm_amd64.s:311 fp=0xdc35f95138 sp=0xdc35f95130 pc=0x45b890
runtime.mallocgc(0x403f000, 0x10d6680, 0xdcfd9fc701, 0xdc35f95210)
	/usr/local/go/src/runtime/malloc.go:913 +0x896 fp=0xdc35f951d8 sp=0xdc35f95138 pc=0x40def6
runtime.makeslice(0x10d6680, 0x403f000, 0x403f000, 0xdc35f95330, 0xf769e4, 0xdccae6e6c0)
	/usr/local/go/src/runtime/slice.go:70 +0x77 fp=0xdc35f95208 sp=0xdc35f951d8 pc=0x444907
bytes.makeSlice(0x403f000, 0x0, 0x0, 0x0)
	/usr/local/go/src/bytes/buffer.go:231 +0x6d fp=0xdc35f95248 sp=0xdc35f95208 pc=0x4fa3ad
bytes.(*Buffer).grow(0xd246c29960, 0x1000, 0xdc35f95348)
	/usr/local/go/src/bytes/buffer.go:144 +0x15a fp=0xdc35f95298 sp=0xdc35f95248 pc=0x4f9d1a
bytes.(*Buffer).Write(0xd246c29960, 0xdccdf06000, 0x1000, 0x1000, 0x0, 0x5, 0xc6087d7370)
	/usr/local/go/src/bytes/buffer.go:174 +0xdc fp=0xdc35f952c8 sp=0xdc35f95298 pc=0x4f9ffc
bufio.(*Writer).Flush(0xdccb3b0e40, 0x7ed6b65f5a76, 0x43)
	/usr/local/go/src/bufio/bufio.go:575 +0x75 fp=0xdc35f95328 sp=0xdc35f952c8 pc=0x5211c5
bufio.(*Writer).Write(0xdccb3b0e40, 0x7ed6b65f5a76, 0x13a, 0x1461fe3, 0x2, 0x0, 0x0)
	/usr/local/go/src/bufio/bufio.go:611 +0xeb fp=0xdc35f95388 sp=0xdc35f95328 pc=0x52143b
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*directIndex).flush(0xdccdd64360, 0x164e840, 0xdccb3b0e40, 0x7ec05342d9a0, 0x13a, 0x12d345f)
	/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/writer.go:466 +0x1f0 fp=0xdc35f95498 sp=0xdc35f95388 pc=0xfea040
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*directIndex).Add(0xdccdd64360, 0x7ec05342d9a0, 0x13a, 0x12d345f, 0x0, 0x15f857ed9c20f2c0, 0x15f857f498449ec0, 0x3c8fe4, 0xdc00000024)
	/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/writer.go:314 +0x337 fp=0xdc35f95548 sp=0xdc35f95498 pc=0xfe9267
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*tsmWriter).WriteBlock(0xdccb3b0ec0, 0x7ec05342d9a0, 0x13a, 0x12d345f, 0x15f857ed9c20f2c0, 0x15f857f498449ec0, 0xdcfd9b5d10, 0x20, 0x29, 0x0, ...)
	/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/writer.go:686 +0x1eb fp=0xdc35f955b8 sp=0xdc35f95548 pc=0xfeb9db
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Compactor).write(0xcadeea85a0, 0xd16cb20140, 0x47, 0x166a0e0, 0xdccae6e6c0, 0xdccce18e01, 0x0, 0x0)
	/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/compact.go:1149 +0x283 fp=0xdc35f956c8 sp=0xdc35f955b8 pc=0xf83473
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Compactor).writeNewFiles(0xcadeea85a0, 0x3e1, 0x2, 0xdccab0e680, 0x8, 0x8, 0x166a0e0, 0xdccae6e6c0, 0x1, 0x0, ...)
	/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/compact.go:1032 +0x1a5 fp=0xdc35f95780 sp=0xdc35f956c8 pc=0xf82d85
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Compactor).compact(0xcadeea85a0, 0x1522300, 0xdccab0e680, 0x8, 0x8, 0x0, 0x0, 0x0, 0x0, 0x0)
	/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/compact.go:940 +0x407 fp=0xdc35f958a8 sp=0xdc35f95780 pc=0xf81fd7
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Compactor).CompactFull(0xcadeea85a0, 0xdccab0e680, 0x8, 0x8, 0x0, 0x0, 0x0, 0x0, 0x0)
	/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/compact.go:958 +0x180 fp=0xdc35f95958 sp=0xdc35f958a8 pc=0xf82460
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*compactionStrategy).compactGroup(0xd246c298f0)
	/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/engine.go:2152 +0x10ba fp=0xdc35f95f40 sp=0xdc35f95958 pc=0xfa8eca
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*compactionStrategy).Apply(0xd246c298f0)
	/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/engine.go:2129 +0x4d fp=0xdc35f95f88 sp=0xdc35f95f40 pc=0xfa7dbd
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).compactHiPriorityLevel.func1(0xceddda8490, 0xd7aaf730e0, 0x1, 0xd246c298f0)
	/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/engine.go:2046 +0xe0 fp=0xdc35f95fc0 sp=0xdc35f95f88 pc=0xff4000
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1333 +0x1 fp=0xdc35f95fc8 sp=0xdc35f95fc0 pc=0x45d971
created by github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).compactHiPriorityLevel
	/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/engine.go:2041 +0x123

goroutine 1 [chan receive, 36 minutes]:
main.(*Main).Run(0xc00039bf58, 0xc00003a060, 0x4, 0x4, 0xc00039bf68, 0x1009516)
	/go/src/github.com/influxdata/influxdb/cmd/influxd/main.go:90 +0x2d1
main.main()
	/go/src/github.com/influxdata/influxdb/cmd/influxd/main.go:45 +0x12f

goroutine 5 [syscall, 41 minutes]:
os/signal.signal_recv(0x0)
	/usr/local/go/src/runtime/sigqueue.go:139 +0x9c
os/signal.loop()
	/usr/local/go/src/os/signal/signal_unix.go:23 +0x22
created by os/signal.init.0
	/usr/local/go/src/os/signal/signal_unix.go:29 +0x41
```