influxdata / influxdb

Scalable datastore for metrics, events, and real-time analytics
https://influxdata.com
Apache License 2.0

fatal error: concurrent map read and map write with opentsdb? #17308

Open Obihoernchen opened 4 years ago

Obihoernchen commented 4 years ago

Steps to reproduce: List the minimal actions needed to reproduce the behavior.

  1. I don't know how to reproduce the issue consistently. It appears to be random.

Expected behavior: No crash

Actual behavior: InfluxDB crashes about once a week (but recovers automatically).

Environment info: Two InfluxDB instances run on two identical ppc64le nodes. collectd on multiple hosts sends data to both instances via the write_opentsdb plugin.

python build.py --package --release --clean --update

__Config:__
Non-default config entries:

reporting-disabled = true

[data]
index-version = "tsi1"
query-log-enabled = false
series-id-set-cache-size = 100

[http]
auth-enabled = true
log-enabled = false

[[opentsdb]]
enabled = true
bind-address = ":4242"
database = "esmon_database"

[continuous_queries]
log-enabled = false
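
For readers unfamiliar with the write path involved: the `[[opentsdb]]` listener above receives points on port 4242. The sketch below shows what one such point might look like, assuming the data arrives in OpenTSDB's telnet-style `put` format. The metric name, tags, and the `formatPut` helper are hypothetical and are not taken from collectd or InfluxDB.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// formatPut renders one data point as an OpenTSDB telnet-style line:
//
//	put <metric> <unix-timestamp> <value> <tagk=tagv> ...
//
// Tags are emitted in sorted key order so the output is deterministic.
// This helper is illustrative only; it is not part of collectd or InfluxDB.
func formatPut(metric string, ts int64, value float64, tags map[string]string) string {
	keys := make([]string, 0, len(tags))
	for k := range tags {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	var b strings.Builder
	fmt.Fprintf(&b, "put %s %d %g", metric, ts, value)
	for _, k := range keys {
		fmt.Fprintf(&b, " %s=%s", k, tags[k])
	}
	return b.String()
}

func main() {
	// Hypothetical metric and tags, chosen only for illustration.
	line := formatPut("cpu.idle", 1584132596, 97.5,
		map[string]string{"host": "node1", "cpu": "0"})
	fmt.Println(line) // prints "put cpu.idle 1584132596 97.5 cpu=0 host=node1"
}
```

Writing such a line, followed by a newline, to a TCP connection on the configured `bind-address` would be one way to generate test writes while trying to reproduce the crash.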



__Logs:__
[influxdb_crash_node1.log](https://github.com/influxdata/influxdb/files/4343088/influxdb_crash_node1.log)
[influxdb_crash_node2.log](https://github.com/influxdata/influxdb/files/4343089/influxdb_crash_node2.log)

russorat commented 4 years ago

@Obihoernchen thanks for this. Pasting some of the log files here:

Mär 13 21:49:56 node1 influxd[21505]: ts=2020-03-13T20:49:56.443038Z lvl=info msg="Compacting file" log_id=0LLW8k2l000 engine=tsm1 tsm1_strategy=full tsm1_optimize=false trace_id=0LXQQ6rW000 op_name=tsm1_compact_group tsm1_index=3 tsm1_file=/var/lib/influxdb/data/esmon_database/autogen/429/000019072-000000004.tsm
Mär 13 21:50:11 node1 influxd[21505]: ts=2020-03-13T20:50:11.466226Z lvl=info msg="Cache snapshot (start)" log_id=0LLW8k2l000 engine=tsm1 trace_id=0LXQR1Ul000 op_name=tsm1_cache_snapshot op_event=start
Mär 13 21:50:13 node1 influxd[21505]: ts=2020-03-13T20:50:13.555685Z lvl=info msg="Snapshot for path written" log_id=0LLW8k2l000 engine=tsm1 trace_id=0LXQR1Ul000 op_name=tsm1_cache_snapshot path=/var/lib/influxdb/data/esmon_database/autogen/429 duration=2103.821ms
Mär 13 21:50:13 node1 influxd[21505]: ts=2020-03-13T20:50:13.555747Z lvl=info msg="Cache snapshot (end)" log_id=0LLW8k2l000 engine=tsm1 trace_id=0LXQR1Ul000 op_name=tsm1_cache_snapshot op_event=end op_elapsed=2103.807ms
Mär 13 21:50:14 node1 influxd[21505]: fatal error: concurrent map read and map write
Mär 13 21:50:14 node1 influxd[21505]: goroutine 8000570258 [running]:
Mär 13 21:50:14 node1 influxd[21505]: runtime.throw(0x11078372, 0x21)
Mär 13 21:50:14 node1 influxd[21505]: /usr/local/go/src/runtime/panic.go:617 +0x5c fp=0xc0c111e8c8 sp=0xc0c111e888 pc=0x1003044c
Mär 13 21:50:14 node1 influxd[21505]: runtime.mapaccess1_faststr(0x10e26aa0, 0xc2f19e4300, 0xc2a3745280, 0x80, 0x1)
Mär 13 21:50:14 node1 influxd[21505]: /usr/local/go/src/runtime/map_faststr.go:21 +0x4e8 fp=0xc0c111e948 sp=0xc0c111e8c8 pc=0x10014128
Mär 13 21:50:14 node1 influxd[21505]: github.com/influxdata/influxdb/tsdb/engine/tsm1.(*partition).entry(0xc2025f7340, 0xc2a3745280, 0x80, 0x80, 0x0)
Mär 13 21:50:14 node1 influxd[21505]: /home/build/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/ring.go:229 +0x7c fp=0xc0c111e998 sp=0xc0c111e948 pc=0x10cbbf2c
Mär 13 21:50:14 node1 influxd[21505]: github.com/influxdata/influxdb/tsdb/engine/tsm1.(*ring).entry(0xc2025f7160, 0xc2a3745280, 0x80, 0x80, 0x0)
Mär 13 21:50:14 node1 influxd[21505]: /home/build/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/ring.go:93 +0x84 fp=0xc0c111e9e0 sp=0xc0c111e998 pc=0x10cbb3b4
Mär 13 21:50:14 node1 influxd[21505]: github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Cache).Values(0xc180513e40, 0xc2a3745280, 0x80, 0x80, 0x0, 0xc25e4dd320, 0x11ca5c00)
Mär 13 21:50:14 node1 influxd[21505]: /home/build/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/cache.go:559 +0x88 fp=0xc0c111ead0 sp=0xc0c111e9e0 pc=0x10c5b3f8
Mär 13 21:50:14 node1 influxd[21505]: github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).buildFloatCursor(0xc0649f2280, 0x11cb25c0, 0xc23785c960, 0xc3837e4300, 0x13, 0xc0c3186800, 0x77, 0xc096941325, 0x5, 0x11ca8c80, ...)
Mär 13 21:50:14 node1 influxd[21505]: /home/build/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/engine.gen.go:18 +0x164 fp=0xc0c111ed70 sp=0xc0c111ead0 pc=0x10c7c6f4
Mär 13 21:50:14 node1 influxd[21505]: github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).buildCursor(0xc0649f2280, 0x11cb25c0, 0xc23785c960, 0xc3837e4300, 0x13, 0xc0c3186800, 0x77, 0xc2516f4000, 0x6, 0x6, ...)
Mär 13 21:50:14 node1 influxd[21505]: /home/build/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/engine.go:2892 +0x968 fp=0xc0c111ef80 sp=0xc0c111ed70 pc=0x10c923e8
Mär 13 21:50:14 node1 influxd[21505]: github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).createVarRefSeriesIterator(0xc0649f2280, 0x11cb25c0, 0xc23785c960, 0xc275ba1880, 0xc3837e4300, 0x13, 0xc0c3186800, 0x77, 0xc1374e6870, 0x0, ...)
Mär 13 21:50:14 node1 influxd[21505]: /home/build/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/engine.go:2665 +0x2400 fp=0xc0c111fa38 sp=0xc0c111ef80 pc=0x10c91940
Mär 13 21:50:14 node1 influxd[21505]: github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).createTagSetGroupIterators(0xc0649f2280, 0x11cb25c0, 0xc23785c960, 0xc275ba1880, 0xc3837e4300, 0x13, 0xc0cfb66700, 0x1, 0x10, 0xc1374e6870, ...)
Mär 13 21:50:14 node1 influxd[21505]: /home/build/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/engine.go:2620 +0x17c fp=0xc0c111fc80 sp=0xc0c111fa38 pc=0x10c8f1fc
Mär 13 21:50:14 node1 influxd[21505]: github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).createTagSetIterators.func1(0xc062e98a40, 0xc0649f2280, 0x11cb25c0, 0xc23785c960, 0xc275ba1880, 0xc3837e4300, 0x13, 0xc10949c000, 0x3c, 0x3c, ...)
Mär 13 21:50:14 node1 influxd[21505]: /home/build/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/engine.go:2580 +0x174 fp=0xc0c111ff58 sp=0xc0c111fc80 pc=0x10cd5b84
Mär 13 21:50:14 node1 influxd[21505]: runtime.goexit()
Mär 13 21:50:14 node1 influxd[21505]: /usr/local/go/src/runtime/asm_ppc64x.s:857 +0x4 fp=0xc0c111ff58 sp=0xc0c111ff58 pc=0x10061384
Mär 13 21:50:14 node1 influxd[21505]: created by github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).createTagSetIterators
Mär 13 21:50:14 node1 influxd[21505]: /home/build/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/engine.go:2578 +0x34c
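
The trace above ends in `runtime.mapaccess1_faststr`, reached from `(*partition).entry` on a query goroutine. The Go runtime throws this fatal error when it detects a map being read while another goroutine is writing to it, and the process cannot recover from it with `recover`. The following is a minimal sketch of the general pattern that avoids this class of error: every map access goes through a `sync.RWMutex`. The `partition` type here is a hypothetical stand-in and does not reproduce InfluxDB's `ring.go`; where the real unsynchronized access happens is not established by this log.

```go
package main

import (
	"fmt"
	"sync"
)

// partition is a hypothetical cache shard: a map plus the RWMutex that
// must be held for every access. It is NOT InfluxDB's implementation.
type partition struct {
	mu    sync.RWMutex
	store map[string]int
}

func newPartition() *partition {
	return &partition{store: make(map[string]int)}
}

// entry takes the read lock, so it is safe alongside concurrent writers.
func (p *partition) entry(key string) (int, bool) {
	p.mu.RLock()
	defer p.mu.RUnlock()
	v, ok := p.store[key]
	return v, ok
}

// write takes the exclusive lock before mutating the map.
func (p *partition) write(key string, v int) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.store[key] = v
}

// hammer runs one writer and one reader goroutine per worker and
// returns the number of keys stored once all of them have finished.
func hammer(p *partition, workers, perWorker int) int {
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(2)
		go func(w int) { // writer: inserts perWorker distinct keys
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				p.write(fmt.Sprintf("series-%d-%d", w, i), i)
			}
		}(w)
		go func(w int) { // reader: looks up the same keys concurrently
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				p.entry(fmt.Sprintf("series-%d-%d", w, i))
			}
		}(w)
	}
	wg.Wait()

	p.mu.RLock()
	defer p.mu.RUnlock()
	return len(p.store)
}

func main() {
	p := newPartition()
	fmt.Println(hammer(p, 8, 1000)) // prints 8000 (8 writers x 1000 distinct keys)
}
```

If the locks in `entry` or `write` were removed, a run like this could abort with the same `concurrent map read and map write` message. Building or testing with Go's `-race` flag is a common way to locate such accesses before they crash in production.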