Looks like this might still exist in 1.2. The equivalent line from 1.1 is https://github.com/influxdata/influxdb/blob/master/tsdb/store.go#L740: the m.seriesByID map is being read outside of the lock.
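In miniature, the problematic pattern looks like the sketch below, assuming a store-like struct whose series map must only be touched under its lock; the type and field names are illustrative, not the actual tsdb code:

```go
package main

import "sync"

type series struct{ key string }

type measurement struct {
	mu         sync.RWMutex
	seriesByID map[uint64]*series
}

// buggyLookup reads the map without holding mu, so it races with any
// concurrent writer and can trigger "concurrent map read and map write".
func (m *measurement) buggyLookup(id uint64) *series {
	return m.seriesByID[id]
}

// safeLookup holds the read lock for the duration of the map access.
func (m *measurement) safeLookup(id uint64) *series {
	m.mu.RLock()
	defer m.mu.RUnlock()
	return m.seriesByID[id]
}

func main() {
	m := &measurement{seriesByID: map[uint64]*series{1: {key: "aggregates,host=a"}}}
	_ = m.safeLookup(1)
}
```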
@mattheworiordan For your high cardinality issue, you can run influx_inspect report -detailed /data/path/to/shard
to get some cardinality estimates on the series, tags and values in that shard.
That might help you track down what series or tags are problematic.
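For example, assuming the default data directory layout of /var/lib/influxdb/data/&lt;db&gt;/&lt;rp&gt;/&lt;shard_id&gt; (database, retention policy, and shard ID here are hypothetical), the invocation would look like:

```
influx_inspect report -detailed /var/lib/influxdb/data/lz5aggregates/autogen/1
```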
I encountered this issue in Influx 1.2 as well. For what it's worth, it seems to occur only when I view metrics in Grafana while they are being written.
Compiled with Go 1.7.3. @jwilder, I have an easily testable and reproducible scenario; is there any information I can provide to help you track this down? (The caveat: it is on a private server and I cannot provide direct access, but I will gather any logs or InfluxDB info you may need.)
Edit: I have ruled out Grafana. Instead, the trigger looks to be a particular query from a piece of software we wrote using the influxdb-java library. I will investigate and try to isolate the specific query. Grafana interacts with InfluxDB without problems as long as our code is not running at the same time.
Hi, we are seeing this exact issue as well.
```
[httpd] 127.0.0.1 - lz5 [10/Jul/2017:19:19:03 +0000] "GET /query?db=lz5aggregates&q=DELETE+FROM+%22aggregates%22+WHERE+time+%3C+1498996800000000000 HTTP/1.1" 200 62 "-" "python-requests/2.12.1" a2d7be3f-65a4-11e7-8477-000000000000 8463166
[I] 2017-07-10T19:19:12Z DELETE FROM aggregates WHERE time < 1499000400000000000 service=query
fatal error: concurrent map read and map write

goroutine 3052047 [running]:
runtime.throw(0xace810, 0x21)
	/usr/local/go/src/runtime/panic.go:566 +0x95 fp=0xc42004b970 sp=0xc42004b950
runtime.mapaccess1_fast64(0xa06a20, 0xc4211081e0, 0x67b4, 0xc435fd8b18)
	/usr/local/go/src/runtime/hashmap_fast.go:111 +0x1a7 fp=0xc42004b998 sp=0xc42004b970
github.com/influxdata/influxdb/tsdb.(*Store).DeleteSeries(0xc42000bc00, 0xc434b05010, 0xd, 0xc42936a860, 0x1, 0x1, 0xe37ba0, 0xc434c78c60, 0x0, 0x0)
	/root/go/src/github.com/influxdata/influxdb/tsdb/store.go:740 +0x297 fp=0xc42004baa0 sp=0xc42004b998
github.com/influxdata/influxdb/coordinator.LocalTSDBStore.DeleteSeries(0xc42000bc00, 0xc434b05010, 0xd, 0xc42936a860, 0x1, 0x1, 0xe37ba0, 0xc434c78c60, 0x0, 0x0)
	<autogenerated>:107 +0x97 fp=0xc42004bb00 sp=0xc42004baa0
github.com/influxdata/influxdb/coordinator.(*StatementExecutor).executeDeleteSeriesStatement(0xc4201d82a0, 0xc42a3c0bd0, 0xc434b05010, 0xd, 0xe59940, 0x0)
	/root/go/src/github.com/influxdata/influxdb/coordinator/statement_executor.go:317 +0x17b fp=0xc42004bb80 sp=0xc42004bb00
github.com/influxdata/influxdb/coordinator.(*StatementExecutor).ExecuteStatement(0xc4201d82a0, 0xe3b0a0, 0xc42a3c0bd0, 0x0, 0x2205c, 0xc425ac4cc0, 0xc436ff8ea0, 0xe3eb40, 0xc4201ec900, 0xc425ac4ba0, ...)
	/root/go/src/github.com/influxdata/influxdb/coordinator/statement_executor.go:95 +0x1960 fp=0xc42004bc70 sp=0xc42004bb80
github.com/influxdata/influxdb/influxql.(*QueryExecutor).executeQuery(0xc4201c1200, 0xc457acba80, 0xc434b05010, 0xd, 0x2710, 0x1, 0x0, 0x0, 0xc436ff8de0, 0xc436ff8d80, ...)
	/root/go/src/github.com/influxdata/influxdb/influxql/query_executor.go:312 +0x55d fp=0xc42004bf58 sp=0xc42004bc70
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:2086 +0x1 fp=0xc42004bf60 sp=0xc42004bf58
created by github.com/influxdata/influxdb/influxql.(*QueryExecutor).ExecuteQuery
	/root/go/src/github.com/influxdata/influxdb/influxql/query_executor.go:207 +0xb5

goroutine 1 [chan receive, 4158 minutes]:
main.(*Main).Run(0xc420052bc0, 0xc42000c0b0, 0x4, 0x4, 0x0, 0xed0f1fb6f)
	/root/go/src/github.com/influxdata/influxdb/cmd/influxd/main.go:98 +0x399
main.main()
	/root/go/src/github.com/influxdata/influxdb/cmd/influxd/main.go:46 +0xb2

goroutine 17 [syscall, 4158 minutes, locked to thread]:
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:2086 +0x1

goroutine 5 [syscall, 4158 minutes]:
os/signal.signal_recv(0x0)
	/usr/local/go/src/runtime/sigqueue.go:116 +0x157
os/signal.loop()
	/usr/local/go/src/os/signal/signal_unix.go:22 +0x22
created by os/signal.init.1
	/usr/local/go/src/os/signal/signal_unix.go:28 +0x41

goroutine 7 [IO wait, 4158 minutes]:
net.runtime_pollWait(0x7f08bf4209e8, 0x72, 0x0)
	/usr/local/go/src/runtime/netpoll.go:160 +0x59
net.(*pollDesc).wait(0xc4201d8370, 0x72, 0xc420024de8, 0xc420010078)
	/usr/local/go/src/net/fd_poll_runtime.go:73 +0x38
net.(*pollDesc).waitRead(0xc4201d8370, 0xe33720, 0xc420010078)
	/usr/local/go/src/net/fd_poll_runtime.go:78 +0x34
```
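For anyone unfamiliar with this failure mode: the Go runtime aborts the whole process when it detects unsynchronized map access, rather than returning an error. A tiny standalone program, unrelated to InfluxDB, that typically provokes the same fatal error:

```go
package main

// One goroutine writes to the map while main reads it with no
// synchronization; the Go runtime usually detects this within moments
// and aborts with "fatal error: concurrent map read and map write".
func main() {
	m := map[int]int{}
	go func() {
		for {
			m[0] = 1 // unsynchronized writer
		}
	}()
	for {
		_ = m[0] // unsynchronized reader
	}
}
```

Running a suspect workload under the race detector (go run -race, or a build of influxd with -race) reports this access pattern as a data race, which can help confirm a suspected race before it brings the process down.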
This should be fixed in the current nightlies/1.3 via 4aa78383d11dc327dc48f70a29a81ae352d5b546
Bug report
System info: InfluxDB 1.1.1 on Ubuntu 14.04, run in a Docker container (Docker v1.10.3)
Steps to reproduce:
Run a DELETE query on a measurement with very high cardinality (a sketch of such a request over the HTTP API follows below). As there is still very little tooling to fix high-cardinality problems, I am forced to try to delete the data that is causing the problem.
Note: this appears to be a similar issue to https://github.com/influxdata/influxdb/issues/6720; however, that issue was closed and marked as fixed.
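A minimal sketch of issuing that kind of DELETE against the 1.x HTTP API; the host, database name, and timestamp are illustrative, mirroring the httpd log earlier in this thread:

```go
package main

import (
	"fmt"
	"io/ioutil"
	"log"
	"net/http"
	"net/url"
)

func main() {
	// Mirrors the logged request: GET /query?db=...&q=DELETE+FROM+...
	q := url.Values{}
	q.Set("db", "lz5aggregates")
	q.Set("q", `DELETE FROM "aggregates" WHERE time < 1498996800000000000`)

	resp, err := http.Get("http://localhost:8086/query?" + q.Encode())
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	body, _ := ioutil.ReadAll(resp.Body)
	fmt.Println(resp.Status, string(body))
}
```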
Expected behavior:
No panic!
Actual behavior:
Panic.
Additional info:
More detailed log at https://gist.github.com/mattheworiordan/434b7ec61264a075ff178ef50ab91d91