slopee opened this issue 4 years ago
@slopee thanks for the issue. Could you add your docker compose file as well as your system information? Since we don't have other reports of a panic like this, it likely points to something in your environment or setup that is causing problems. That info will help us investigate.
Cleaned up the stack trace:
panic: runtime error: index out of range
goroutine 190 [running]:
runtime/debug.Stack(0xc00072f9c0, 0x1, 0x1)
/usr/local/go/src/runtime/debug/stack.go:24 +0x9d
github.com/influxdata/influxdb/query.(*Executor).recover(0xc0004b3ad0, 0xc0007c4c00, 0xc00008d4a0)
/go/src/github.com/influxdata/influxdb/query/executor.go:394 +0xc2
panic(0x1329c80, 0x2db4da0)
/usr/local/go/src/runtime/panic.go:522 +0x1b5
encoding/binary.bigEndian.Uint16(...)
/usr/local/go/src/encoding/binary/binary.go:100
github.com/influxdata/influxdb/tsdb.ReadSeriesKeyMeasurement(...)
/go/src/github.com/influxdata/influxdb/tsdb/series_file.go:344
github.com/influxdata/influxdb/tsdb.CompareSeriesKeys(0x7fd1b28650f0, 0x1, 0x3fef10, 0x7fd1b1864d77, 0x1, 0x3ff289, 0x1)
/go/src/github.com/influxdata/influxdb/tsdb/series_file.go:416 +0x820
github.com/influxdata/influxdb/tsdb.seriesKeys.Less(...)
/go/src/github.com/influxdata/influxdb/tsdb/series_file.go:493
sort.medianOfThree(0x214fe40, 0xc000e418a0, 0xfc, 0xdd, 0xbe)
/usr/local/go/src/sort/sort.go:76 +0x49
sort.doPivot(0x214fe40, 0xc000e418a0, 0x0, 0xfd, 0x314a500, 0x7fd1bbb756d0)
/usr/local/go/src/sort/sort.go:103 +0x642
sort.quickSort(0x214fe40, 0xc000e418a0, 0x0, 0xfd, 0x10)
/usr/local/go/src/sort/sort.go:190 +0x9a
sort.Sort(0x214fe40, 0xc000e418a0)
/usr/local/go/src/sort/sort.go:218 +0x79
github.com/influxdata/influxdb/tsdb.(*seriesPointIterator).readSeriesKeys(0xc00000d680, 0xc000513d00, 0x4, 0x8, 0x0, 0x0)
/go/src/github.com/influxdata/influxdb/tsdb/index.go:904 +0x397
github.com/influxdata/influxdb/tsdb.(*seriesPointIterator).Next(0xc00000d680, 0xc000048570, 0xc000048500, 0xc001165e00)
/go/src/github.com/influxdata/influxdb/tsdb/index.go:836 +0x43a
github.com/influxdata/influxdb/query.(*floatInterruptIterator).Next(0xc000e41780, 0x42eaa1, 0x1f72150, 0xc000a674d0)
/go/src/github.com/influxdata/influxdb/query/iterator.gen.go:941 +0x48
github.com/influxdata/influxdb/query.(*floatFastDedupeIterator).Next(0xc000e417a0, 0x1340060, 0x13e0a60, 0xc000388e00)
/go/src/github.com/influxdata/influxdb/query/iterator.go:1302 +0x48
github.com/influxdata/influxdb/query.(*bufFloatIterator).Next(...)
/go/src/github.com/influxdata/influxdb/query/iterator.gen.go:90
github.com/influxdata/influxdb/query.(*bufFloatIterator).peek(0xc000e417e0, 0xc000e416c0, 0x1, 0x1)
/go/src/github.com/influxdata/influxdb/query/iterator.gen.go:65 +0xb9
github.com/influxdata/influxdb/query.(*floatIteratorScanner).Peek(0xc0008a9dc0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc001165dd0)
/go/src/github.com/influxdata/influxdb/query/iterator.gen.go:516 +0x3d
github.com/influxdata/influxdb/query.(*scannerCursor).scan(0xc000302d80, 0xc00033c090, 0x203000, 0x0, 0x10101, 0x0, 0xc000187080, 0x2158700)
/go/src/github.com/influxdata/influxdb/query/cursor.go:241 +0x3a
github.com/influxdata/influxdb/query.(*scannerCursorBase).Scan(0xc000302d90, 0xc0006ff630, 0xc000048500)
/go/src/github.com/influxdata/influxdb/query/cursor.go:175 +0x48
github.com/influxdata/influxdb/query.(*Emitter).Emit(0xc000388e70, 0x1f6da78, 0xc000388e70, 0xc000388e70, 0xc0000450ac)
/go/src/github.com/influxdata/influxdb/query/emitter.go:41 +0x68
github.com/influxdata/influxdb/coordinator.(*StatementExecutor).executeSelectStatement(0xc000389570, 0xc0006b8d00, 0xc0001840c0, 0x0, 0x0)
/go/src/github.com/influxdata/influxdb/coordinator/statement_executor.go:561 +0x18b
github.com/influxdata/influxdb/coordinator.(*StatementExecutor).ExecuteStatement(0xc000389570, 0x2157c40, 0xc0006b8d00, 0xc0001840c0, 0x1, 0x1)
/go/src/github.com/influxdata/influxdb/coordinator/statement_executor.go:64 +0x38d5
github.com/influxdata/influxdb/query.(*Executor).executeQuery(0xc0004b3ad0, 0xc0007c4c00, 0xc0000450ac, 0xe, 0x0, 0x0, 0x2158700, 0x3166590, 0x2710, 0x0, ...)
/go/src/github.com/influxdata/influxdb/query/executor.go:334 +0x34e
created by github.com/influxdata/influxdb/query.(*Executor).ExecuteQuery
/go/src/github.com/influxdata/influxdb/query/executor.go:236 +0xc9
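The panic surfaces in encoding/binary's bigEndian.Uint16, called from tsdb.ReadSeriesKeyMeasurement while CompareSeriesKeys sorts series keys, so a key shorter than two bytes (for example a truncated or corrupt series-file entry) would trip the slice bounds check. A minimal sketch of that mechanism, not InfluxDB's actual code:

package main

import (
	"encoding/binary"
	"fmt"
)

func main() {
	// A series key cut short -- Uint16 needs at least two bytes to read
	// a big-endian length prefix, so this reproduces the same class of panic.
	key := []byte{0x01}
	defer func() {
		fmt.Println("recovered:", recover()) // runtime error: index out of range
	}()
	_ = binary.BigEndian.Uint16(key) // touches key[0] and key[1] -> panic
}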
Sure thing! It's pretty simple so far. I've created a repo with the whole docker-compose setup, so it should be as simple as cloning it and you should be set :) It can be found here: https://github.com/slopee/influxdb_issue
As for the system, Docker is running on Windows 10 Pro, and the influxdb container reports:
root@07d5ef33163d:/# uname -srm
Linux 4.9.184-linuxkit x86_64
Also adding the Docker Desktop version: 2.1.0.4 (39773).
I might have some difficulty providing a script for the whole flow, but the way it works is:
As for the JSON format, I've attached a few JSON samples of the input that Telegraf will be receiving. Each line would be one entry on the RabbitMQ queue: json_files.txt
Is there anything I can do to add more logs or troubleshoot what might be causing the issue? The exact same input data seems to trigger it in, what feels like, 50% of cases.
@slopee so based on the telegraf config provided in that repo, this is how your json data is being transformed into line protocol:
time,deviceId=albert-test,frameIndex=25,host=rsavage.lan,msts=1585119609368,path=./sample_data.txt,platform=WIN-2,sessionId=10:55:14\ -\ 03/25/2020 frameDurationMS=42.621,relativeFrameTime=9368.9386,unityFrameTime=4377240513470 1585119609368000000
Looks like you've got some tags that will cause high series cardinality in the DB. I recommend moving msts, frameIndex, and sessionId to be fields.
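To illustrate why: a series is identified by the measurement plus its full tag set, so every distinct value of a tag like frameIndex or msts creates a brand-new series, while field values do not. A rough sketch of the idea (hypothetical code, not InfluxDB's implementation):

package main

import (
	"fmt"
	"sort"
	"strings"
)

// seriesKey sketches how a series is identified: measurement plus the
// sorted tag set. Fields are not part of the key, so moving msts,
// frameIndex, and sessionId to fields keeps the series count flat.
func seriesKey(measurement string, tags map[string]string) string {
	keys := make([]string, 0, len(tags))
	for k := range tags {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := []string{measurement}
	for _, k := range keys {
		parts = append(parts, k+"="+tags[k])
	}
	return strings.Join(parts, ",")
}

func main() {
	// Two points that differ only in frameIndex produce two distinct series
	// when frameIndex is a tag -- one new series per frame.
	fmt.Println(seriesKey("unity_frames", map[string]string{"deviceId": "albert-test", "frameIndex": "25"}))
	fmt.Println(seriesKey("unity_frames", map[string]string{"deviceId": "albert-test", "frameIndex": "26"}))
}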
here's the updated telegraf config for json parsing:
data_format = "json"
json_name_key = "name"
tag_keys = ["platform","deviceId"]
json_string_fields = ["platform","deviceId","sessionId"]
json_time_key = "timestamp"
json_time_format = "unix_ms"
Which will produce something like this:
time,deviceId=albert-test,host=rsavage.lan,path=./sample_data.txt,platform=WIN-2 relativeFrameTime=9368.9386,unityFrameTime=4377240513470,msts=1585119609368,sessionId="10:55:14 - 03/25/2020",frameDurationMS=42.621,frameIndex=25 1585119609368000000
This doesn't solve your issue, but it might work around it until we can take a look.
I just tested changing those values and so far it seems to work.
I agree that I was causing a lot of cardinality and that msts, frameIndex, and sessionId were not needed as tags. Thank you for the suggestion!
Great to hear!
Hi, I have a Telegraf-based service (running in a Docker container in Kubernetes) sending a continuous stream of metrics to Wavefront through the Wavefront proxy.
The Telegraf service often crashes with the error below and stops sending metrics to Wavefront.
We are pulling the latest Telegraf tarball in our service Dockerfile as below:
RUN curl --output telegraf.tar.gz https://dl.influxdata.com/telegraf/releases/telegraf-1.20.4_linux_amd64.tar.gz
RUN tar --strip-components=2 -C / -xvvf telegraf.tar.gz
Telegraf version is telegraf-1.20.4
panic: runtime error: index out of range [0] with length 0
goroutine 354 [running]:
github.com/wavefronthq/wavefront-sdk-go/senders.sanitizeInternal({0x0, 0x0})
/go/pkg/mod/github.com/wavefronthq/wavefront-sdk-go@v0.9.7/senders/formatter.go:340 +0x2d5
github.com/wavefronthq/wavefront-sdk-go/senders.MetricLine({0xc00d09eb20, 0xc00b3f9b00}, 0xc00b3f9b00, 0x61ccb037, {0xc00db72900, 0x24}, 0x6000103, {0xc00083c000, 0x2a})
/go/pkg/mod/github.com/wavefronthq/wavefront-sdk-go@v0.9.7/senders/formatter.go:56 +0x3ae
github.com/wavefronthq/wavefront-sdk-go/senders.(*directSender).SendMetric(0xc00081a0f0, {0xc00d09eb20, 0xc017249e30}, 0x40ead4, 0x0, {0xc00db72900, 0xfa00}, 0x0)
/go/pkg/mod/github.com/wavefronthq/wavefront-sdk-go@v0.9.7/senders/direct.go:84 +0x48
github.com/influxdata/telegraf/plugins/outputs/wavefront.(*Wavefront).Write(0xc0007bf550, {0xc03e15e000, 0xfa0, 0x0})
/go/src/github.com/influxdata/telegraf/plugins/outputs/wavefront/wavefront.go:172 +0x1dc
github.com/influxdata/telegraf/models.(*RunningOutput).write(0xc0002bd880, {0xc03e15e000, 0xfa0, 0xfa0})
/go/src/github.com/influxdata/telegraf/models/running_output.go:244 +0x118
github.com/influxdata/telegraf/models.(*RunningOutput).WriteBatch(0xc0002bd880)
/go/src/github.com/influxdata/telegraf/models/running_output.go:218 +0x58
github.com/influxdata/telegraf/agent.(*Agent).flushOnce.func1()
/go/src/github.com/influxdata/telegraf/agent/agent.go:829 +0x29
created by github.com/influxdata/telegraf/agent.(*Agent).flushOnce
/go/src/github.com/influxdata/telegraf/agent/agent.go:828 +0xb8
Note: this error is not resolved even with the latest Telegraf version, 1.21.1.
It's the tilde check done in https://github.com/wavefrontHQ/wavefront-sdk-go/blob/fb604f0e621b07590430f02d07eb85b86c69917a/senders/formatter.go#L343.
It's because of possibly empty or null point tag keys. If you look at the MetricLine function, there is no null/empty check for point tag keys; when an empty key is passed on to the sanitizeInternal() function, it throws an index out of range error. Also, none of the metrics we ship from our services have a tilde prefix in the metric name.
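A minimal sketch of the failure mode and one possible client-side guard (hypothetical helper names, not the SDK's actual code):

package main

import "fmt"

// sanitizeSketch mimics the problematic pattern: indexing the first byte
// of a tag key without an empty-string check.
func sanitizeSketch(s string) string {
	if s[0] == '~' { // panics with "index out of range [0] with length 0" when s is ""
		return "\"" + s + "\""
	}
	return s
}

// dropEmptyTags shows a possible workaround: filter out empty/null tag
// keys and values before they ever reach the sender.
func dropEmptyTags(tags map[string]string) map[string]string {
	out := make(map[string]string, len(tags))
	for k, v := range tags {
		if k == "" || v == "" {
			continue // these are the pairs that would trigger the panic
		}
		out[k] = v
	}
	return out
}

func main() {
	fmt.Println(dropEmptyTags(map[string]string{"": "oops", "source": "telegraf"}))
	defer func() { fmt.Println("recovered:", recover()) }()
	_ = sanitizeSketch("") // reproduces: index out of range [0] with length 0
}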
Please help us to resolve this issue.
Hello,
I am using docker-compose with influxdb 1.7.10 and the latest telegraf. I have made an application that sends information to a RabbitMQ queue, which telegraf fetches and sends to influxdb. It apparently works fine, but after a few inserts I get a [panic: runtime error: index out of range] when I do a SHOW SERIES on the database.
If I restart influxdb or run influx_inspect verify-seriesfile, I get no error messages, and after doing either of those and performing a SHOW SERIES again (without adding/removing any data) it works.
What is happening? Is there a way I can get more logs, or any idea why this is happening?
This is the log when inserting and when querying SHOW SERIES. I should add that frameIndex, platform, deviceId, sessionId, and msts are tag keys.
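In case it helps reproduce, this is roughly how I trigger the query from Go (a sketch using the influxdb1-client library; the address and database name are placeholders):

package main

import (
	"fmt"
	"log"

	client "github.com/influxdata/influxdb1-client/v2"
)

func main() {
	c, err := client.NewHTTPClient(client.HTTPConfig{Addr: "http://localhost:8086"})
	if err != nil {
		log.Fatal(err)
	}
	defer c.Close()

	// Run the query that intermittently triggers the server-side panic.
	q := client.NewQuery("SHOW SERIES", "mydb", "")
	resp, err := c.Query(q)
	if err != nil {
		log.Fatal(err) // transport-level error
	}
	if resp.Error() != nil {
		log.Fatal(resp.Error()) // server-side error returned by influxdb
	}
	fmt.Println(resp.Results)
}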