Closed rossmcdonald closed 7 years ago
It seems that this must be coming from the tcp_listener input plugin. For some reason there are metrics that parse correctly via tcp_listener but are actually invalid and can't be translated to an InfluxDB point.
I think this is likely fixed in 1.3, because the function that is panicking no longer exists. Instead of panicking, it should raise an error when the metric gets written to InfluxDB, and the metric will then just be dropped.
It would be good to figure out what the problem metrics are, so we can write a unit test that will catch them.
My telegraf also crashes repeatedly using the following config:
[global_tags]
[agent]
interval = "10s"
round_interval = true
metric_batch_size = 1000
metric_buffer_limit = 10000
collection_jitter = "0s"
flush_interval = "10s"
flush_jitter = "0s"
precision = ""
debug = false
quiet = false
logfile = ""
hostname = ""
omit_hostname = false
[[outputs.influxdb]]
urls = ["http://HOST:8086"] # required
database = "DB" # required
retention_policy = ""
write_consistency = "any"
timeout = "5s"
username = "USER"
password = "PASS"
[[inputs.logparser]]
files = ["/var/log/httpd/*access_log"]
from_beginning = false
[inputs.logparser.grok]
patterns = ["%{CUSTOM_LOG_FORMAT}"]
measurement = "apache_access_log"
custom_patterns = '''
CUSTOM_LOG_FORMAT %{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" %{NUMBER:response} (?:%{NUMBER:bytes:int}|-) %{NUMBER:responsetime:int} us %{QS:Referrer} %{QS:Agent}
'''
log output:
panic: runtime error: slice bounds out of range
goroutine 32 [running]:
panic(0xf2b720, 0xc4200100d0)
/usr/local/go/src/runtime/panic.go:500 +0x1a1
github.com/influxdata/telegraf/metric.(*metric).Fields(0xc420094800, 0xc4206279c0)
/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/metric/metric.go:279 +0x5dd
github.com/influxdata/telegraf/plugins/inputs/logparser.(*LogParserPlugin).parser(0xc42014a090)
/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/plugins/inputs/logparser/logparser.go:205 +0x250
created by github.com/influxdata/telegraf/plugins/inputs/logparser.(*LogParserPlugin).Start
/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/plugins/inputs/logparser/logparser.go:131 +0x62d
@miniskipper I think your issue is unrelated. Can you open it as a new issue and include some sample logs? Ideally, try to find a sample log line that reproduces the crash.
Similar issue here when loading an Apache access_log with logparser:
Telegraf v1.2.1 (git: release-1.2 3b6ffb344e5c03c1595d862282a6823ecb438cff)
[[inputs.logparser]]
## file(s) to tail:
files = ["/tmp/input.log"]
from_beginning = true
name_override = "test_metric"
## For parsing logstash-style "grok" patterns:
[inputs.logparser.grok]
patterns = ["%{COMMON_LOG_FORMAT}"]
[[outputs.file]]
## Files to write to, "stdout" is a specially handled file.
files = ["stdout", "/tmp/output.log"]
data_format = "influx"
[[outputs.influxdb]]
## The full HTTP or UDP endpoint URL for your InfluxDB instance.
urls = ["http://influxdb:8086"] # required
## The target database for metrics (telegraf will create it if not exists).
database = "telegraf" # required
## Write timeout (for the InfluxDB client), formatted as a string.
timeout = "5s"
panic: runtime error: slice bounds out of range
goroutine 14 [running]:
panic(0xf2b720, 0xc42000c0b0)
/usr/local/go/src/runtime/panic.go:500 +0x1a1
github.com/influxdata/telegraf/metric.(*metric).Fields(0xc4207e8680, 0xc42091e4e0)
/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/metric/metric.go:279 +0x5dd
github.com/influxdata/telegraf/plugins/inputs/logparser.(*LogParserPlugin).parser(0xc42007a120)
/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/plugins/inputs/logparser/logparser.go:205 +0x250
created by github.com/influxdata/telegraf/plugins/inputs/logparser.(*LogParserPlugin).Start
/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/plugins/inputs/logparser/logparser.go:131 +0x62d
Closing, fixed in 1.3
Is there a workaround until 1.3 comes out? The influx service keeps crashing with the same runtime error.
1.3 is out :)
Bug report
Relevant telegraf.conf:
Steps to reproduce:
Seeing this panic fairly regularly:
Let me know what other information is needed!