mozilla-services / heka

DEPRECATED: Data collection and processing made easy.
http://hekad.readthedocs.org/

heka nginx_access_log error #1275

Open liyichao opened 9 years ago

liyichao commented 9 years ago

Heka refused to start, failing with this error:

heka invalid character '}' after top-level value :: invalid character '}' after top-level value

The offending line in the seek journal is: {"seek":5744893491,"file_name":"/path/to/nginx.access.log","last_hash":"4fdd85bffb52b26b208e878541e338fc494320ed"}}
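
That error text matches what Go's encoding/json package reports when there are extra bytes after the first top-level JSON value. Below is a minimal standalone sketch, not Heka's actual journal-loading code, that reproduces the same message against the corrupted line above:

package main

import (
    "encoding/json"
    "fmt"
)

// Not Heka's actual journal type; the fields simply mirror the line above.
type checkpoint struct {
    Seek     int64  `json:"seek"`
    FileName string `json:"file_name"`
    LastHash string `json:"last_hash"`
}

func main() {
    // The corrupted journal entry, including the extra trailing '}'.
    corrupt := []byte(`{"seek":5744893491,"file_name":"/path/to/nginx.access.log","last_hash":"4fdd85bffb52b26b208e878541e338fc494320ed"}}`)

    var cp checkpoint
    if err := json.Unmarshal(corrupt, &cp); err != nil {
        // Prints: invalid character '}' after top-level value
        fmt.Println(err)
    }
}

Removing the final } makes the value parse cleanly again, which is why hand-editing the journal (discussed below) gets Heka started.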

trink commented 9 years ago

Looks like there is still a bug with the LogstreamerInput checkpointing: https://github.com/mozilla-services/heka/issues/740

rafrombrc commented 9 years ago

Unfortunately we've never been able to actually see it happen. Are you still using Heka? If so, has this recurred at all?

liyichao commented 9 years ago

Yes, we are still using Heka, and the error occurs when Heka is restarted. We rarely restart it, but there seems to be a high probability that this recurs whenever we do. I use SIGTERM when restarting Heka.

rafrombrc commented 9 years ago

Thanks for the response. Since I can't reproduce the issue, I'll need your help to debug; hopefully you can provide the info I need to get to the bottom of things.

Clearly the extra trailing } is what's causing the problem. When you hit this issue and Heka doesn't start, do you remove the } character to get past it? When you do, have you noticed whether or not the rest of the seek journal is correct? In other words, does Heka pick up parsing the log files at the correct location? If you don't remove the trailing }, how do you restart?

It would also be useful to have your complete Heka configuration, the version of Heka you're using, the platform it runs on, and whether you built Heka yourself or are using one of the binary package downloads.

Thanks!

liyichao commented 9 years ago
  1. Yes, we remove the extra "}" and then it starts fine. We use Heka to parse the nginx logs in a directory; when the problem occurred, only two of them had an extra "}". As for whether they pointed at the correct location, I did not check at the time, so I do not know.
  2. Our Heka config is different now from when this happened, so I have tried to recover the original config below. It includes the configuration for our two Lua plugins:
[hekad]
maxprocs = 5

[StatsdInput]
address = "$port"
max_msg_size = 4096

[StatAccumInput]
ticker_interval = 10
percent_threshold = 95

[nginx_access_logs]
type = "LogstreamerInput"
parser_type = "token"
decoder = "nginx_access_decoder"
log_directory = "$directory"
file_match = '(?P<DomainName>.*)\.access\.log'
differentiator = ["DomainName"]

[nginx_access_decoder]
type = "SandboxDecoder"
filename = "lua_decoders/nginx_access.lua"

[nginx_access_decoder.config]
log_format = '$format'
type = "nginx.access"

[nginx_error_logs]
type = "LogstreamerInput"
parser_type = "token"
decoder = "nginx_error_decoder"
log_directory = "$directory"
file_match = '(?P<DomainName>.*)\.error\.log'
differentiator = ["DomainName"]

[nginx_error_decoder]
type = "SandboxDecoder"
filename = "lua_decoders/nginx_error.lua"

[nginx_error_decoder.config]
type = "nginx.error"
tz = "Asia/Shanghai"

[ESLogstashV0Encoder]
index = "webnginx-log-%{2006.01.02}"
es_index_from_timestamp = true
type_name = "%{Type}"

[ElasticSearchOutput]
message_matcher = "Type == 'nginx.access' || Type == 'nginx.error'"
encoder = "ESLogstashV0Encoder"
server = "$server"
flush_count = 5000
http_timeout = 5000

[nginx_access_counter]
type = "SandboxFilter"
message_matcher = "Type == 'nginx.access'"
ticker_interval = 10
filename = "$filename"

[nginx_error_counter]
type = "SandboxFilter"
message_matcher = "Type == 'nginx.error'"
ticker_interval = 10
filename = "/usr/share/heka/lua_extend/nginx_error_counter.lua"

[CarbonOutput]
message_matcher = "Fields[payload_type] == 'stats'"
address = "$address"
protocol = "tcp"
tcp_keep_alive = true
  3. Heka 0.8.3, on Debian wheezy 7.6; Heka was built from source.
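
One note on the config above: each file_match pattern names a capture group, DomainName, and the differentiator setting refers to that capture by name, so each domain's access or error log is tracked as its own logstream with its own checkpoint. A small standalone Go sketch, illustrative only, of what the access-log pattern extracts from a file name:

package main

import (
    "fmt"
    "regexp"
)

func main() {
    // The file_match pattern from the config above; DomainName is the
    // capture group the differentiator setting refers to.
    re := regexp.MustCompile(`(?P<DomainName>.*)\.access\.log`)

    m := re.FindStringSubmatch("example.com.access.log") // hypothetical file name
    if m != nil {
        fmt.Println(m[re.SubexpIndex("DomainName")]) // prints: example.com
    }
}
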
nathwill commented 9 years ago

We just hit this with version 0.9.2 as well, using this config: https://gist.github.com/nathwill/6f2826bd9d30ae00e43e, and ended up with this corruption in the logstreamer seek pointer file:

{"seek":1071,"file_name":"/srv/code_challenges/shared/log/mac_worker_3.log","last_hash":"b0f399b045472212e4d6d10e78e1416bca1196b7"}4"}