Open liyichao opened 9 years ago
Looks like there is still a bug with the LogstreamerInput checkpointing https://github.com/mozilla-services/heka/issues/740
Unfortunately we've never been able to actually see it happen. Are you still using Heka? If so, has this recurred at all?
Yes, we are still using Heka. It occurs when Heka is restarted. We rarely restart Heka, but the problem seems very likely to recur whenever we do. I use SIGTERM when restarting Heka.
Thanks for the response. Since I can't reproduce the issue, I'll need your help to debug it. Hopefully you can provide the info I need to get to the bottom of things.
Obviously it's clear that it's the extra trailing } causing the problem. When you hit this issue and Heka doesn't start, do you remove the } character to get past it? When you do, have you noticed whether or not the rest of the seek journal is correct? In other words, does Heka pick up parsing the log files at the correct location? If you don't remove the trailing }, how do you restart?
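For anyone who needs to get past the corrupted journal without editing it by hand, the manual fix described above can be scripted. The following is a minimal Go sketch, not Heka's own code. It assumes the journal holds a single JSON object with the `seek`, `file_name`, and `last_hash` fields seen in this thread, and it relies on `json.Decoder` reading exactly one value and leaving any trailing bytes unread. The names `journalEntry` and `salvageJournal` are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// journalEntry mirrors the fields visible in the seek journal lines
// quoted in this thread. Hypothetical type, not Heka's own.
type journalEntry struct {
	Seek     int64  `json:"seek"`
	FileName string `json:"file_name"`
	LastHash string `json:"last_hash"`
}

// salvageJournal decodes the first JSON value in data and ignores
// anything after it, such as a stray trailing "}".
func salvageJournal(data []byte) (journalEntry, error) {
	var e journalEntry
	dec := json.NewDecoder(bytes.NewReader(data))
	// Decode stops at the end of the first complete value.
	if err := dec.Decode(&e); err != nil {
		return journalEntry{}, err
	}
	return e, nil
}

func main() {
	corrupted := []byte(`{"seek":1071,"file_name":"/srv/code_challenges/shared/log/mac_worker_3.log","last_hash":"b0f399b045472212e4d6d10e78e1416bca1196b7"}4"}`)
	e, err := salvageJournal(corrupted)
	if err != nil {
		fmt.Println("cannot salvage journal:", err)
		return
	}
	// Re-encode the salvaged entry as a clean single JSON object.
	clean, _ := json.Marshal(e)
	fmt.Println(string(clean))
}
```

Whether the salvaged seek position is actually the right place to resume is exactly the open question above, so the result should be checked against the log file before trusting it.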
Other useful information would be your complete Heka configuration, the version of Heka you're using, the platform, and whether you built Heka yourself or are using one of the binary package downloads.
Thanks!
[hekad]
maxprocs = 5

[StatsdInput]
address = "$port"
max_msg_size = 4096

[StatAccumInput]
ticker_interval = 10
percent_threshold = 95

[nginx_access_logs]
type = "LogstreamerInput"
parser_type = "token"
decoder = "nginx_access_decoder"
log_directory = "$directory"
file_match = '(?P<DomainName>.*)\.access\.log'
differentiator = ["DomainName"]

[nginx_access_decoder]
type = "SandboxDecoder"
filename = "lua_decoders/nginx_access.lua"

[nginx_access_decoder.config]
log_format = '$format'
type = "nginx.access"

[nginx_error_logs]
type = "LogstreamerInput"
parser_type = "token"
decoder = "nginx_error_decoder"
log_directory = "$directory"
file_match = '(?P<DomainName>.*)\.error\.log'
differentiator = ["DomainName"]

[nginx_error_decoder]
type = "SandboxDecoder"
filename = "lua_decoders/nginx_error.lua"

[nginx_error_decoder.config]
type = "nginx.error"
tz = "Asia/Shanghai"

[ESLogstashV0Encoder]
index = "webnginx-log-%{2006.01.02}"
es_index_from_timestamp = true
type_name = "%{Type}"

[ElasticSearchOutput]
message_matcher = "Type == 'nginx.access' || Type == 'nginx.error'"
encoder = "ESLogstashV0Encoder"
server = "$server"
flush_count = 5000
http_timeout = 5000

[nginx_access_counter]
type = "SandboxFilter"
message_matcher = "Type == 'nginx.access'"
ticker_interval = 10
filename = "$filename"

[nginx_error_counter]
type = "SandboxFilter"
message_matcher = "Type == 'nginx.error'"
ticker_interval = 10
filename = "/usr/share/heka/lua_extend/nginx_error_counter.lua"

[CarbonOutput]
message_matcher = "Fields[payload_type] == 'stats'"
address = "$address"
protocol = "tcp"
tcp_keep_alive = true
We just hit this with version 0.9.2 as well, using this config: https://gist.github.com/nathwill/6f2826bd9d30ae00e43e, and ended up with this corruption in the logstreamer seek pointer file:
{"seek":1071,"file_name":"/srv/code_challenges/shared/log/mac_worker_3.log","last_hash":"b0f399b045472212e4d6d10e78e1416bca1196b7"}4"}
heka refused to start: heka invalid character '}' after top-level value :: invalid character '}' after top-level value
It is this line: {"seek":5744893491,"file_name":"/path/to/nginx.access.log","last_hash":"4fdd85bffb52b26b208e878541e338fc494320ed"}}
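The error text quoted above matches what Go's standard `encoding/json` package reports when extra bytes follow a complete JSON value. The sketch below reproduces it against the line just quoted. It is only an illustration: that Heka parses the journal with a plain `json.Unmarshal` is an assumption, and `checkJournal` is a hypothetical helper name.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// checkJournal returns nil only if data is exactly one valid JSON
// object with nothing but whitespace after it. Hypothetical helper.
func checkJournal(data []byte) error {
	var v map[string]interface{}
	return json.Unmarshal(data, &v)
}

func main() {
	// The journal line reported in this thread, with its extra "}".
	line := []byte(`{"seek":5744893491,"file_name":"/path/to/nginx.access.log","last_hash":"4fdd85bffb52b26b208e878541e338fc494320ed"}}`)
	fmt.Println(checkJournal(line))
	// prints: invalid character '}' after top-level value
}
```

A check like this could be run on the journal file before starting Heka to see whether it will be rejected.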