mozilla-services / heka

DEPRECATED: Data collection and processing made easy.
http://hekad.readthedocs.org/

Heka inserting duplicate records to Elasticsearch DB #1949

Closed DonHarishAxe closed 8 years ago

DonHarishAxe commented 8 years ago

I have a script that generates a set of JSON objects as output, and my Heka config is as follows:

```toml
[DemoProcessInput]
type = "ProcessInput"
ticker_interval = 40
stdout = true
stderr = false
splitter = 'new_splitter'
decoder = 'JsonDecoder'
immediate_start = true

[DemoProcessInput.command.0]
bin = "sudo"
args = ["/home/harish.se/scripts/mail/mail_wrapper.sh"]

[new_splitter]
type = "TokenSplitter"
delimiter = "}"

[JsonDecoder]
type = "SandboxDecoder"
filename = "lua_decoders/json.lua"

[JsonDecoder.config]
type = "mail_usage"
payload_keep = true
map_fields = true

[ESJsonEncoder]
index = "%{Type}-index"
es_index_from_timestamp = false
type_name = "%{Type}"
fields = ["Timestamp","Hostname","DynamicFields"]

[ESJsonEncoder.field_mappings]
Timestamp = "@timestamp"

[ElasticSearchOutput]
message_matcher = "TRUE"
encoder = "ESJsonEncoder"
flush_interval = 10
server = "http://172.16.173.145:9200"
```

The problem is that in some cases the records get inserted twice, and in other cases only once (as expected). I do not know why. Kindly help.

sathieu commented 8 years ago

Have you tried with another output (e.g. to the console with the RstEncoder)? Are the duplicates there too?
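
For reference, a minimal debug setup along these lines might look like the sketch below. LogOutput and RstEncoder are standard Heka plugins; the catch-all `message_matcher` here is an assumption for debugging only, since it will also print every message from other inputs:

```toml
# Hypothetical debug output: dump every message to hekad's stdout in
# human-readable reStructuredText form, so duplicates can be spotted
# before the Elasticsearch stage is involved.
[RstEncoder]

[LogOutput]
message_matcher = "TRUE"
encoder = "RstEncoder"
```

If each record appears twice here as well, the duplication is happening on the input/splitter/decoder side rather than in the Elasticsearch output.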

DonHarishAxe commented 8 years ago

Never mind, that was due to `use_buffering` being set to true by default.
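
For anyone hitting the same symptom: a sketch of the fix suggested by this resolution would be to disable buffering explicitly on the output. This assumes the original config from the question; whether disabling the disk buffer is acceptable depends on your delivery guarantees, since buffered messages are what allow redelivery after a failure:

```toml
# Sketch: explicitly disable the output's disk buffer, which the
# resolution above identifies as defaulting to true and causing
# messages to be delivered more than once.
[ElasticSearchOutput]
message_matcher = "TRUE"
encoder = "ESJsonEncoder"
flush_interval = 10
server = "http://172.16.173.145:9200"
use_buffering = false
```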