mozilla-services / heka

DEPRECATED: Data collection and processing made easy.
http://hekad.readthedocs.org/

KafkaInput makes duplicates when more than one consumer is used #1910

Open psychonaut opened 8 years ago

psychonaut commented 8 years ago

With configuration:

[KafkaInput-logs]
type     = "KafkaInput"
topic    = "heka-logs"
addrs    = ["localhost:9092"]
splitter = "KafkaSplitter"
group    = "qaus"
decoder  = "KafkaDecoder-multi"

[KafkaSplitter]
type              = "NullSplitter"
use_message_bytes = true

[KafkaDecoder-multi]
type             = 'MultiDecoder'
subs             = [ 'ProtobufDecoder', 'KafkaDecoder-datacenter' ]
log_sub_errors   = false
cascade_strategy = 'all'

[KafkaDecoder-datacenter]
type     = 'SandboxDecoder'
filename = 'lua_decoders/datacenter.lua'

lua_decoders/datacenter.lua simply injects a datacenter value:

function process_message ()
    write_message("Fields[datacenter]", "qaus")
    return 0
end

When I run two Heka instances with this configuration, every message is duplicated. When I run three instances, every message arrives three times. The configuration is identical on every instance (so the group value is identical as well). I assumed that using group would eliminate duplicates.
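To make the expectation concrete, here is a toy model of the two behaviours. It is only a sketch: the instance names and the `deliveries` helper are made up for illustration, it ignores partitions entirely, and it says nothing about how KafkaInput is actually implemented. It only shows that "N copies with N instances" is what you would see if each instance read the topic independently, while a balanced group would hand each message to exactly one member.

```python
from collections import Counter


def deliveries(messages, consumers, balanced):
    """Count how many times each message is delivered across all consumers."""
    counts = Counter()
    for index, message in enumerate(messages):
        if balanced:
            # Balanced group: exactly one member receives each message.
            # Picking the member by index is a stand-in for real assignment.
            owners = [consumers[index % len(consumers)]]
        else:
            # Uncoordinated readers: every instance receives every message.
            owners = consumers
        counts[message] += len(owners)
    return counts


messages = ["m0", "m1", "m2", "m3"]
instances = ["heka-a", "heka-b", "heka-c"]  # hypothetical instance names

balanced = deliveries(messages, instances, balanced=True)
independent = deliveries(messages, instances, balanced=False)

print(sorted(set(balanced.values())))     # [1]
print(sorted(set(independent.values())))  # [3]
```

With three instances the independent case yields three copies of every message, which matches the symptom described above.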

simonpasquier commented 8 years ago

This might be related to https://github.com/mozilla-services/heka/issues/1714. Which versions of Heka and Kafka are you using? And how many partitions are there for the "heka-logs" topic?

psychonaut commented 8 years ago

Heka 0.10.0 and Kafka 0.9.0.1

Topic:heka-logs PartitionCount:2 ReplicationFactor:2 Configs:
    Topic: heka-logs Partition: 0 Leader: 1 Replicas: 1,3 Isr: 3,1
    Topic: heka-logs Partition: 1 Leader: 3 Replicas: 3,2 Isr: 3,2
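For reference, this is roughly how a balanced consumer group would be expected to split a two-partition topic. It is a simplified round-robin sketch, not Kafka's actual assignor, and the member names and the `assign_partitions` helper are hypothetical. The point is that with two partitions at most two group members receive data, and a third member would sit idle instead of receiving a third copy.

```python
def assign_partitions(partitions, members):
    """Spread partitions over group members round-robin (a simplification)."""
    assignment = {member: [] for member in members}
    for position, partition in enumerate(sorted(partitions)):
        owner = members[position % len(members)]
        assignment[owner].append(partition)
    return assignment


partitions = [0, 1]  # heka-logs has PartitionCount:2

for size in (1, 2, 3):
    members = [f"heka-{n}" for n in range(size)]  # hypothetical member names
    print(size, assign_partitions(partitions, members))
# 1 {'heka-0': [0, 1]}
# 2 {'heka-0': [0], 'heka-1': [1]}
# 3 {'heka-0': [0], 'heka-1': [1], 'heka-2': []}
```

In every case each partition has exactly one owner, so no message would be delivered twice within the group.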

nickchappell commented 8 years ago

Pinging @trink ...

Might be good to summarize our conversation on IRC a few days ago re: Kafka and Heka

elemoine commented 8 years ago

> Might be good to summarize our conversation on IRC a few days ago re: Kafka and Heka

Yes, please do :)

relistan commented 8 years ago

@nickchappell can you summarize? :+1:

nickchappell commented 8 years ago

Heka will soon get a newer version of the Lua sandbox that includes a C Kafka library, librdkafka: https://github.com/edenhill/librdkafka

It supports all of the 0.8 and 0.9 features, including the high-level balanced consumer for v0.9 brokers.

I don't know the timing of when the newer sandbox will get rolled in though, or if it will be in a 0.11 release or be held back until 0.12.

@rafrombrc @trink: care to comment/elaborate?

psychonaut commented 8 years ago

How soon is soon?