joekiller / logstash-kafka

Kafka plugin for Logstash
Apache License 2.0
206 stars, 63 forks

any way to identify partition or key for output? #53

Closed ScottChapman closed 8 years ago

ScottChapman commented 9 years ago

I noticed that all output messages are going to the same partition. Is there any way for me to specify which partition to use, or to specify a key for the message, so I can get good distribution for my output topic?

joekiller commented 9 years ago

Check out partitioner_class and key_serializer_class. partition_key_format was removed because it doesn't actually do anything. I inserted it mistakenly.

ScottChapman commented 9 years ago

Yeah, but the default partitioner no longer does random distribution.

It looked like partition_key_format could be used to provide a per-message key. Are you saying that the producer code was ignoring the key altogether?

joekiller commented 9 years ago

It will distribute eventually. Check out this thread for more info: https://github.com/joekiller/logstash-kafka/pull/29#issuecomment-50765003

ScottChapman commented 9 years ago

Right, that's what I'm doing now; refreshing metadata every 30 seconds. But still, it would be better to be able to use the key...

I'm still not clear what you mean when you say it didn't actually do anything.
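
The "distributes eventually" behaviour and the metadata-refresh workaround can be pictured with a toy model. This is a minimal Ruby sketch, assuming a keyless producer picks one random partition and reuses it until its metadata refresh interval elapses. `StickyKeylessProducer`, `partitions_used`, the partition count, and the intervals are all hypothetical names and numbers for illustration; none of this is logstash-kafka or Kafka client code.

```ruby
# Toy model, not the Kafka client: a keyless producer that picks one random
# partition and reuses it until the refresh interval has elapsed.
class StickyKeylessProducer
  def initialize(num_partitions, refresh_interval_s, rng: Random.new(42))
    @num_partitions = num_partitions
    @refresh_interval_s = refresh_interval_s
    @rng = rng
    @current = nil
    @chosen_at = nil
  end

  # Returns the partition a message sent at time `now_s` (seconds) would use.
  def partition_for(now_s)
    if @current.nil? || now_s - @chosen_at >= @refresh_interval_s
      @current = @rng.rand(@num_partitions)
      @chosen_at = now_s
    end
    @current
  end
end

# Counts the distinct partitions hit when sending one message per second.
def partitions_used(refresh_interval_s, duration_s: 600, num_partitions: 8)
  producer = StickyKeylessProducer.new(num_partitions, refresh_interval_s)
  (0...duration_s).map { |t| producer.partition_for(t) }.uniq.size
end

long_refresh  = partitions_used(600) # a single choice for the whole run
short_refresh = partitions_used(30)  # a fresh choice every 30 seconds
puts "600 s refresh: #{long_refresh} partition(s) used"
puts "30 s refresh:  #{short_refresh} partition(s) used"
```

Under this assumption, a shorter refresh interval only rotates the single active partition more often; at any given moment all messages still land on one partition, which is why a per-message key is the more direct fix.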

joekiller commented 9 years ago

So partition_key_format still works, but it was also being passed into the producer as part of its config, where it isn't used to configure the producer itself. Sorry for the confusion.

ScottChapman commented 9 years ago

Yea, looking at the code I thought it should work as advertised; that I could use an event field for the key (partition_key_format => "%{@timestamp}") and that would end up reasonably random.

Is that not how it worked? Since you removed it from the options I can no longer set it to see how it behaves.
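
To illustrate why a per-message key such as the event timestamp would end up reasonably spread out, here is a minimal Ruby sketch assuming a hash-then-modulo partitioner. `java_string_hash` mimics Java's `String#hashCode` for ASCII input; `partition_for`, the partition count, and the generated timestamps are hypothetical, and the partitioner actually configured via partitioner_class may differ in detail.

```ruby
# Mimics Java's String#hashCode (signed 32-bit) for ASCII strings.
def java_string_hash(str)
  h = 0
  str.each_char { |c| h = (31 * h + c.ord) & 0xFFFFFFFF }
  h >= 0x80000000 ? h - 0x100000000 : h
end

# Hypothetical hash-then-modulo partitioner. Masking the sign bit keeps the
# result non-negative before taking the remainder.
def partition_for(key, num_partitions)
  (java_string_hash(key) & 0x7FFFFFFF) % num_partitions
end

num_partitions = 4
base = Time.utc(2015, 3, 12, 18, 9, 0)

# 1000 made-up event timestamps, 137 ms apart, formatted like @timestamp.
keys = (0...1000).map do |i|
  (base + Rational(i * 137, 1000)).strftime("%Y-%m-%dT%H:%M:%S.%LZ")
end

counts = Hash.new(0)
keys.each { |k| counts[partition_for(k, num_partitions)] += 1 }
counts.sort.each { |partition, n| puts "partition #{partition}: #{n} messages" }
```

Because the key changes with every event, the hash changes too, so messages are no longer pinned to one partition. How even the spread is depends on the hash function and the key values.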

joekiller commented 9 years ago

When the message is sent to the broker it sends a topic, partition key, and message. See here: https://github.com/joekiller/logstash-kafka/blob/master/lib/logstash/outputs/kafka.rb#L145

ScottChapman commented 9 years ago

Exactly, that's what I saw in the code. So, I think that would work properly given my example: partition_key_format => "%{@timestamp}"

Shouldn't that work?

I currently can't set it since partition_key_format is no longer a property I can set. But assuming I could, would it work correctly?

joekiller commented 9 years ago

The partition key format is used when sending messages, not when initializing the producer class itself, so the commit you saw only removed it from the Kafka producer initialization options (not from the logstash-output-kafka options). The Logstash class uses this option, not the underlying producer class. When the underlying producer class receives a message, it uses the key passed in via send message, which was determined by the partition key format. I'm unsure why it isn't distributing as evenly as you expect. If you provide some sample messages and config I can look at it later. (Out of office for now.)
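
The split described here, where the key format is a plugin-level setting evaluated per event and the producer only sees the resulting key, can be sketched in Ruby. This is an illustration under stated assumptions, not the plugin's code: `render_key` is a simplified stand-in for Logstash's sprintf-style field substitution, `RecordingProducer` and its `send_msg` method are hypothetical, and the settings and sample events are made up.

```ruby
# Hypothetical stand-in for Logstash's sprintf-style substitution: replaces
# %{field} references with values from a plain Hash.
def render_key(format, event)
  format.gsub(/%\{([^}]+)\}/) { event.fetch(Regexp.last_match(1), "").to_s }
end

# Hypothetical producer that only records what it is asked to send.
class RecordingProducer
  attr_reader :options, :sent

  def initialize(options)
    @options = options
    @sent = []
  end

  def send_msg(topic, key, message)
    @sent << [topic, key, message]
  end
end

# Plugin-level settings (read by the output) versus producer-level options.
plugin_settings  = { "topic_id" => "logs", "partition_key_format" => "%{@timestamp}" }
producer_options = { "metadata.broker.list" => "localhost:9092" }

producer = RecordingProducer.new(producer_options)
events = [
  { "@timestamp" => "2015-03-12T18:09:00.123Z", "message" => "first" },
  { "@timestamp" => "2015-03-12T18:09:00.456Z", "message" => "second" }
]

# The key is rendered per event at send time; the producer never sees the format.
events.each do |event|
  key = render_key(plugin_settings["partition_key_format"], event)
  producer.send_msg(plugin_settings["topic_id"], key, event["message"])
end

producer.sent.each { |topic, key, message| puts "#{topic} #{key} #{message}" }
```

In this sketch, dropping the format from the producer options changes nothing about the keys that are sent, which matches the explanation above. The output plugin still has to declare the setting for Logstash to accept it in a config file.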

ScottChapman commented 9 years ago

But I think it also tells Logstash what it can accept as settings for the output plugin. If I try to set partition_key_format => "%{@timestamp}", Logstash generates an error: Unknown setting 'partition_key_format' for kafka {:level=>:error}

ScottChapman commented 9 years ago

Meh, thought I knew what I was doing... I tried hacking the kafka.rb file to see if I could figure it out and make it work. But what I DID figure out was that I don't know anything about kafka plugins! ;-)

So, I probably should have led with the error I got above; I think that's the crux of my problem. I can't specify the partition_key_format...

Sorry if it sounded like I knew what I was talking about...

ScottChapman commented 9 years ago

Any ideas at this point?

joekiller commented 9 years ago

I'm on vacation until Monday, so I will try to take a look then.

ScottChapman commented 9 years ago

No problem. Let me know when you surface and I'd be glad to help.

ScottChapman commented 9 years ago

vacation worn off yet? ;-)

joekiller commented 9 years ago

So before I dive into this more: v0.7.0 is probably the best version to build with Logstash 1.4.2. However, it doesn't include this feature, and Logstash 1.4.2 doesn't support the sprintf functionality correctly for the output, as that was also added after 1.4.2. Logstash 1.5 is necessary for this particular feature, but because of how they changed their plugins, this repo isn't compatible with it (tests pass for that version, but actually installing the plugin is bugged). So I would work with Logstash 1.5.0 RC2 from https://www.elastic.co/downloads/logstash

ScottChapman commented 9 years ago

Right, currently running latest.

ScottChapman commented 9 years ago

Anything I can help with?

ScottChapman commented 9 years ago

Hey Joe, thought I would come back to this. We are currently running 1.5.0 RC2. But being able to specify the key based on a field of the event is pretty important to getting good partition distribution.

Can I help in any way here?

joekiller commented 9 years ago

Hi, I had to remove the feature from this repo, as the repo needs to work for 1.4.x. Could you open the issue on the logstash-output-kafka repo and reference this one? I'll leave this open for the time being.

joekiller commented 8 years ago

Closing as this is supported in newer logstash-output-kafka versions.