Closed ScottChapman closed 8 years ago
Check out partitioner_class and key_serializer_class. partition_key_format was removed because it doesn't actually do anything. I inserted it mistakenly.
Yea, but the default partitioner doesn't do random distribution by default any more.
It looked like the partition_key_format could be used to provide a per message key. Are you saying that the producer code was ignoring the key all together?
It will distribute eventually check out this thread for more info: https://github.com/joekiller/logstash-kafka/pull/29#issuecomment-50765003
Right, that's what I'm doing now; refreshing metadata every 30 seconds. But still, it would be better to be able to use the key...
Still not clear by what you mean that it didn't actually do anything?
So partition_key_format still works but it was being sent into the producer as part of the config which isn't used for configuring the producer itself. Sorry for the confusion.
Yea, looking at the code I thought it should work as advertised; that I could use an event field for the key (partition_key_format => "%{@timestamp}") and that would end up reasonably random.
Is that not how it worked? Since you removed it from the options I can no longer set it to see how it behaves.
When the message is sent to the broker it sends a topic, partition key, and message. See here: https://github.com/joekiller/logstash-kafka/blob/master/lib/logstash/outputs/kafka.rb#L145 On Mar 12, 2015 2:09 PM, "ScottChapman" notifications@github.com wrote:
Yea, looking at the code I thought it should work as advertised; that I could use an event field for the key (partition_key_format => "%{ @timestamp https://github.com/timestamp}") and that would end up reasonably random.
Is that not how it worked? Since you removed it from the options I can no longer set it to see how it behaves.
— Reply to this email directly or view it on GitHub https://github.com/joekiller/logstash-kafka/issues/53#issuecomment-78550919 .
Exactly, that's what I saw in the code. So, I think that would work properly given my example: partition_key_format => "%{@timestamp}"
Shouldn't that work?
I currently can't set it since partition_key_format is no longer a property I can set. But assuming I could, would it work correctly?
Partition key format is used when sending messages not when initializing the producer class itself so the commit you saw where I removed it from the kafka producer initialization options (not logstash-output-kafka options). The logstash class uses this option not the underlying producer class. When the underlying producer class receives a message it uses the key passed in via send message, which was determined by the partition key format. I'm unsure why it isn't distributing as evenly as you expect. If you provide some sample messages and config I can look at it some later. (Out of office for now) On Mar 12, 2015 2:34 PM, "ScottChapman" notifications@github.com wrote:
Exactly, that's what I saw in the code. So, I think that would work properly given my example: partition_key_format => "%{@timestamp https://github.com/timestamp}"
Shouldn't that work?
I currently can't set it since partition_key_format is no longer a property I can set. But assuming I could, would it work correctly?
— Reply to this email directly or view it on GitHub https://github.com/joekiller/logstash-kafka/issues/53#issuecomment-78559511 .
But I think it also tells logstash what it can accept as settings for the output plugin. If I try to set: partition_key_format => "%{@timestamp}" Logstash generates an error: Unknown setting 'partition_key_format' for kafka {:level=>:error}
Meh, thought I knew what I was doing... I tried hacking the kafka.rb file to see if I could figure it out and make it work. But what I DID figure out was that I don't know anything about kafka plugins! ;-)
So, I probably should have lead with the error I got above, I think that's the crux of my problem. I can't specify the partition_key_format...
Sorry if it sounded like I knew what I was talking about...
Any ideas at this point?
I'm on vacation until Monday so will try to take a look then. On Mar 13, 2015 1:25 PM, "ScottChapman" notifications@github.com wrote:
Any ideas at this point?
— Reply to this email directly or view it on GitHub https://github.com/joekiller/logstash-kafka/issues/53#issuecomment-79158762 .
No problem. Let me know when you surface and I'd be glad to help.
vacation worn off yet? ;-)
So before I dive into this more v0.7.0 is probably the best to build with logstash 1.4.2 however it doesn't include this feature nor does logstash 1.4.2 support the sprintf functionality correctly for the output as it was also added after 1.4.2. Logstash 1.5 is necessary for this particular one however due to the nature of how they changed their plugins, this repo isn't compatible with their changes (tests pass for that version but actually installing the plugin is bugged). So I would work with Logstash 1.5.0 RC2 on https://www.elastic.co/downloads/logstash
Right, currently running latest.
Anything I can help with?
Hey Joe, thought I would come back to this. We are currently running 1.5.0 RC2. But being able to specify the key based on a field of the event is pretty important to getting good partition distribution.
Can I help in anyway here?
Hi I had to remove the feature from this repo as it needs to work for 1.4.x. Could you open the issue on the logstash-output-kafka repo and reference this one? I'll leave this open for the time being.
Closing as this is supported in newer logstash-output-kafka versions.
I noticed that all output messages are going to the same partition, is there anyway for me to specify which partition or to specify a key for the message so I can get good distribution for my output topic?