logstash-plugins / logstash-input-kafka

Kafka input for Logstash
Apache License 2.0

Ability to provide a topics pattern blacklist #183

Open hartfordfive opened 7 years ago

hartfordfive commented 7 years ago

Having the ability to specify a topics_blacklist_pattern as a new parameter would help in cases where you want to subscribe to all Kafka topics on a given broker except those matching the given pattern.

For example, I have multiple Logstash configs (one config file per Elasticsearch type) that use either the topics or the topics_pattern parameter, but I also have a number of remaining Elasticsearch types that don't require any special processing. We just accept the event as is, and the resulting document in Elasticsearch only has the message field. Since internal tenants may add new types without necessarily notifying my team, we don't know the type names ahead of time. Say that I apply custom parsing for the types syslog, apachecombined-access, apache-error, auth, custom-app, and oauth2-proxy. I would like any types not in that list to be caught by a single kafka input such as the following:

input {
  kafka {
    group_id => "cg-${LOGSTASH_TENANT}-general-logs"
    decorate_events => true
    consumer_threads => 2
    bootstrap_servers => "[LOGSTASH_KAFKA_BROKERS_LIST]"
    # Proposed parameter (does not exist yet): skip any topic matching this pattern
    topics_blacklist_pattern => "logstash-[TENANT_NAME]-(syslog|apachecombined-access|apache-error|auth|custom-app|oauth2-proxy)"
    codec => "json"
  }
}

This would easily allow me to pick up events from any remaining topics, instead of having to list them via either topics or topics_pattern.

@ph What are your thoughts on this?

hartfordfive commented 7 years ago

Any updates on this?

jordansissel commented 7 years ago

Is this possible with the Kafka client? The KafkaConsumer.subscribe() method seems to not give any indication that this is possible.

https://kafka.apache.org/0100/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html

KafkaConsumer.subscribe(pattern) accepts a pattern, which is cool for matching things we want. However, java.util.regex.Pattern is a final class in Java, which means we cannot subclass it to provide any additional functionality (such as rejecting something matched by the topics_pattern setting).

This may require a change in Kafka to support it? I don't know.

connorworkman commented 4 years ago

You can technically do this exclusion with a regex pattern, but I agree it would be a nice-to-have feature. For example, if you wanted to exclude topics whose type (the part after the tenant prefix) starts with "syslog" or "apache-error", you could use a negative lookahead: topics_pattern => "logstash-[TENANT_NAME]-(?!syslog|apache-error).*"

I'm using this sort of workaround in order to separate a few topics into different consumers.
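
To illustrate how the negative-lookahead workaround above splits topics between consumers, here is a minimal Python sketch. The tenant name "acme", the topic names, and the route helper are made up for illustration. It also assumes the whole topic name has to match the pattern (hence fullmatch), and relies on Python's re module treating this simple lookahead the same way java.util.regex does, so take it as a way to sanity-check a pattern rather than a description of what the plugin does internally.

```python
import re

# Hypothetical tenant "acme"; both patterns follow the shape used in this thread.
# Catch-all consumer: reject topics whose type starts with an excluded term.
catch_all = re.compile(r"logstash-acme-(?!syslog|apache-error).*")
# Dedicated consumer: the complementary pattern for the excluded types.
special = re.compile(r"logstash-acme-(syslog|apache-error).*")


def route(topic):
    """Return which consumer pattern matches the topic, or None if neither does."""
    # fullmatch: assumes the entire topic name must match the pattern.
    if catch_all.fullmatch(topic):
        return "general"
    if special.fullmatch(topic):
        return "special"
    return None


# Made-up topic names, only to exercise the two patterns.
for name in [
    "logstash-acme-syslog",
    "logstash-acme-apache-error",
    "logstash-acme-nginx",
    "logstash-acme-my-syslog",
    "other-topic",
]:
    print(name, "->", route(name))
```

Note that the lookahead only inspects what immediately follows the prefix, so a topic such as logstash-acme-my-syslog still lands in the catch-all consumer, while anything beginning with an excluded term (for example logstash-acme-syslog-archive) is left to the dedicated one.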