vectordotdev / vector

A high-performance observability data pipeline.
https://vector.dev
Mozilla Public License 2.0

Add `exclude_topics` option to kafka source #7580

Open geekflyer opened 3 years ago

geekflyer commented 3 years ago

vector version: 0.13.1

Hi, we're using the Kafka source. A common use case is to subscribe to a set of topics whose names match a certain pattern while excluding a few of them. Currently this is very challenging to achieve with Vector.

While Vector supports regex in the `topics` option, it does not support negative lookaheads to exclude certain patterns.

For example, this will crash: `topics = "(?!kube_logs_foo)^kube_logs.*"`

This is a result of the regex crate not supporting lookaheads for performance reasons.

It would therefore be great to introduce an `exclude_topics` option. Like `topics`, `exclude_topics` should be an array that accepts topic names or regex patterns.

Any topic that is matched by `topics` but is also listed in or matched by `exclude_topics` should be excluded, i.e. not subscribed to.
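
The requested semantics can be sketched in a few lines of Python. This is only an illustration of the proposal, not Vector's implementation: `select_topics` and the sample topic names are hypothetical, and the sketch assumes that an entry starting with `^` is treated as a regex while any other entry is treated as a literal topic name.

```python
import re


def _matches(name, pattern):
    # Assumption: entries starting with "^" are regexes,
    # anything else is a literal topic name.
    if pattern.startswith("^"):
        return re.search(pattern, name) is not None
    return name == pattern


def select_topics(available, topics, exclude_topics):
    """Keep topics matched by `topics` unless `exclude_topics` also matches them."""
    return [
        name
        for name in available
        if any(_matches(name, p) for p in topics)
        and not any(_matches(name, p) for p in exclude_topics)
    ]


# Hypothetical topic list, for illustration only.
available = ["kube_logs_foo", "kube_logs_bar", "kube_logs_acme", "other_topic"]
selected = select_topics(available, ["^kube_logs.*"], ["kube_logs_foo"])
print(selected)
```

With this shape the include and exclude lists stay independent, so no lookahead is needed in either pattern.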

Discord conversation for context: https://discord.com/channels/742820443487993987/746070591097798688/846473636033986620

geekflyer commented 3 years ago

Also if anyone stumbles upon this ticket, here's a janky workaround for now (copied from our code):

  # match all topics named `kube_logs*` except "kube_logs_acme"
  # this required a bunch of regex hacking inspired by https://stackoverflow.com/a/37988661 since rust's regex engine doesn't support lookaheads.
  # verify these patterns via https://rustexp.lpil.uk/ 
  # quick explanation of the patterns:
  # 1. The first one matches all topics named `kube_logs_<some_suffix_with_exactly_4_chars_that_is_not_acme>`.
  # 2. The second one matches `kube_logs`, optionally followed by `_`, then a suffix with 0..3 chars.
  # 3. The third one matches all topics named `kube_logs_<some_suffix_with_5_or_more_chars>`.
  # topics = ["^kube_logs.*"]
  topics = ["^kube_logs_([^a]...|.[^c]..|..[^m].|...[^e])$","^kube_logs_?.{0,3}$","^kube_logs_.{5,}$"]
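
A quick way to sanity-check the three patterns above outside of Vector is a small Python script. This is a rough sketch: the helper `is_subscribed` and the sample topic names are made up for illustration, and it relies on Python's `re` module behaving the same as Rust's `regex` crate for these particular patterns, which use no lookaheads.

```python
import re

# The three workaround patterns from the config above.
patterns = [
    r"^kube_logs_([^a]...|.[^c]..|..[^m].|...[^e])$",
    r"^kube_logs_?.{0,3}$",
    r"^kube_logs_.{5,}$",
]


def is_subscribed(topic):
    # A topic is subscribed to if any one of the patterns matches it.
    return any(re.search(p, topic) for p in patterns)


# Hypothetical topic names: only kube_logs_acme should be left out.
for topic in ["kube_logs_acme", "kube_logs_acmf", "kube_logs_foo",
              "kube_logs", "kube_logs_acme2", "kube_logs_billing"]:
    print(topic, is_subscribed(topic))
```

Because the underscore in the second pattern is optional, that pattern also matches names such as `kube_logsabc`, which may or may not be what you want.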

fpytloun commented 3 years ago

Another option is to set `topic.blacklist` through `librdkafka_options`:

librdkafka_options."topic.blacklist" = "^fluentd.kube.(someapp|anotherapp).*$"