vectordotdev / vrl

Vector Remap Language
Mozilla Public License 2.0

Allow parse_regex and parse_regex_all to receive as a parameter the result of dynamic to_regex like in match function #187

Open Genry1 opened 1 year ago

Genry1 commented 1 year ago

Use Cases

parse_regex and parse_regex_all accept only a regex literal as the pattern parameter. However, a regex may be stored in a variable or built on the fly by the to_regex function. At the moment, only the match function supports this.

Attempted Solutions

I tried several approaches, and the only one that works is passing the regex literal directly as an argument, like: parse_regex(r'')

Proposal

An option to pass the regex parameter as a variable or as the result of another function call (a so-called group), e.g. parse_regex(to_regex("some_regex")) or my_regex = r''; parse_regex(my_regex)

References

Here is the issue which introduced the to_regex function https://github.com/vectordotdev/vector/issues/7051

Version

0.28.0

jszwedko commented 1 year ago

Constant regexes should be supported by https://github.com/vectordotdev/vrl/issues/151

@Genry1 do you have a use-case that requires dynamic regexes? That is: regexes built from event fields? Both examples above seem to be using constants.

Genry1 commented 1 year ago

Hey @jszwedko, the plan was to use a regex passed via a k8s (pod) annotation, so that each service can define which regex to use for parsing its logs. Ideally, resolving and caching the regex would improve performance. This approach is useful when you have thousands of services and it's pretty hard to hardcode those regexes in the Vector configuration.

jszwedko commented 1 year ago

Gotcha, thanks! Currently there is no caching of regexes compiled via to_regex, so this operation will be pretty expensive, but possible. I think we can split that work up: use this issue to track making dynamic regexes usable with parse_regex, and track adding caching to to_regex as a separate issue.
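
To make the caching idea concrete, here is a rough sketch in Python rather than VRL, since the cache would live in the implementation and not in user programs. It is only an illustration of the general technique, not how VRL does or will do it: compile_cached, parse_with_dynamic_regex, and the sample pattern are all made-up names and values.

```python
import re
from functools import lru_cache


@lru_cache(maxsize=1024)
def compile_cached(pattern: str) -> re.Pattern:
    """Compile a pattern once; later calls with the same string reuse it.

    Hypothetical helper. An invalid pattern raises re.error and is not cached.
    """
    return re.compile(pattern)


def parse_with_dynamic_regex(message: str, pattern: str) -> dict:
    """Return the named capture groups, or an empty dict if nothing matches."""
    match = compile_cached(pattern).search(message)
    return match.groupdict() if match else {}


# Pattern as it might arrive from a per-service annotation (made-up value).
annotation_pattern = r"(?P<level>\w+) (?P<msg>.*)"
fields = parse_with_dynamic_regex("warn disk almost full", annotation_pattern)
```

With thousands of services, the number of distinct patterns is bounded by the number of annotations, so a bounded cache keyed on the pattern string means each regex is compiled once instead of once per event.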

Genry1 commented 1 year ago

Yep, sounds great! Thanks for looking into it!

jszwedko commented 1 year ago

Issue tracking caching: https://github.com/vectordotdev/vrl/issues/137

fpytloun commented 5 months ago

I have just the same use-case. Let me share the snippet:

        # Support vector annotations to perform parsing
        if is_nullish(parser) {
          parser = get!(.kubernetes.annotations, [join!(["vector.dev/parser", to_string!(.kubernetes.container_name)], separator: "-")])
          if is_nullish(parser) {
            parser = get!(.kubernetes.annotations, ["vector.dev/parser"])
          }
        }

        ## Generic parsers
        if parser == "json" {
          parsed, err = parse_json(.message)
          if err != null {
            log("Unable to parse event with JSON parser: " + err + ". Affected event left unparsed: " + encode_json(.), level: "warn", rate_limit_secs: 10)
          } else {
            . = merge!(to:., from:parsed)
          }
        } else if parser == "regex" {
          reg_ex = get!(.kubernetes.annotations, [join!(["vector.dev/parser", to_string!(.kubernetes.container_name), "regex"], separator: "-")])
          if is_nullish(reg_ex) {
            reg_ex = get!(.kubernetes.annotations, ["vector.dev/parser-regex"])
          }

          reg_ex, err = to_regex(reg_ex)
          if err != null {
            log("Unable to compile regex: " + err + ". Affected event left unparsed: " + encode_json(.), level: "error", rate_limit_secs: 10)
          } else {
            parsed_all, err = parse_regex_all(.message, reg_ex)
            parsed = {}

            for_each(parsed_all) -> |_index, value| {
              parsed = merge(parsed, compact!(value))
            }

            if err != null || parsed == null {
              log("Unable to parse event with regex parser: " + err + ". Affected event left unparsed: " + encode_json(.), level: "warn", rate_limit_secs: 10)
            } else {
              . = merge(to:., from:parsed)
            }
          }
        }