Open Genry1 opened 1 year ago
Constant regexes should be supported by https://github.com/vectordotdev/vrl/issues/151
@Genry1 do you have a use-case that requires dynamic regexes? That is: regexes built from event fields? Both examples above seem to be using constants.
Hey @jszwedko, the plan was to use a regex passed via a k8s (pod) annotation, so that each service can define which regex to use for parsing its logs. Ideally, resolving and caching the regex would improve performance. This approach is useful when you have thousands of services and it's pretty hard to hardcode those regexes in the Vector configuration.
Gotcha, thanks! Currently there is no caching of compiled regexes via to_regex, so this operation will be pretty expensive, but possible. I think we can split that work up: make it possible to use dynamic regexes with parse_regex, using this issue to track, and add caching to to_regex as a different issue.
Yep, sounds great! Thanks for looking into it!
Issue tracking caching: https://github.com/vectordotdev/vrl/issues/137
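For readers unfamiliar with the caching idea discussed above, here is a minimal, language-agnostic sketch in Python (not VRL, and not how VRL implements or plans to implement it). It assumes the pattern arrives as a plain string, for example from a pod annotation; the names compile_cached, parse_with_dynamic_regex, and annotation_pattern are hypothetical and exist only for this illustration.

```python
import re
from functools import lru_cache


@lru_cache(maxsize=1024)
def compile_cached(pattern: str) -> re.Pattern:
    """Compile `pattern` once; later calls with the same string reuse the result."""
    return re.compile(pattern)


def parse_with_dynamic_regex(message: str, pattern: str) -> dict:
    """Return the named capture groups of the first match, or {} if there is none."""
    match = compile_cached(pattern).search(message)
    return match.groupdict() if match else {}


# Hypothetical pattern, standing in for a value read from a pod annotation.
annotation_pattern = r"(?P<level>[A-Z]+) (?P<msg>.*)"
fields = parse_with_dynamic_regex("WARN disk almost full", annotation_pattern)
```

The point of the sketch is only that the expensive step (compilation) is keyed by the pattern string, so thousands of events sharing one annotation value pay for it once. An invalid pattern raises re.error on every call, since exceptions are not cached.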
I have just the same use-case. Let me share the snippet:
# Support vector annotations to perform parsing
if is_nullish(parser) {
  parser = get!(.kubernetes.annotations, [join!(["vector.dev/parser", to_string!(.kubernetes.container_name)], separator: "-")])
  if is_nullish(parser) {
    parser = get!(.kubernetes.annotations, ["vector.dev/parser"])
  }
}
## Generic parsers
if parser == "json" {
  parsed, err = parse_json(.message)
  if err != null {
    log("Unable to parse event with JSON parser:" + err + ". Affected event left unparsed: " + encode_json(.), level: "warn", rate_limit_secs: 10)
  } else {
    . = merge!(to:., from:parsed)
  }
} else if parser == "regex" {
  reg_ex = get!(.kubernetes.annotations, [join!(["vector.dev/parser", to_string!(.kubernetes.container_name), "regex"], separator: "-")])
  if is_nullish(reg_ex) {
    reg_ex = get!(.kubernetes.annotations, ["vector.dev/parser-regex"])
  }
  reg_ex, err = to_regex(reg_ex)
  if err != null {
    log("Unable to compile regex:" + err + ". Affected event left unparsed: " + encode_json(.), level: "error", rate_limit_secs: 10)
  } else {
    parsed_all, err = parse_regex_all(.message, reg_ex)
    parsed = {}
    for_each(parsed_all) -> |_index, value| {
      parsed = merge(parsed, compact!(value))
    }
    if err != null || parsed == null {
      log("Unable to parse event with regex parser:" + err + ". Affected event left unparsed: " + encode_json(.), level: "warn", rate_limit_secs: 10)
    } else {
      . = merge(to:., from:parsed)
    }
  }
}
Use Cases
parse_regex and parse_regex_all accept only a regex literal as the pattern argument, even though a regex can be stored in a variable or resolved on the fly by the to_regex function. At the moment, this is supported only by the match function.
Attempted Solutions
I tried several approaches, and the only one that works is passing the regex directly as an argument, like:
parse_regex(r'')
Proposal
An option to pass the regex parameter as a variable or as the result of another function call (a so-called group), e.g.
parse_regex(to_regex("some_regex"))
or
my_regex = r''; parse_regex(my_regex)
References
Here is the issue which introduced the to_regex function: https://github.com/vectordotdev/vector/issues/7051
Version
0.28.0