vectordotdev / vrl

Vector Remap Language
Mozilla Public License 2.0

Allow expressions as file_desc and message_type args #900

Open nb-mouse opened 2 weeks ago

nb-mouse commented 2 weeks ago

This might be handy for dynamically loading protobuf descriptor files and specifying the message type based on the payload being transformed.

This implementation allows passing expressions instead of literal-only (constant) arguments.

Use case:

[sources.protobuf]
type = "stdin"
decoding.codec = "protobuf"
decoding.protobuf.desc_file = "event.desc"
decoding.protobuf.message_type = "EventLog"

[transforms.transform]
type = "remap"
inputs = ["protobuf"]
source = """
message_type = replace(.event.proto_data.type_url, "type.googleapis.com/", "") ?? ""
file_desc = sha1(message_type) + ".desc"
. = {
  "id": .event.id,
  "type": message_type,
  "name": parse_proto(.event.proto_data.value, file_desc, message_type)
} ?? {}
"""

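For the lookup above to succeed, the desc files on disk have to be named the same way the VRL program derives `file_desc`. A minimal sketch of that naming rule in Python, assuming VRL's `sha1` returns a lowercase hex digest; the function name `desc_file_name` is hypothetical and only mirrors the two VRL assignments:

```python
import hashlib

TYPE_URL_PREFIX = "type.googleapis.com/"


def desc_file_name(type_url: str) -> str:
    """Mirror the VRL snippet: strip the type URL prefix, then hash the name."""
    # Same as replace(.event.proto_data.type_url, "type.googleapis.com/", "")
    message_type = type_url.replace(TYPE_URL_PREFIX, "")
    # Same as sha1(message_type) + ".desc"
    return hashlib.sha1(message_type.encode("utf-8")).hexdigest() + ".desc"
```

A build step could use the same rule to write each compiled descriptor set under the name the transform will later ask for.
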
PS. I have a follow-up for recursively handling Any messages when processing the source configuration.

nb-mouse commented 2 weeks ago

@jszwedko, this was a first attempt to handle events with Any messages. In practice, we have some protos that define further sub-messages of the Any type, and the current PR cannot handle that. There are hundreds of different messages, and most of them cannot be collected into a single desc file. I was thinking of implementing an LRU cache for loaded descriptors, shared across several message types. The tricky part is how to organize the desc file catalog so that the proper desc file is loaded for a given message type.
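To make the LRU idea concrete, here is a rough sketch in Python rather than in the Rust of the actual code base. The class name `LruCache` and the `load` callback are hypothetical; `load` stands in for whatever reads and parses a desc file:

```python
from collections import OrderedDict
from typing import Callable, Generic, Hashable, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


class LruCache(Generic[K, V]):
    """Keep at most `capacity` loaded values, evicting the least recently used."""

    def __init__(self, capacity: int, load: Callable[[K], V]) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._capacity = capacity
        self._load = load  # hypothetical loader, e.g. parse a desc file
        self._entries: "OrderedDict[K, V]" = OrderedDict()

    def get(self, key: K) -> V:
        if key in self._entries:
            # Cache hit: mark the entry as most recently used.
            self._entries.move_to_end(key)
            return self._entries[key]
        # Cache miss: load the value, then evict the oldest entry if over capacity.
        value = self._load(key)
        self._entries[key] = value
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)
        return value
```

Keying the cache by desc file path rather than by message type would let several message types that live in the same file share one loaded descriptor.
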

Here is a very early draft demonstrating an approach that works with several levels of Any messages: https://github.com/vectordotdev/vrl/commit/db26ba7dd6e3614ac82133a4b9ef9f5d1d303235

nb-mouse commented 2 weeks ago

It turns out that we can combine several desc files into one. So we can use the one desc file defined in the source config and recursively attempt to load a descriptor from it when parsing an Any message.
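The recursion could look roughly like the sketch below. It is only an illustration: the `pool` dictionary is a hypothetical stand-in for a descriptor pool built from the combined desc file, the decoders are plain callables instead of real protobuf decoding, and an Any is modelled as a dict with exactly the keys `type_url` and `value`:

```python
from typing import Any, Callable, Dict

# Hypothetical stand-in for a descriptor pool: message type name -> decoder.
Pool = Dict[str, Callable[[Any], Any]]


def is_any(node: Any) -> bool:
    """Treat a dict with exactly `type_url` and `value` as an Any message."""
    return isinstance(node, dict) and set(node) == {"type_url", "value"}


def resolve(node: Any, pool: Pool, depth: int = 0, max_depth: int = 16) -> Any:
    """Walk a decoded message and unpack every Any whose type the pool knows."""
    if depth > max_depth:
        raise RecursionError("Any messages are nested too deeply")
    if is_any(node):
        # The message type name is the last path segment of the type URL.
        name = node["type_url"].rsplit("/", 1)[-1]
        decoder = pool.get(name)
        if decoder is None:
            return node  # unknown type: leave the Any as it is
        decoded = decoder(node["value"])
        return {"type": name, "value": resolve(decoded, pool, depth + 1, max_depth)}
    if isinstance(node, dict):
        return {key: resolve(val, pool, depth, max_depth) for key, val in node.items()}
    if isinstance(node, list):
        return [resolve(item, pool, depth, max_depth) for item in node]
    return node
```

Leaving unknown types untouched keeps the event intact when a descriptor is missing; the depth limit is an assumption added here to guard against pathological nesting.
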

For this particular PR, we might indeed introduce an option to load the descriptor either at compile time or dynamically, but I don't know how to implement that. Do any other functions implement something similar?
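One way to picture the compile-time versus dynamic split, sketched in Python and not tied to VRL's actual compiler API: if the argument is a literal, load the descriptor once while compiling; otherwise defer loading to each event. The names `Literal`, `Expr`, and `compile_descriptor_arg` are hypothetical:

```python
from dataclasses import dataclass
from typing import Any, Callable, Union


@dataclass(frozen=True)
class Literal:
    value: str  # a constant desc file path known at compile time


@dataclass(frozen=True)
class Expr:
    evaluate: Callable[[dict], str]  # computes the path from the event


def compile_descriptor_arg(
    arg: Union[Literal, Expr], load: Callable[[str], Any]
) -> Callable[[dict], Any]:
    """Return a per-event resolver for the descriptor."""
    if isinstance(arg, Literal):
        # Load once, up front; a bad path fails before any event is processed.
        descriptor = load(arg.value)
        return lambda event: descriptor
    # Dynamic case: the path depends on the event, so load at runtime.
    return lambda event: load(arg.evaluate(event))
```

With this shape no separate option is needed: the kind of argument decides the behaviour, and the dynamic branch is where a descriptor cache would pay off.
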

nb-mouse commented 2 weeks ago

Perhaps, as an additional feature, we could add an option to the protobuf source config to specify a directory of desc files and merge them at Vector startup.
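A possible way to do the merge, sketched in Python under the assumption that every desc file is a serialized FileDescriptorSet (for example, the output of `protoc --descriptor_set_out`). The protobuf wire format merges concatenated messages, and a FileDescriptorSet is essentially a repeated list of files, so joining the files byte-wise should yield one combined set. The function name and the `*.desc` glob are hypothetical:

```python
from pathlib import Path


def merge_desc_files(directory: str) -> bytes:
    """Concatenate all *.desc files in `directory` into one descriptor set."""
    # Sort the paths so the merged output is deterministic across runs.
    paths = sorted(Path(directory).glob("*.desc"))
    return b"".join(path.read_bytes() for path in paths)
```

Files that several sets have in common (such as shared imports) will then appear more than once, so whether the descriptor pool accepts duplicates would need to be verified.
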

nb-mouse commented 2 weeks ago

Instead of extra parameters, we might introduce a separate proto_parse function, to simplify the logic and explicitly separate the functionality.

nb-mouse commented 2 weeks ago

Hopefully, this should fit better:

https://github.com/vectordotdev/vrl/pull/901
https://github.com/vectordotdev/vector/pull/20708

The changes above affect only the source decoder. Perhaps we can pass an extra config option to enable or disable this lookup feature.

jszwedko commented 2 weeks ago

Thanks for the other PR! We'll take a look at that one.