atibdialpad opened this issue 3 years ago
Hi @jszwedko, can we expect this as part of v17?
Hey @atibdialpad! Would you be able to confirm all of the fields you'd like to be templatable? I believe `project` and `topic` would be relatively straightforward, but allowing templating of `credentials_file` would be more involved, since that would require creating a completely separate GCP client for each templated value.
Hi @jszwedko, `topic` and `project` should be a good place to start. I can create one sink per transform for different credentials, which is better than one sink per topic.
I wanted to understand: what decides whether a field is templatable or not? Another example I came across is the metric `type` in the `log_to_metric` transform.
It's essentially determined at development time. If we look at the `gcp_pubsub` sink, you can see that these fields are just defined as `String`:

https://github.com/vectordotdev/vector/blob/master/src/sinks/gcp/pubsub.rs#L31-L33
If you look at the `tenant_id` field in the Loki sink (https://github.com/vectordotdev/vector/blob/master/src/sinks/loki.rs#L40), you can see that it is of type `Template`. This means that it can accept a template field. You can also see this field has the `template` badge in the documentation: https://vector.dev/docs/reference/configuration/sinks/loki/#tenant_id
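For example, because `tenant_id` is `Template`-typed, it can be set to a template string that is rendered from each event. A minimal sketch of such a Loki sink config (the component names, endpoint, and the event's `tenant` field are illustrative assumptions):

```toml
[sinks.loki_out]
type = "loki"
inputs = ["my_source"]             # hypothetical upstream component
endpoint = "http://localhost:3100"
encoding.codec = "json"
# tenant_id is a Template-typed field, so "{{ tenant }}" is rendered
# from each event's `tenant` field at runtime:
tenant_id = "{{ tenant }}"
```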
So, why don't we just make all fields templatable? It's definitely true that we could have more fields templatable than we currently do, but it is something we need to be careful about. Sometimes it doesn't make sense, or it significantly impacts performance or code complexity. A number of the sinks rely on a given field being the same for each event they send out, which allows them to batch events efficiently. For example, a dynamic `credentials_file` field would add a lot of complexity at runtime. So we need to consider each field based on how that field is being used.
Makes a lot of sense, thanks @StephenWakely and @jszwedko. It seems to me the right way ahead is to make more fields templatable where templating makes sense, and for sinks/transforms where that is not possible, `route` (fan-out) can be used to build the common infra.

Something like this: say there are 10 different services that can emit metrics. Not all of them will have counter-type metrics, but there is certain common processing I need to apply to all metrics (say, adding common tags), and we cannot have the metric `type` templatable.
```toml
[transforms.svc_1_transform]
type = "remap"
inputs = ["svc_1_source"]   # placeholder input
source = '''
.metric_type = "counter"
'''

[transforms.svc_2_transform]
type = "remap"
inputs = ["svc_2_source"]   # placeholder input
source = '''
.metric_type = "gauge"
'''

[transforms.common_metric_processing]
type = "remap"
inputs = ["svc_1_transform", "svc_2_transform"] # ...all svc transforms that want to emit metrics
source = '''
# do common stuff
'''

[transforms.metric_router]
type = "route"
inputs = ["common_metric_processing"]
route.counter = '.metric_type == "counter"'
route.gauge = '.metric_type == "gauge"'

[transforms.log_to_counter_metric]
type = "log_to_metric"
inputs = ["metric_router.counter"]

[[transforms.log_to_counter_metric.metrics]]
type = "counter"   # <-- no need for templating
field = "value"

[transforms.log_to_gauge_metric]
type = "log_to_metric"
inputs = ["metric_router.gauge"]

[[transforms.log_to_gauge_metric.metrics]]
type = "gauge"   # <-- no need for templating
field = "value"

[transforms.metric_post_processing]
type = "remap"
inputs = ["log_to_*_metric"]
source = '''
# any common post-processing for all metrics
'''

[sinks.metric_server]
inputs = ["metric_post_processing"]
# datadog_metrics / prometheus_exporter / ...
```
What do you guys think? Will this cause any sort of bottleneck? Again, the sole idea is to keep the common pre- and post-processing in a single place rather than writing it N times for all services.
@StephenWakely, from your earlier message: "A number of the sinks rely on a given field being the same for each event they send out, which allows them to batch events efficiently." Does this also apply to transforms? For example, the `log_to_metric` transform might batch events with the same metric type (counter/gauge) for performance?
No, just for sinks. The reason for the batching is that network requests are more efficient when more data is sent in a single request, rather than many requests each carrying a small amount of data.
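To make that concrete, batching is a sink-level concern tuned through Vector's standard `batch.*` options. A minimal sketch, assuming a generic `http` sink with a placeholder endpoint:

```toml
[sinks.http_out]
type = "http"
inputs = ["metric_post_processing"]
uri = "https://example.com/ingest"   # placeholder endpoint
encoding.codec = "json"
# Events are buffered and flushed as one request when either limit is hit,
# instead of issuing one small request per event:
batch.max_events = 1000
batch.timeout_secs = 5
```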
Hello folks, I am also looking for the same feature, to create Pub/Sub sinks for a Vector pipeline. Is there any update or progress on the templating feature?
Unfortunately nothing yet. We are happy to accept PRs for this enhancement though 🙂
From a Discord thread:
The `gcp_pubsub` sink should be able to populate fields (`project`, `topic`, `credentials_file`, ...) via templating. Currently it errors out.

My config:
probably because it literally sets `project` and `topic` to `"{{ project }}"` and `"{{ topic }}"`
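The config itself was not captured above, but a hypothetical config of the shape being described would look something like the following; it fails because these fields are `String`-typed in the sink, so the template syntax is passed through literally instead of being rendered per event:

```toml
[sinks.pubsub_out]
type = "gcp_pubsub"
inputs = ["my_source"]   # placeholder input
encoding.codec = "json"
# project and topic are defined as String in the sink, so the template
# syntax below is taken literally rather than rendered per event:
project = "{{ project }}"
topic = "{{ topic }}"
```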
@StephenWakely @jszwedko