vectordotdev / vector

A high-performance observability data pipeline.
https://vector.dev
Mozilla Public License 2.0

Make gcp_pubsub sink's fields templatable #9129

Open atibdialpad opened 3 years ago

atibdialpad commented 3 years ago

From discord_thread

The gcp_pubsub sink should be able to populate fields (project, topic, credentials_file, ...) via templating. Currently it errors out:

x Sink "gcp_pubsub": URI parse error: invalid uri character

My config :

[sinks.gcp_pubsub_sink]
  type = "gcp_pubsub"
  inputs = ["*_gcp_pubsub_parser"]
  credentials_path = "/etc/vector/gcs_private_key_staging.json"
  project = "{{ project }}"
  topic = "{{ topic }}"

  [sinks.gcp_pubsub_sink.encoding]
    timestamp_format = "rfc3339"

  [sinks.gcp_pubsub_sink.healthcheck]
    enabled = true

Probably because it literally sets project and topic to the raw strings "{{ project }}" and "{{ topic }}" instead of rendering them per event.
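A minimal sketch of why the parse fails (the endpoint URL and helper below are illustrative, not Vector's actual code): the unrendered template markers end up embedded in the request URI, and `{`, `}`, and spaces are not legal URI characters.

```rust
// Return the first character that is not allowed in a URI, if any.
// (Simplified check; a real URI parser rejects more than these.)
fn first_invalid_uri_char(uri: &str) -> Option<char> {
    uri.chars().find(|&c| c == '{' || c == '}' || c == ' ')
}

fn main() {
    // With templating unsupported, the configured values are used verbatim:
    let uri = format!(
        "https://pubsub.googleapis.com/v1/projects/{}/topics/{}:publish",
        "{{ project }}", "{{ topic }}"
    );
    // The raw "{{ ... }}" markers make the URI unparseable, hence
    // `URI parse error: invalid uri character`.
    assert_eq!(first_invalid_uri_char(&uri), Some('{'));
}
```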

@StephenWakely @jszwedko

atibdialpad commented 3 years ago

Hi @jszwedko can we expect this as part of v17 ?

jszwedko commented 3 years ago

Hey @atibdialpad! Would you be able to confirm all of the fields you'd like to be templatable? I believe project and topic would be relatively straightforward, but allowing templating of credentials_file would be involved, since that would require creating completely separate GCP clients for each templated value.

atibdialpad commented 3 years ago

Hi @jszwedko, topic and project should be a good place to start. I can create one sink per transform for different credentials, which is better than one per

atibdialpad commented 3 years ago

I wanted to understand: what decides whether a field is templatable or not? Another example I came across is the metric type in the log_to_metric transform.

StephenWakely commented 3 years ago

I wanted to understand: what decides whether a field is templatable or not? Another example I came across is the metric type in the log_to_metric transform.

It's essentially determined at development time. If we look at the gcp_pubsub sink, you can see that these fields are just defined as String:

https://github.com/vectordotdev/vector/blob/master/src/sinks/gcp/pubsub.rs#L31-L33

If you look at the tenant_id field in the Loki sink (https://github.com/vectordotdev/vector/blob/master/src/sinks/loki.rs#L40) you can see that it is of type Template. This means that it can accept a template field. You can also see this field has the template badge in the documentation: https://vector.dev/docs/reference/configuration/sinks/loki/#tenant_id
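The distinction can be illustrated with a toy sketch (the Template type below is a stand-in, not Vector's actual implementation): a field typed as String keeps the raw `{{ ... }}` text verbatim, while a field typed as Template can be rendered against each event.

```rust
use std::collections::HashMap;

// Toy stand-in for Vector's Template type (illustrative only).
struct Template {
    src: String,
}

impl Template {
    fn new(src: &str) -> Self {
        Template { src: src.to_string() }
    }

    // Replace each "{{ key }}" with the event's value for that key.
    fn render(&self, event: &HashMap<String, String>) -> String {
        let mut out = self.src.clone();
        for (k, v) in event {
            out = out.replace(&format!("{{{{ {} }}}}", k), v);
        }
        out
    }
}

fn main() {
    let mut event = HashMap::new();
    event.insert("project".to_string(), "my-project".to_string());

    // A plain String field is used verbatim, template markers and all:
    let project_as_string = "{{ project }}".to_string();
    assert_eq!(project_as_string, "{{ project }}");

    // A Template field is rendered per event:
    let project_as_template = Template::new("{{ project }}");
    assert_eq!(project_as_template.render(&event), "my-project");
}
```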

So, why don't we just make all fields templatable? It's definitely true that we could have more fields templatable than we currently have, but it is something we need to be careful about. Sometimes it doesn't make sense, or it significantly impacts performance or code complexity. A number of the sinks rely on a given field being the same for each event it sends out. This allows it to batch the fields efficiently. For example, a dynamic credentials_file field would add a lot of complexity at runtime, since each distinct value would need its own GCP client. So we need to consider each field based on how that field is being used.
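The batching constraint can be sketched like this (a toy illustration, not Vector's actual partitioning code): once a field such as topic varies per event, the sink has to group events by each rendered value and issue one request per group, instead of one request for the whole batch.

```rust
use std::collections::HashMap;

// Group (topic, payload) pairs into one batch per distinct topic.
// Each batch then becomes its own network request.
fn partition_by_topic(events: Vec<(String, String)>) -> HashMap<String, Vec<String>> {
    let mut batches: HashMap<String, Vec<String>> = HashMap::new();
    for (topic, payload) in events {
        batches.entry(topic).or_default().push(payload);
    }
    batches
}

fn main() {
    let events = vec![
        ("topic-a".to_string(), "e1".to_string()),
        ("topic-b".to_string(), "e2".to_string()),
        ("topic-a".to_string(), "e3".to_string()),
    ];
    let batches = partition_by_topic(events);
    // Two distinct rendered topics -> two separate requests,
    // where a fixed topic would have needed only one.
    assert_eq!(batches.len(), 2);
    assert_eq!(batches["topic-a"].len(), 2);
}
```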

atibdialpad commented 3 years ago

Makes a lot of sense. Thanks @StephenWakely and @jszwedko. It seems to me the right way forward is to make more fields templatable where templating makes sense, and for sinks/transforms where it is not possible, route (fan-out) can be used to build a common infrastructure.

Something like: say there are 10 different services which can emit metrics. Not all of them will have 'counter' type metrics, but there is certain common processing I need to apply to all metrics (say, adding common tags), and we cannot have metric_type templatable.

[transforms.svc_1_transform]
  type = "remap"
  source = '.metric_type = "counter"'
  ...

[transforms.svc_2_transform]
  type = "remap"
  source = '.metric_type = "gauge"'
  ...

[transforms.common_metric_processing]
  inputs = ["svc_1_transform", "svc_2_transform"]  # all svc transforms which want to emit metrics
  # do common stuff

[transforms.metric_router]
  type = "route"
  inputs = ["common_metric_processing"]
  route.counter = '.metric_type == "counter"'
  route.gauge = '.metric_type == "gauge"'
  ...

[transforms.log_to_counter_metric]
  type = "log_to_metric"
  inputs = ["metric_router.counter"]
  [[transforms.log_to_counter_metric.metrics]]
    type = "counter"  # no need for templating
    ...

[transforms.log_to_gauge_metric]
  type = "log_to_metric"
  inputs = ["metric_router.gauge"]
  [[transforms.log_to_gauge_metric.metrics]]
    type = "gauge"  # no need for templating
    ...

[transforms.metric_post_processing]
  inputs = ["log_to_*_metric"]
  # any common post processing for all metrics

[sinks.metric_server]
  inputs = ["metric_post_processing"]
  # datadog metrics / prometheus / ...

What do you guys think? Will this cause any sort of bottleneck? Again, the sole idea is to keep the common pre- and post-processing in a single place rather than writing it N times for all services.

atibdialpad commented 3 years ago

@StephenWakely, from your earlier message: "A number of the sinks rely on a given field being the same for each event it sends out. This allows it to batch the fields efficiently." Does this also apply to transforms? For example, might the log_to_metric transform batch events with the same metric_type (counter/gauge) for performance?

StephenWakely commented 3 years ago

@StephenWakely, from your earlier message: "A number of the sinks rely on a given field being the same for each event it sends out. This allows it to batch the fields efficiently." Does this also apply to transforms? For example, might the log_to_metric transform batch events with the same metric_type (counter/gauge) for performance?

No, just for sinks. The reason for the batching is that network requests are more efficient when more data is sent in a single request, rather than many requests each carrying a small amount of data.
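As a back-of-the-envelope illustration of that trade-off: with a batch size of B, sending N events takes ceil(N / B) requests instead of N, so the per-request overhead (connection setup, headers, round-trip latency) is paid far fewer times.

```rust
// Number of network requests needed to ship `events` events
// when up to `batch_size` events fit in one request.
fn num_requests(events: usize, batch_size: usize) -> usize {
    (events + batch_size - 1) / batch_size // ceiling division
}

fn main() {
    // Unbatched: one request per event.
    assert_eq!(num_requests(10, 1), 10);
    // Batched: the same 10 events need only 2 requests.
    assert_eq!(num_requests(10, 5), 2);
}
```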

satishkm26 commented 2 years ago

Hello folks, I am also looking for the same feature to create Pub/Sub sinks for a Vector pipeline. Is there any update or progress on the templating feature?

jszwedko commented 2 years ago

Hello folks, I am also looking for the same feature to create Pub/Sub sinks for a Vector pipeline. Is there any update or progress on the templating feature?

Unfortunately nothing yet. We are happy to accept PRs for this enhancement though 🙂