vectordotdev / vector

A high-performance observability data pipeline.
https://vector.dev
Mozilla Public License 2.0

`source_type` should be annotated at the topology level #13560

Open neuronull opened 2 years ago

neuronull commented 2 years ago

I spot-checked a couple of these and realized that we actually aren't consistently adding `source_type` as I thought. For example, the `dnstap` and `aws_ecs_metrics` components don't seem to annotate it, while the `file` source does. I wonder if this is something we could annotate at the topology level, to ensure that all components add it rather than relying on each component to do it individually. As it stands, we should probably just add `source_type` to the `file` source (and optionally audit the rest of them) and open an issue to track doing it at the topology level.

Originally posted by @jszwedko in https://github.com/vectordotdev/vector/pull/13541#pullrequestreview-1038037942
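One way to picture the topology-level approach: the topology wraps each source's output and inserts `source_type` itself, so a source that never sets it still produces annotated events. The sketch below is a simplified illustration, assuming hypothetical types (`LogEvent`, `Source`, `run_source`, `ForgetfulSource`) that are not Vector's actual API.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified log event: a flat map of string fields.
#[derive(Debug, Clone, Default, PartialEq)]
struct LogEvent {
    fields: BTreeMap<String, String>,
}

/// Hypothetical source trait: a source only produces events and does not
/// need to know about `source_type` at all.
trait Source {
    /// Component type name as it would appear in config, e.g. "dnstap".
    fn type_name(&self) -> &'static str;
    /// Produce the next batch of events.
    fn poll(&mut self) -> Vec<LogEvent>;
}

/// Topology-level step: every event leaving a source gets `source_type`,
/// so no individual source can forget it. A value the source already set
/// is kept; overwriting it instead would be the stricter choice.
fn run_source(source: &mut dyn Source) -> Vec<LogEvent> {
    let source_type = source.type_name();
    source
        .poll()
        .into_iter()
        .map(|mut event| {
            event
                .fields
                .entry("source_type".to_string())
                .or_insert_with(|| source_type.to_string());
            event
        })
        .collect()
}

/// Hypothetical source that never sets `source_type` itself.
struct ForgetfulSource;

impl Source for ForgetfulSource {
    fn type_name(&self) -> &'static str {
        "dnstap"
    }
    fn poll(&mut self) -> Vec<LogEvent> {
        let mut event = LogEvent::default();
        event.fields.insert("message".to_string(), "query".to_string());
        vec![event]
    }
}

fn main() {
    let events = run_source(&mut ForgetfulSource);
    println!("{:?}", events[0].fields);
}
```

Keeping a source-provided value (`or_insert_with`) would let sources that already annotate keep working during a migration; once every source relies on the topology, an unconditional insert would guarantee consistency.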

fuchsnj commented 2 years ago

If this is added at the topology layer, make sure the schema is still accurate. `source_type` is also currently defined in each source's schema; it would be nice if the topology added that automatically too.
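A sketch of what keeping the schema accurate could look like, under simplifying assumptions: events and schemas are plain maps, and the topology extends each source's declared schema with the same field it injects into events. `Schema`, `Event`, `annotate`, `with_topology_fields`, `schema_covers`, and the `"bytes"` kind labels are hypothetical, not Vector's schema API.

```rust
use std::collections::BTreeMap;

/// Hypothetical simplified schema: field name -> value kind.
type Schema = BTreeMap<String, &'static str>;
/// Hypothetical simplified event: field name -> value.
type Event = BTreeMap<String, String>;

/// The field the topology injects (name taken from the issue title).
const SOURCE_TYPE_FIELD: &str = "source_type";

/// Schema a source declares for its own output (hypothetical file-like source).
fn file_source_schema() -> Schema {
    let mut schema = Schema::new();
    schema.insert("message".to_string(), "bytes");
    schema.insert("file".to_string(), "bytes");
    schema
}

/// Topology-level: extend a source's declared schema with the field the
/// topology injects, so sources no longer declare it themselves.
fn with_topology_fields(mut schema: Schema) -> Schema {
    schema.insert(SOURCE_TYPE_FIELD.to_string(), "bytes");
    schema
}

/// Topology-level: the matching event annotation. Both functions use the
/// same constant, so the event and the schema cannot drift apart.
fn annotate(mut event: Event, source_type: &str) -> Event {
    event.insert(SOURCE_TYPE_FIELD.to_string(), source_type.to_string());
    event
}

/// True if every field present in the event is described by the schema.
fn schema_covers(schema: &Schema, event: &Event) -> bool {
    event.keys().all(|key| schema.contains_key(key))
}

fn main() {
    let mut event = Event::new();
    event.insert("message".to_string(), "hello".to_string());
    let event = annotate(event, "file");

    // The source's own schema does not describe the injected field...
    println!("declared schema covers event: {}", schema_covers(&file_source_schema(), &event));
    // ...but the topology-extended schema does.
    println!(
        "extended schema covers event: {}",
        schema_covers(&with_topology_fields(file_source_schema()), &event)
    );
}
```

Driving both the event annotation and the schema extension from one place is the point of the design: a source that neither sets nor declares `source_type` still ends up with a schema that describes its output.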

fuchsnj commented 2 years ago

Also consider other common "vector" metadata, such as the ingest_timestamp

jszwedko commented 2 years ago

> Also consider other common "vector" metadata, such as the ingest_timestamp

Could this possibly be delayed if there is backpressure such that the source can't forward the event right away?

fuchsnj commented 2 years ago

> Could this possibly be delayed if there is backpressure such that the source can't forward the event right away?

I haven't looked much into where this would go in the topology, but I would expect it to be implemented so that the timestamp is set immediately after the event is generated, in which case back-pressure wouldn't affect it.
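A sketch of that ordering, assuming a bounded channel between a source and the rest of the topology: a hypothetical `StampingSender` stamps the event when the source hands it over, then keeps the already-stamped event while the channel is full. To stay deterministic it uses logical ticks instead of a wall clock and `try_send` instead of a blocking send; `StampingSender`, `Event`, and `simulate` are illustrative names, not Vector's API.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Hypothetical event with a topology-assigned ingest time (a logical tick).
#[derive(Debug, Clone, PartialEq)]
struct Event {
    id: u32,
    ingest_tick: u64,
}

/// Hypothetical topology-side wrapper around a source's output channel.
struct StampingSender {
    inner: SyncSender<Event>,
    pending: Option<Event>,
}

impl StampingSender {
    /// Called when the source produces an event: the stamp is taken now,
    /// before any attempt to push the event downstream.
    fn emit(&mut self, id: u32, now: u64) {
        assert!(self.pending.is_none(), "source must wait while blocked");
        self.pending = Some(Event { id, ingest_tick: now });
        self.flush();
    }

    /// Try to forward the pending event; returns true if nothing is pending.
    fn flush(&mut self) -> bool {
        if let Some(event) = self.pending.take() {
            match self.inner.try_send(event) {
                Ok(()) => {}
                // Backpressure: keep the event, with its original stamp.
                Err(TrySendError::Full(event)) => self.pending = Some(event),
                Err(TrySendError::Disconnected(_)) => panic!("downstream dropped"),
            }
        }
        self.pending.is_none()
    }
}

/// Runs `ticks` logical ticks. The source emits a new event whenever it is
/// unblocked, but the consumer only drains every `drain_every` ticks.
/// Returns each received event with the tick at which it was received.
fn simulate(ticks: u64, drain_every: u64) -> Vec<(Event, u64)> {
    // Capacity 1: at most one event can wait in the channel.
    let (tx, rx): (SyncSender<Event>, Receiver<Event>) = sync_channel(1);
    let mut sender = StampingSender { inner: tx, pending: None };
    let mut received = Vec::new();
    let mut next_id = 0;
    for tick in 0..ticks {
        // Slow consumer: drains on some ticks only.
        if tick % drain_every == 0 {
            if let Ok(event) = rx.try_recv() {
                received.push((event, tick));
            }
        }
        // The source only generates a new event once it is unblocked.
        if sender.flush() {
            sender.emit(next_id, tick);
            next_id += 1;
        }
    }
    received
}

fn main() {
    for (event, received_at) in simulate(10, 3) {
        println!(
            "event {} stamped at tick {}, received at tick {}",
            event.id, event.ingest_tick, received_at
        );
    }
}
```

With `simulate(10, 3)`, event 1 is stamped at tick 1, only fits into the channel at tick 3, and is received at tick 6; its `ingest_tick` stays 1, so the time spent under backpressure does not shift the stamp.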