vectordotdev / vector

A high-performance observability data pipeline.
https://vector.dev
Mozilla Public License 2.0

Support for including/excluding `docker_logs` sources by log driver #18875

Open marcus-crane opened 11 months ago

marcus-crane commented 11 months ago

Use Cases

Hi there,

We're currently in the process of adopting Docker and ran into a bit of a footgun.

Long story short, our use case would be the following:

Background on this request

For context, we run ECS services where Vector is deployed on a bunch of EC2 hosts and then the ECS agent (as a Docker sidecar) schedules containers onto each host.

At present, our Docker daemon supports containers that run the following log drivers: syslog, json-file and awslogs, which will become relevant shortly.

For historical reasons, we've also been running Docker v19.03.15 for quite some time.

We started looking into upgrading to at least v20.10.6 to fix a known issue with containers using the json-file driver spitting out EOF errors as part of a race condition.

That's all fine but here is where the footgun arrives.

One of the benefits of v20.10.x is that it makes a best-effort attempt at allowing logs to be read from containers with any log driver. Funnily enough, it was something I was quite looking forward to, but it never occurred to me that if it applies to the docker logs command (which is just an API client, I suppose) then it would apply to Vector too.

We have a bunch of Logstash containers that process all of our logs and that use awslogs. Upon upgrading, they suddenly became visible to Vector, which started consuming their logs and sending them to Logstash, where they would be read again by Vector, generating even more logs.

Thankfully, due to nice disk buffering, our infrastructure held up quite well, and it was a pretty fun load test.

Attempted Solutions

In an ideal world, the ECS agent itself would provide the current log driver as a label on the container.

If that were the case, we could configure Vector to filter containers based on that label.

We could potentially inject these labels at task definition registration time (an ECS thing) but it'd be an imperfect solution as not all changes necessarily go through CI so it's technically possible to change a log driver in an emergency and have a mismatching label for example.

~Filtering ECS services by labels will be our strategy for the meantime~ Never mind, it's not possible to exclude by label, only include, so the only avenue here is excluding based on container name.

The downside though is that this assumes those maintaining logging infrastructure have a perfect, and always up to date view of all new services (and services that have changed log drivers) which is hard to do for a wide production platform that has many services, across many clusters.

Proposal

~My proposal would be to introduce include_drivers and exclude_drivers in the same vein as include_containers and exclude_containers.~

After thinking about it a bit more, I think just include_log_drivers would work fine. There is only a small number of drivers so listing them out is not really a big deal with the default being to include all drivers.

Perhaps include_drivers as a name instead, given the "log" bit may be a bit redundant. It also seems to line up with how Bollard references things under the hood, and reads nicer I think.

Just a plain text match would work, no regex capability would be required.
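To make the proposed behaviour concrete, here is a minimal sketch in Rust of the matching rule described above: an unset option includes every driver, and otherwise the driver name must exactly equal one of the listed values. The function name `driver_is_included` and its signature are hypothetical and not taken from the Vector codebase. It also assumes the driver name has already been obtained elsewhere, for example from the `HostConfig.LogConfig.Type` field that Docker returns when inspecting a container.

```rust
/// Hypothetical helper, not part of Vector: decides whether a container
/// should be collected, given the proposed `include_drivers` option.
///
/// `include_drivers` is `None` when the option is unset, which keeps the
/// proposed default of including all drivers.
fn driver_is_included(include_drivers: Option<&[String]>, driver: &str) -> bool {
    match include_drivers {
        // Option not configured: accept every log driver.
        None => true,
        // Option configured: plain, case-sensitive text match, no regex.
        Some(allowed) => allowed.iter().any(|name| name.as_str() == driver),
    }
}
```

With a list containing only `json-file`, a container using `awslogs` would be skipped, which is the case that caused the feedback loop described earlier. An empty list would exclude every container under this sketch; whether that should instead mean "include all" is a design choice left open here.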

References

No response

Version

vector 0.33.0 (x86_64-unknown-linux-gnu 89605fb 2023-09-27 14:18:24.180809939)

marcus-crane commented 11 months ago

Ironically, I was not using https://vector.dev/docs/reference/configuration/transforms/throttle/ which I was aware of but had put off until later. I guess this will save us next time 😅

marcus-crane commented 11 months ago

The docker logs source (in the codebase) is a lot smaller and less scary than I would have imagined so I'd be happy to take a crack at contributing this feature.

I don't know any Rust as I write this, but the implementation, if it sounds reasonable, should pretty much mirror some of the existing filters.

jszwedko commented 11 months ago

Hi @marcus-crane !

Thanks for this thorough issue and description of your use-case! I think it generally makes sense to add the additional filter. We'd be happy to help support a contribution here if you are motivated 😄

marcus-crane commented 11 months ago

Hey @jszwedko,

Just confirming that we're definitely interested in this and I've made a start on a contribution, although it might take a wee bit as I'm going on a short holiday next week 🙂

I'll post an update here when I've got a PR ready