Open tlvenn opened 5 years ago
Thanks @tlvenn, the logging-operator project is very interesting. We'll dig in and see what we can do.
Hi @tlvenn, thanks for opening this. I did some research, and this sounds like something that would be very helpful for our users.
From what I have been reading, a Vector operator could provide a way to configure different Vector topologies running on top of Kubernetes. We could define a few CRDs that let users express their Vector deployments on top of kube with kube-like configuration that integrates very naturally.
Under the hood, the operator could deploy a DaemonSet of Vector agents onto each kube node and supply each agent with a config that lets Vector run in a distributed topology. On top of this, the operator could also deploy a StatefulSet acting as a central Vector aggregator that the node agents connect to; the aggregator would provide a durable disk buffer and ship logs to an external service, with TLS configured all the way. The operator could even configure an entire log transform pipeline within the same cluster, using Kafka as an intermediate message broker, which would enable the stream-based topology setup.
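To make the idea concrete, a CRD-driven deployment might look roughly like the sketch below. Everything here is hypothetical: the `VectorTopology` kind, its API group, and all field names are invented for illustration and do not correspond to any shipped API.

```yaml
# Hypothetical CRD sketch -- kind, group, and fields are invented
# for illustration; no such resource exists in Vector today.
apiVersion: vector.example.com/v1alpha1
kind: VectorTopology
metadata:
  name: cluster-logging
spec:
  agent:                 # rendered by the operator as a DaemonSet on every node
    sources:
      - kubernetes_logs
  aggregator:            # rendered as a StatefulSet the agents connect to
    replicas: 2
    buffer:
      type: disk         # durable disk buffer on the aggregator
      maxSize: 10Gi
  sinks:
    - name: s3-archive   # external destination, TLS all the way
      type: aws_s3
      tls: true
```

The operator would watch resources like this and render the actual DaemonSet, StatefulSet, and per-instance Vector configs from them.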
An example of a simple deployment with fluentbit's logging operator can be seen here. It configures Fluent Bit to write to S3 and shows that logging can be set up very easily and naturally.
Since this relates to how Vector is deployed, I think this might not belong in the current initial containers milestone. That said, I do think this could be extremely valuable to allow users to quickly get up and running with more complex vector deployments.
Yep, that's pretty much it, and the possibilities are endless. Beyond CRDs that describe the entire desired Vector pipeline, the operator could also react to annotations on a container and inject Vector logging sidecars. But yes, you definitely captured the gist of it.
This particular diagram illustrates the whole idea best:
Note that the operator approach also paves the way for some inversion of control, where the application bundles custom CRDs to declare its logging streams and how they should be routed, very similar to the ServiceMonitor CRD with the Prometheus operator.
Hi @binarylogic, I was wondering where your current thinking is regarding the idea of a K8s operator to orchestrate and deploy Vector?
Hi @tlvenn, @MOZGIII should be able to answer that since he is heading up our k8s integration currently.
Hey @tlvenn! I'm currently working on a new k8s integration outlined at this RFC: https://github.com/timberio/vector/blob/master/rfcs/2020-04-04-2221-kubernetes-integration.md We don't yet have plans to implement Vector operator in the backlog, but we're working on the building blocks to enable that option for us in the future.
The scope of the Vector operator would be to use CRDs to describe and roll out complete logging topologies, i.e. deploy Vector as a DaemonSet on each node for pod log collection, and deploy another Vector as a Deployment to gather all the log streams from the whole cluster and do advanced processing. Another responsibility of the operator would be, effectively, to assemble the Vector configuration from the CRDs (as opposed to user-supplied .toml configs). All of that is very advanced functionality, and we're building the k8s integration with it in mind. We're currently working on the prerequisites to make it possible.
Furthermore, instead of building our own operator, we might be able to work with https://github.com/banzaicloud/logging-operator to add Vector support there. We'll look into it after we complete the initial integration with k8s.
I'm personally looking forward to the scenario where apps can ship with their own logging configurations (i.e. transform specifications) as CRDs, and an operator dynamically configures Vector to process and ship the logs accordingly.
Awesome, thanks a lot for the feedback. Looking forward to all of this as well!
@binarylogic @MOZGIII @tlvenn I can start working on it to have a first version by the end of September.
cc/ @spencergilbert
Hey @aelbarkani - I'm curious what you're looking to get out of an operator here. Is it going to solve a problem that the existing helm charts aren't?
Hey @spencergilbert. Banzaicloud's logging-operator is a good example. We run multi-tenant clusters where each application team gets isolated namespaces. Application teams have limited access to the cluster (no access to cluster-wide resources or privileged containers), and usually they don't care: the only thing they want is to be admin in their own namespaces and leave the cluster management hassle to the infrastructure team. In a Kubernetes cluster the Vector agents are privileged, so end users should not have access to them. However, users still need some sort of API (a Kubernetes custom resource) to request sources, transforms, and sinks. Basically, the idea is to offer our users a fully managed Vector instance in multi-tenant Kubernetes clusters, and I don't think that is possible with a Helm chart.
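A namespaced custom resource for this multi-tenant setup might look roughly like the sketch below. The `LoggingFlow` kind, its group, and all field names are hypothetical, modeled loosely on the namespaced `Flow` resource from Banzaicloud's logging-operator.

```yaml
# Hypothetical namespaced resource -- an app team could create this in
# its own namespace without any cluster-wide privileges, and the
# operator would merge it into the privileged Vector agents' config.
apiVersion: vector.example.com/v1alpha1
kind: LoggingFlow
metadata:
  name: checkout-logs
  namespace: team-checkout   # scoped to the team's namespace
spec:
  match:                     # only logs from this team's pods
    labels:
      app: checkout
  transforms:
    - type: remap
      source: .team = "checkout"
  outputRefs:
    - team-checkout-s3       # destination defined by the infra team
```

The key property is that the resource is namespace-scoped, so standard RBAC is enough to let teams self-serve while the infra team keeps control of the privileged agents and the allowed destinations.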
@aelbarkani - would you be interested in having a call with us to discuss your plans and thoughts around the operator more fully? If you do, you can email me at spencer.gilbert (at) datadoghq.com
Hi @aelbarkani , please take a moment to read the material and articles linked above, the value props of the vector operator should be pretty apparent.
Hi @tlvenn. I must say that value proposition of an operator is not always straightforward. In this case it is, and I can say that since we've been using Banzaicloud's operator in dozens of clusters in production and developed many for our clients. Now, my comment was about one use case (the one that I'm interested in), but of course there are many others. Don't hesitate to explain a little bit more your use cases.
@spencergilbert yep, just sent you an email !
D'oh, my comment was for @spencergilbert; I made a mistake mentioning you :facepalm:
@tlvenn no worries !
Hi All
I would just like to add my support for this issue. It would be great if dev teams could configure vector transforms by themselves.
Hi all,
I want to point out another use case: better integrating with, or replacing, Prometheus for metrics collection. Currently, a lot of Helm charts ship with PodMonitor/ServiceMonitor resources to allow scraping by Prometheus. However, Vector can't use these directly. It would be awesome if the Vector operator could watch existing PodMonitor/ServiceMonitor resources and scrape the targets they describe when configured to do so.
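For reference, Vector already ships a `prometheus_scrape` source, so an operator could in principle translate each PodMonitor endpoint into generated Vector config along these lines. The sketch below uses Vector's YAML config format; the endpoint URLs, names, and interval are made-up examples, not output of any real operator.

```yaml
# Sketch of the Vector config an operator might generate from a
# PodMonitor; all names and endpoints here are made-up examples.
sources:
  app_metrics:
    type: prometheus_scrape
    endpoints:
      - http://my-app.team-a.svc:9090/metrics   # resolved from the PodMonitor
    scrape_interval_secs: 30

sinks:
  prom_remote:
    type: prometheus_remote_write
    inputs: [app_metrics]
    endpoint: http://mimir.monitoring.svc:8080/api/v1/push
```

The hard part the operator would own is the discovery step: resolving each PodMonitor's label selectors to concrete pod endpoints and keeping the generated `endpoints` list up to date as pods come and go.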
Hi. Please try this one: https://github.com/kaasops/vector-operator. We released it for deploying and configuring Vector in Kubernetes (similar to how the Logging Operator does it, but with some differences).
You can use these CRDs:
- Vector - deploys a Vector instance
- VectorPipeline - deploys sources/transforms/sinks at namespace scope
- ClusterVectorPipeline - deploys sources/transforms/sinks at cluster scope
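A `VectorPipeline` manifest might look roughly like the sketch below. I'm guessing at the schema from the CRD names above, so the API group and spec layout here are assumptions; check the kaasops/vector-operator docs for the real field names.

```yaml
# Rough sketch of a VectorPipeline -- the group and spec layout are
# assumptions based on the CRD names; consult the project's docs for
# the actual schema.
apiVersion: observability.kaasops.io/v1alpha1
kind: VectorPipeline
metadata:
  name: app-logs
  namespace: my-app        # namespace-scoped, unlike ClusterVectorPipeline
spec:
  sources:
    app:
      type: kubernetes_logs
  sinks:
    stdout:
      type: console
      inputs: [app]
      encoding:
        codec: json
```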
Deckhouse Kubernetes Platform has a log-shipper module, which is basically an operator built around Vector.
Simple configuration example:
```yaml
apiVersion: deckhouse.io/v1alpha1
kind: ClusterLoggingConfig
metadata:
  name: system-logs
spec:
  type: KubernetesPods
  kubernetesPods:
    namespaceSelector:
      matchNames:
        - kube-system
  destinationRefs:
    - loki-storage
---
apiVersion: deckhouse.io/v1alpha1
kind: ClusterLogDestination
metadata:
  name: loki-storage
spec:
  type: Loki
  loki:
    endpoint: http://loki.loki:3100
```
Hi,
Not strictly an issue with Vector, but hoping to spark some ideas regarding the integration with k8s. An operator similar to what https://github.com/banzaicloud/logging-operator does for fluent-bit would be pretty awesome.