kubetail-org / kubetail

Web dashboard for Kubernetes logs that lets you view multiple log streams simultaneously, in real-time. Runs on desktop or in cluster.
https://www.kubetail.com
Apache License 2.0
369 stars, 16 forks

Make Agents DaemonSet optional #116

Open dihmandrake opened 1 month ago

dihmandrake commented 1 month ago

Hi everyone,

The newly introduced Agents (DaemonSet) on every node provide additional functionality. I recognize that this might be useful for many people (especially the owner of this repo), but it introduces overhead for clusters with many nodes and might not be required by everyone.

Would it be possible to make the Agent optional and disable the functionality it provides (effectively keeping the behavior of pre-0.7 versions)? I could imagine a sort of indicator in the UI: "Agents disabled" instead of the last events, etc.

Also, in the Helm Chart, this could be done via a value such as agents.enabled, which defaults to true and can be set to false.

Would this be something "easily" accomplishable?

amorey commented 1 month ago

Thanks for asking about the agents. I've already started working on code that uses the agents to perform log search, and in the future the agents will also be used for notifications and other features that I think will be really useful. They're designed to be extremely lightweight (typically <15MB memory and negligible CPU), so personally I think the trade-off is worth it, but I understand your concern. Would agents be helpful to you if they enabled search and other advanced features?

dihmandrake commented 1 month ago

I can imagine this is an amazing feature for most users and yourself.

For myself, I want to have a "dead simple log viewer for my pods," sort of a replacement for kubectl logs, which kubetail was up to version 0.7. By simple, I also mean a minimal number of pods (max. 2 for redundancy) and being able to run without any privileges. With Network Policies, this becomes a minimal attack surface and has minimal resource overhead. Mounting host folders for logs violates that, as does introducing an additional DaemonSet. Basically, I just view the logs of a few Pods in case I notice an immediate issue. Anything else would, for me, rather be a "full log solution" like Elastic or Loki.

I could imagine something like a config option SIMPLE_MODE that keeps the previous behavior via the kubeapi instead of node-local operations via agents. Nevertheless, this would introduce diverging code paths. Obviously, I am not a contributor and just a user. Hence, I appreciate all your effort and take what I can get — simply trying to share my use case/idea.

amorey commented 1 month ago

Thanks for the details. For your use case, why not use kubectl logs -l? Are you currently using Elastic, Loki or something else for "full solution" logging?

amorey commented 1 month ago

I'm still thinking about how to implement this so we can support it in the future. For now, I added an experimental and undocumented feature to enable/disable agents to the helm chart. After upgrading to 0.7.3 you can set kubetail.agent.enabled to false to disable the agents:

helm repo update
helm upgrade kubetail kubetail/kubetail --namespace=kubetail --set kubetail.agent.enabled=false

When agents are disabled, the columns that expect data from the agents won't update but the rest of the app should work normally.
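
To keep the setting across upgrades, the same flag can live in a values file instead of a --set argument. A minimal sketch, assuming the release is named kubetail in the kubetail namespace as in the command above. The file name values-no-agent.yaml and the APPLY switch are hypothetical choices for this example, and the cluster-facing commands only run when APPLY=1 is set:

```shell
#!/bin/sh
# Write a values file that mirrors --set kubetail.agent.enabled=false.
# The file name is a hypothetical choice for this sketch.
VALUES_FILE="values-no-agent.yaml"
cat > "$VALUES_FILE" <<'EOF'
kubetail:
  agent:
    enabled: false
EOF

# Opt-in step (APPLY is a hypothetical switch): upgrade the release with the
# values file, then list DaemonSets so you can check that none belongs to the agents.
if [ "${APPLY:-0}" = "1" ]; then
  helm upgrade kubetail kubetail/kubetail --namespace=kubetail --values "$VALUES_FILE"
  kubectl get daemonsets --namespace=kubetail
fi
```

Run it as APPLY=1 sh disable-agents.sh (script name assumed) once the generated file looks right.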

dihmandrake commented 1 month ago

My apologies for the late response.

Thanks for the details. For your use case, why not use kubectl logs -l? Are you currently using Elastic, Loki or something else for "full solution" logging?

In short, kubectl logs works for these use-cases, but I became quite fond of kubetail. Mainly due to the UX. It runs behind some Authenticated Proxy (in my current Lab cluster that is NGINX with oauth2-proxy). From time to time I fall back to kubectl, but kubetail actually became my default.

I'm still thinking about how to implement this so we can support it in the future. For now, I added an experimental and undocumented feature to enable/disable agents to the helm chart. [...] When agents are disabled, the columns that expect data from the agents won't update but the rest of the app should work normally.

For my use-case this would totally suffice: keeping the DaemonSet out and obviously losing the new functionality. Thank you for taking my use-case into account. I will give it a go soon (need to find the time) once you release the new version. Edit: Just saw that the Helm Chart is already released. Will try to deploy it this week.

amorey commented 1 month ago

No worries, thanks for the details. Great to hear you like the UX! I'll keep thinking about how to solve this and get back to you.

amorey commented 1 month ago

@dihmandrake I created a cli tool that runs the app locally in "simple" mode using only your Kubernetes API. You can download it for your platform from the release page here: https://github.com/kubetail-org/kubetail/releases/tag/0.8.0-rc1

After downloading it, make it executable and then run the "serve" subcommand:

mv kubetail-darwin-arm64 kubetail
chmod u+x kubetail
./kubetail serve

Please try it out and let me know what you think! Although the "advanced" cluster-side installation will still be available via helm, I'm thinking of making this the default entry point for kubetail.

amorey commented 1 month ago

Quick update, you can also install it with homebrew now:

brew install kubetail-org/tap/kubetail

dihmandrake commented 1 month ago

@amorey I finally had some time to test the version and changes - my apologies it took so long.

Anyways, I am now running the latest version via helm (0.7.4) with the agent mode disabled and some other small tweaks. It works perfectly for my use-case. Thanks again for considering it.

For completeness, I am running kubetail fully in the cluster behind an authenticated gateway. Locally isn't one of my use-cases, but I see that it could come in handy for others.

amorey commented 1 month ago

Thanks! Great to hear it's working well with the agents disabled. If you have time I'd love to hear more about your use case and why it isn't possible to use kubectl locally to communicate with your cluster. If you'd prefer to chat off-github, you can find me on discord, slack, or send me an email.

dihmandrake commented 1 month ago

I just noticed that kubetail is having some issues with being unable to read EndpointSlices (they were introduced for the agents if I recall correctly):

Oct 23, 2024 20:36:49.931
W1023 20:36:49.931663 1 reflector.go:561] pkg/mod/k8s.io/client-go@v0.31.1/tools/cache/reflector.go:243: failed to list *v1.EndpointSlice: endpointslices.discovery.k8s.io is forbidden: User "system:serviceaccount:kubetail:kubetail-cluster-server" cannot list resource "endpointslices" in API group "discovery.k8s.io" in the namespace "kubetail"
Oct 23, 2024 20:36:49.933
E1023 20:36:49.932836 1 reflector.go:158] "Unhandled Error" err="pkg/mod/k8s.io/client-go@v0.31.1/tools/cache/reflector.go:243: Failed to watch *v1.EndpointSlice: failed to list *v1.EndpointSlice: endpointslices.discovery.k8s.io is forbidden: User \"system:serviceaccount:kubetail:kubetail-cluster-server\" cannot list resource \"endpointslices\" in API group \"discovery.k8s.io\" in the namespace \"kubetail\"" logger="UnhandledError"
Oct 23, 2024 20:37:28.448
W1023 20:37:28.448238 1 reflector.go:561] pkg/mod/k8s.io/client-go@v0.31.1/tools/cache/reflector.go:243: failed to list *v1.EndpointSlice: endpointslices.discovery.k8s.io is forbidden: User "system:serviceaccount:kubetail:kubetail-cluster-server" cannot list resource "endpointslices" in API group "discovery.k8s.io" in the namespace "kubetail"
Oct 23, 2024 20:37:28.448
E1023 20:37:28.448461 1 reflector.go:158] "Unhandled Error" err="pkg/mod/k8s.io/client-go@v0.31.1/tools/cache/reflector.go:243: Failed to watch *v1.EndpointSlice: failed to list *v1.EndpointSlice: endpointslices.discovery.k8s.io is forbidden: User \"system:serviceaccount:kubetail:kubetail-cluster-server\" cannot list resource \"endpointslices\" in API group \"discovery.k8s.io\" in the namespace \"kubetail\"" logger="UnhandledError"
Oct 23, 2024 20:38:14.697
W1023 20:38:14.697377 1 reflector.go:561] pkg/mod/k8s.io/client-go@v0.31.1/tools/cache/reflector.go:243: failed to list *v1.EndpointSlice: endpointslices.discovery.k8s.io is forbidden: User "system:serviceaccount:kubetail:kubetail-cluster-server" cannot list resource "endpointslices" in API group "discovery.k8s.io" in the namespace "kubetail"
Oct 23, 2024 20:38:14.697
E1023 20:38:14.697519 1 reflector.go:158] "Unhandled Error" err="pkg/mod/k8s.io/client-go@v0.31.1/tools/cache/reflector.go:243: Failed to watch *v1.EndpointSlice: failed to list *v1.EndpointSlice: endpointslices.discovery.k8s.io is forbidden: User \"system:serviceaccount:kubetail:kubetail-cluster-server\" cannot list resource \"endpointslices\" in API group \"discovery.k8s.io\" in the namespace \"kubetail\"" logger="UnhandledError"
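
The "forbidden" messages above name the service account, resource and namespace, so the missing permission can be checked directly with kubectl auth can-i. A minimal sketch with the values copied from the log lines. The helper can_i_cmd and the RUN switch are hypothetical names for this example. The check itself uses impersonation (--as), which assumes your own user is allowed to impersonate service accounts:

```shell
#!/bin/sh
# Values copied from the "forbidden" log lines above.
SA="system:serviceaccount:kubetail:kubetail-cluster-server"
NAMESPACE="kubetail"
RESOURCE="endpointslices.discovery.k8s.io"

# Build the impersonated permission check for one verb (hypothetical helper).
can_i_cmd() {
  printf 'kubectl auth can-i %s %s --as=%s --namespace=%s\n' \
    "$1" "$RESOURCE" "$SA" "$NAMESPACE"
}

# The log shows both a failed list and a failed watch, so check both verbs.
# Commands are printed always and executed only when RUN=1 is set.
for verb in list watch; do
  cmd=$(can_i_cmd "$verb")
  echo "$cmd"
  if [ "${RUN:-0}" = "1" ]; then
    $cmd || true
  fi
done
```

kubectl auth can-i answers "yes" or "no" for each verb, which should make it easy to tell whether a chart upgrade actually granted the permission.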

Also, it appears the browser fails to establish a WebSocket connection (noticed via Chrome DevTools). Hence, it appears the log streaming is broken, but I cannot validate it. All of these might be related.

Both are non-issues for me, and since I am a bit short on time to investigate this, I would kindly ask you to decide whether you would like to keep this issue open or close it, @amorey.

Edit: Correction: the WebSocket issue is gone. It was probably related to NGINX or other gateways in between. My apologies for blaming Kubetail.

amorey commented 1 month ago

Ok, thanks for letting me know. I'll take a look and get a bugfix out asap.

dihmandrake commented 1 month ago

Thanks! Great to hear it's working well with the agents disabled. If you have time I'd love to hear more about your use case and why it isn't possible to use kubectl locally to communicate with your cluster. If you'd prefer to chat off-github, you can find me on discord, slack, or send me an email.

For my small dev env it is only a UX decision. kubectl needs to be started and authenticated with certificates or others. Guess I am mostly lazy.

My cluster gateways have OIDC with users already built in & Chrome is open anyways. Additionally, Kubetail is just easier, prettier and faster. It takes a second vs. 10 seconds. Yes, it would be possible to properly integrate kubectl and change my workflow, but browser UIs have served me well for log investigation over the past years, be it hyperscaler-integrated tools, Kibana or Grafana. It's not easy to change these habits.

Lastly, for any development work VS Code is open most of the time and I use the Kubernetes addon log function. Hope this helps for my use-case. I appreciate your consideration, but I also understand that I might be rather specific and my use-case will fall to an eventual "development progression death".

As a side note, I used Kubernetes Dashboard for most of these tasks before, but it exploded in complexity recently (especially the Kong requirement). Hence, I removed it to avoid a bunch of tools, deployments and CRDs, which I do not really need & upgrades become rather painful. Btw, this is how I got to Kubetail in the first place and also Headlamp.

Fully off topic, it would be super interesting to extend the K8s API server & kubelet to include the information currently collected by the Kubetail agents (log size & last event if I am not mistaken; log search is probably too complex). Might be a fun project, though I am not sure how cumbersome, if you ever feel like it.

amorey commented 1 month ago

The EndpointSlices bug should be fixed in the new helm chart (v0.8.0):

helm repo update
helm upgrade kubetail kubetail/kubetail --values /path/to/values.yaml

Let me know if you notice any other problems.

Thanks for the details about your development workflow! Doing everything through browser UIs makes sense if you already have auth set up on the cluster. We'll see how "simple mode" develops but I think it's a good entry point for new users so I could see it becoming a core feature.

dihmandrake commented 1 month ago

The EndpointSlices bug should be fixed in the new helm chart (v0.8.0):

helm repo update
helm upgrade kubetail kubetail/kubetail --values /path/to/values.yaml

Let me know if you notice any other problems. [...]

I was not able to test the new version exhaustively, but from a quick deployment and look around, it appears the EndpointSlice logging issue has been resolved. Thank you very much.