Exception encountered parsing namespace watch event. The connection might have been closed. Sleeping for 1 seconds and resetting the namespace watcher.error reading from socket: Connection reset by peer

Stiuil06 commented 1 year ago

What happened: We are losing a lot of logs (30-50%), and I found the exceptions below in the monitoring pods:

2023-01-24 16:37:00 +0000 [info]: Exception encountered parsing pod watch event. The connection might have been closed. Sleeping for 1 seconds and resetting the pod watcher.error reading from socket: Connection reset by peer
2023-01-24 16:37:14 +0000 [info]: Exception encountered parsing namespace watch event. The connection might have been closed. Sleeping for 1 seconds and resetting the namespace watcher.error reading from socket: Connection reset by peer
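It can help to know how often the watchers are being reset, for example to check whether the resets line up with the periods of missing logs. The messages above can be tallied per watcher type. This is a minimal Python sketch. It assumes every reset line has exactly the timestamp and `[info]` prefix shown above, and the function and pattern names are hypothetical:

```python
import re
from collections import Counter

# Matches the fluentd watcher-reset message; the watcher type ("pod" or
# "namespace") is captured from "parsing <type> watch event".
RESET_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} [+-]\d{4}) \[info\]: "
    r"Exception encountered parsing (?P<kind>\w+) watch event\."
)


def count_watcher_resets(lines):
    """Return a Counter mapping watcher type to the number of resets seen."""
    counts = Counter()
    for line in lines:
        match = RESET_RE.match(line)
        if match:
            counts[match.group("kind")] += 1
    return counts


if __name__ == "__main__":
    # Sample lines shortened from the log excerpt above.
    sample = [
        "2023-01-24 16:37:00 +0000 [info]: Exception encountered parsing pod "
        "watch event. The connection might have been closed.",
        "2023-01-24 16:37:14 +0000 [info]: Exception encountered parsing "
        "namespace watch event. The connection might have been closed.",
    ]
    print(dict(count_watcher_resets(sample)))
```

To run it against a real log, pass an open file object (for example, `kubectl logs` output saved to a file) in place of the sample list. The regex also captures the timestamp as `ts`, so you can bucket resets by time if needed.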

What you expected to happen: It looks the same as the issue reported here, where a solution is proposed (adjusting the fluentd parameters open_timeout and read_timeout), but I can't figure out how to adjust them via the splunk-connect-for-kubernetes chart.
So, if you don't have any other solution to propose, would it be possible to add support for adjusting these properties?
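The log message describes a loop. When reading from the watch connection fails, the watcher sleeps for a fixed interval (1 second here) and then restarts the watch. The Python sketch below models that loop to show where the two timeouts would act. It is not the Ruby code of the fluentd plugin. The names `watch_forever` and `open_watch` are hypothetical, and the defaults of 3 and 10 seconds are assumptions for illustration:

```python
import time


def watch_forever(open_watch, handle_event, open_timeout=3, read_timeout=10,
                  sleep_interval=1, max_resets=None, sleep=time.sleep):
    """Model of a watch-and-reset loop; returns how many resets occurred.

    open_watch(open_timeout, read_timeout) is a hypothetical callable that
    returns an iterable of events. In this model it raises TimeoutError when
    connecting exceeds open_timeout or a read blocks longer than
    read_timeout, and ConnectionError (e.g. ConnectionResetError) when the
    peer drops the connection. max_resets=None means retry forever.
    """
    resets = 0
    while max_resets is None or resets < max_resets:
        try:
            for event in open_watch(open_timeout, read_timeout):
                handle_event(event)
            return resets  # the stream ended cleanly; stop in this model
        except (ConnectionError, TimeoutError):
            # Mirrors "Sleeping for 1 seconds and resetting the ... watcher".
            resets += 1
            sleep(sleep_interval)
    return resets
```

Raising the timeouts gives a slow or heavily loaded API server more time before a connection or read is treated as failed. Injecting `sleep` lets the loop be exercised without real delays.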

How to reproduce it (as minimally and precisely as possible): It is probably closely related to latency and load in my environment (Azure AKS).

Anything else we need to know?: We are using the latest chart version, 1.5.2.

Environment:

Kubernetes version (use kubectl version): 1.23.12
Ruby version (use ruby --version): we don't use Ruby directly
OS (e.g: cat /etc/os-release): Node: [Kernel Version: 5.4.0-1098-azure OS/Arch: linux/amd64]
Splunk version: 9.0.1
Splunk Connect for Kubernetes helm chart version: 1.5.2
Others:

hvaghani221 commented 1 year ago

FYI, https://github.com/splunk/splunk-connect-for-kubernetes#end-of-support

I would recommend moving to Splunk OpenTelemetry Collector for Kubernetes. You can refer to this migration guide for more details.

hvaghani221 commented 1 year ago

We need to expose those timeout values at https://github.com/splunk/splunk-connect-for-kubernetes/blob/develop/helm-chart/splunk-connect-for-kubernetes/charts/splunk-kubernetes-logging/values.yaml#L56-L65
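Exposing these values means mapping chart values onto fluentd filter directives. The Python sketch below models only that rendering step, with chart values in and a fluentd filter block out. The real chart renders its configuration with Helm (Go) templates, so this is a model, not the chart's code. The value keys `openTimeout`/`readTimeout`, the function name and the `kube.**` tag pattern are hypothetical. `open_timeout` and `read_timeout` are the kubernetes_metadata plugin parameters named earlier in this issue:

```python
def render_metadata_filter(values, match="kube.**"):
    """Render a fluentd <filter> block for the kubernetes_metadata plugin.

    `values` stands in for a chart's values.yaml section. The keys
    "openTimeout" and "readTimeout" are hypothetical (the chart does not
    expose them today); when a key is absent, the directive is omitted so
    the plugin's own default applies. `match` is a placeholder tag pattern.
    """
    lines = [f"<filter {match}>", "  @type kubernetes_metadata"]
    # Map hypothetical chart keys to the plugin's parameter names.
    for key, directive in (("openTimeout", "open_timeout"),
                           ("readTimeout", "read_timeout")):
        if key in values:
            seconds = int(values[key])
            if seconds <= 0:
                raise ValueError(f"{key} must be a positive number of seconds")
            lines.append(f"  {directive} {seconds}")
    lines.append("</filter>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_metadata_filter({"openTimeout": 10, "readTimeout": 60}))
```

Leaving a key unset keeps the plugin's default. Under this model, configurations that do not set the new keys would render exactly as before.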

Stiuil06 commented 1 year ago

@harshit-splunk thanks, we will move to OpenTelemetry.

Do you think it is a good idea to use otel, plus just the splunk-kubernetes-objects chart for objects, until otel supports them?

hvaghani221 commented 1 year ago

@Stiuil06, we are working on updating the migration guide. We recently added object collection support there. The configuration is similar. It will be released with v0.68.0. You can try the main branch to experiment with it.

Ref: https://github.com/signalfx/splunk-otel-collector-chart/blob/b1f0bd72085933a06d86419b736fecf021a6c1f3/helm-charts/splunk-otel-collector/values.yaml#L444-L481

hvaghani221 commented 1 year ago

@Stiuil06 I have just released v0.68.0. You can check that out.

Stiuil06 commented 1 year ago

@harshit-splunk thanks a lot. Migration will probably take me some time, but I'll let you know if everything is ok.

splunk / splunk-connect-for-kubernetes #845