honeycombio / libhoney-go

Go library for sending data to Honeycomb
Apache License 2.0

Agent goroutine crashes #194

Closed · to266 closed this issue 1 year ago

to266 commented 2 years ago

Versions

Agent version: 2.5.3 (as defined by the current helm chart)

Steps to reproduce

  1. Spin up k8s cluster on AKS (1.23.8)
  2. Deploy ingress-nginx helm chart
  3. Deploy honeycomb/agent helm chart with the following values
    
```yaml
honeycomb:
  existingSecret: honeycomb-api
metrics:
  clusterName: staging
  metricGroups: [node, pod]
  interval: 60
tolerations:
  - key: kubernetes.azure.com/scalesetpriority
    operator: Equal
    value: spot
    effect: NoSchedule
  - key: stackrate
    operator: Equal
    value: renderer
    effect: NoSchedule
  - key: CriticalAddonsOnly
    value: "true"
    effect: NoSchedule
# A bunch of watchers targeting basically the same thing, just to get something to work
watchers:
  # This one does not work with either the default (undefined) `log_format` or this explicitly configured one
  - labelSelector: app.kubernetes.io/instance=ingress-nginx
    namespace: ingress-nginx
    dataset: ingress
    parser: nginx
    options:
      # Taken from the default values of the running nginx pod; it also appears to match the actual output.
      log_format: '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" "$http_x_forwarded_for"'
  # Attempted to use `nginx-ingress` based on honeycombio/honeycomb-kubernetes-agent#15 - no errors, but also no effect.
  - labelSelector: app.kubernetes.io/instance=ingress-nginx
    namespace: ingress-nginx
    dataset: ingress
    parser: nginx-ingress
  # This one correctly separates out the controller-only logs
  - labelSelector: app.kubernetes.io/instance=ingress-nginx
    # containerName: controller
    namespace: ingress-nginx
    dataset: ingress-controller
    parser: glog
  # This one is just to confirm the correct pod and container are selected (they are), so it is commented out for now.
  # - labelSelector: app.kubernetes.io/instance=ingress-nginx
  #   containerName: controller
  #   namespace: ingress-nginx
  #   dataset: ingress-notparsed
  #   parser: nop
```

  4. Observe how the pods crash. Logs from one of them are below:

time="2022-09-14T10:55:00Z" level=info msg="Initializing agent" version=2.5.3 time="2022-09-14T10:55:00Z" level=info msg="Setting log verbosity" newLevel=info time="2022-09-14T10:55:00Z" level=info msg="Creating retry buffer." size=0 timeout=0s time="2022-09-14T10:55:00Z" level=info msg="Creating Metrics Service Runner..." clusterName=staging includeNodeLabels=false interval=60ns metricGroups="map[node:true pod:true]" omitLabels="[]" time="2022-09-14T10:55:00Z" level=info msg="Creating Metrics Service Providers..." running time="2022-09-14T10:55:00Z" level=info msg="Setting up watcher for pod" Name=ingress-nginx-controller-564bd69cf4-m87lf UID=c23f39de-d227-4254-bc72-100a9095835a time="2022-09-14T10:55:00Z" level=info msg="starting watcher for pod" labelselector="app.kubernetes.io/instance=ingress-nginx,app!=honeycomb-agent" name=ingress-nginx-controller-564bd69cf4-m87lf namespace=ingress-nginx uid=c23f39de-d227-4254-bc72-100a9095835a time="2022-09-14T10:55:00Z" level=info msg="Setting up watcher for pod" Name=ingress-nginx-controller-564bd69cf4-m87lf UID=c23f39de-d227-4254-bc72-100a9095835a time="2022-09-14T10:55:00Z" level=info msg="starting watcher for pod" labelselector="app.kubernetes.io/instance=ingress-nginx,app!=honeycomb-agent" name=ingress-nginx-controller-564bd69cf4-m87lf namespace=ingress-nginx uid=c23f39de-d227-4254-bc72-100a9095835a time="2022-09-14T10:55:01Z" level=info msg="Log pattern" log pattern="/var/log/pods/ingress-nginx_ingress-nginx-controller-564bd69cf4-m87lf_c23f39de-d227-4254-bc72-100a9095835a//" time="2022-09-14T10:55:01Z" level=info msg="Log pattern" log pattern="/var/log/pods/ingress-nginx_ingress-nginx-controller-564bd69cf4-m87lf_c23f39de-d227-4254-bc72-100a9095835a//" time="2022-09-14T10:55:01Z" level=info msg="Initializing file handler" parser=nginx-ingress path=/var/log/pods/ingress-nginx_ingress-nginx-controller-564bd69cf4-m87lf_c23f39de-d227-4254-bc72-100a9095835a/controller/0.log time="2022-09-14T10:55:01Z" level=info msg="Initializing file handler" parser=glog path=/var/log/pods/ingress-nginx_ingress-nginx-controller-564bd69cf4-m87lf_c23f39de-d227-4254-bc72-100a9095835a/controller/0.log time="2022-09-14T10:55:01Z" level=info msg="Initializing file handler" parser=glog path=/var/log/pods/ingress-nginx_ingress-nginx-controller-564bd69cf4-m87lf_c23f39de-d227-4254-bc72-100a9095835a/linkerd-init/0.log time="2022-09-14T10:55:01Z" level=info msg="Initializing file handler" parser=glog path=/var/log/pods/ingress-nginx_ingress-nginx-controller-564bd69cf4-m87lf_c23f39de-d227-4254-bc72-100a9095835a/linkerd-proxy/0.log time="2022-09-14T10:55:01Z" level=info msg="Tailing file" offset=1497 path=/var/log/pods/ingress-nginx_ingress-nginx-controller-564bd69cf4-m87lf_c23f39de-d227-4254-bc72-100a9095835a/linkerd-proxy/0.log time="2022-09-14T10:55:01Z" level=info msg="Initializing file handler" parser=nginx-ingress path=/var/log/pods/ingress-nginx_ingress-nginx-controller-564bd69cf4-m87lf_c23f39de-d227-4254-bc72-100a9095835a/linkerd-init/0.log time="2022-09-14T10:55:01Z" level=info msg="Tailing file" offset=4088 path=/var/log/pods/ingress-nginx_ingress-nginx-controller-564bd69cf4-m87lf_c23f39de-d227-4254-bc72-100a9095835a/linkerd-init/0.log time="2022-09-14T10:55:01Z" level=info msg="Tailing file" offset=155102 path=/var/log/pods/ingress-nginx_ingress-nginx-controller-564bd69cf4-m87lf_c23f39de-d227-4254-bc72-100a9095835a/controller/0.log time="2022-09-14T10:55:01Z" level=info msg="Tailing file" offset=155102 
path=/var/log/pods/ingress-nginx_ingress-nginx-controller-564bd69cf4-m87lf_c23f39de-d227-4254-bc72-100a9095835a/controller/0.log time="2022-09-14T10:55:01Z" level=info msg="Initializing file handler" parser=nginx-ingress path=/var/log/pods/ingress-nginx_ingress-nginx-controller-564bd69cf4-m87lf_c23f39de-d227-4254-bc72-100a9095835a/linkerd-proxy/0.log time="2022-09-14T10:55:01Z" level=info msg="Tailing file" offset=1497 path=/var/log/pods/ingress-nginx_ingress-nginx-controller-564bd69cf4-m87lf_c23f39de-d227-4254-bc72-100a9095835a/linkerd-proxy/0.log time="2022-09-14T10:55:01Z" level=info msg="Tailing file" offset=4088 path=/var/log/pods/ingress-nginx_ingress-nginx-controller-564bd69cf4-m87lf_c23f39de-d227-4254-bc72-100a9095835a/linkerd-init/0.log

goroutine 228 [sleep]: time.Sleep(0xee6b280) runtime/time.go:194 +0x12e github.com/hpcloud/tail/watch.(PollingFileWatcher).ChangeEvents.func1() github.com/hpcloud/tail@v1.0.1-0.20170814160653-37f427138745/watch/polling.go:68 +0x9b created by github.com/hpcloud/tail/watch.(PollingFileWatcher).ChangeEvents github.com/hpcloud/tail@v1.0.1-0.20170814160653-37f427138745/watch/polling.go:59 +0x1cc

goroutine 201 [sleep]: time.Sleep(0xee6b280) runtime/time.go:194 +0x12e github.com/hpcloud/tail/watch.(PollingFileWatcher).ChangeEvents.func1() github.com/hpcloud/tail@v1.0.1-0.20170814160653-37f427138745/watch/polling.go:68 +0x9b created by github.com/hpcloud/tail/watch.(PollingFileWatcher).ChangeEvents github.com/hpcloud/tail@v1.0.1-0.20170814160653-37f427138745/watch/polling.go:59 +0x1cc

goroutine 241 [select, 2 minutes]: github.com/hpcloud/tail.(Tail).waitForChanges(0xc000016160) github.com/hpcloud/tail@v1.0.1-0.20170814160653-37f427138745/tail.go:349 +0x168 github.com/hpcloud/tail.(Tail).tailFileSync(0xc000016160) github.com/hpcloud/tail@v1.0.1-0.20170814160653-37f427138745/tail.go:310 +0x5bb created by github.com/hpcloud/tail.TailFile github.com/hpcloud/tail@v1.0.1-0.20170814160653-37f427138745/tail.go:133 +0x365

goroutine 204 [sleep]: time.Sleep(0xee6b280) runtime/time.go:194 +0x12e github.com/hpcloud/tail/watch.(PollingFileWatcher).ChangeEvents.func1() github.com/hpcloud/tail@v1.0.1-0.20170814160653-37f427138745/watch/polling.go:68 +0x9b created by github.com/hpcloud/tail/watch.(PollingFileWatcher).ChangeEvents github.com/hpcloud/tail@v1.0.1-0.20170814160653-37f427138745/watch/polling.go:59 +0x1cc

goroutine 202 [select, 1 minutes]: github.com/hpcloud/tail.(Tail).waitForChanges(0xc0005f4210) github.com/hpcloud/tail@v1.0.1-0.20170814160653-37f427138745/tail.go:349 +0x168 github.com/hpcloud/tail.(Tail).tailFileSync(0xc0005f4210) github.com/hpcloud/tail@v1.0.1-0.20170814160653-37f427138745/tail.go:310 +0x5bb created by github.com/hpcloud/tail.TailFile github.com/hpcloud/tail@v1.0.1-0.20170814160653-37f427138745/tail.go:133 +0x365

goroutine 257 [sleep]: time.Sleep(0xee6b280) runtime/time.go:194 +0x12e github.com/hpcloud/tail/watch.(PollingFileWatcher).ChangeEvents.func1() github.com/hpcloud/tail@v1.0.1-0.20170814160653-37f427138745/watch/polling.go:68 +0x9b created by github.com/hpcloud/tail/watch.(PollingFileWatcher).ChangeEvents github.com/hpcloud/tail@v1.0.1-0.20170814160653-37f427138745/watch/polling.go:59 +0x1cc

goroutine 212 [select, 1 minutes]: github.com/hpcloud/tail.(Tail).waitForChanges(0xc0001d8000) github.com/hpcloud/tail@v1.0.1-0.20170814160653-37f427138745/tail.go:349 +0x168 github.com/hpcloud/tail.(Tail).tailFileSync(0xc0001d8000) github.com/hpcloud/tail@v1.0.1-0.20170814160653-37f427138745/tail.go:310 +0x5bb created by github.com/hpcloud/tail.TailFile github.com/hpcloud/tail@v1.0.1-0.20170814160653-37f427138745/tail.go:133 +0x365

goroutine 213 [sleep]: time.Sleep(0xee6b280) runtime/time.go:194 +0x12e github.com/hpcloud/tail/watch.(PollingFileWatcher).ChangeEvents.func1() github.com/hpcloud/tail@v1.0.1-0.20170814160653-37f427138745/watch/polling.go:68 +0x9b created by github.com/hpcloud/tail/watch.(PollingFileWatcher).ChangeEvents github.com/hpcloud/tail@v1.0.1-0.20170814160653-37f427138745/watch/polling.go:59 +0x1cc

goroutine 6064 [select]: net/http.(http2clientStream).writeRequest(0xc0005fe300, 0xc001206300) net/http/h2_bundle.go:8037 +0x9c9 net/http.(http2clientStream).doRequest(0x7f0a91af3900?, 0xc00037de60?) net/http/h2_bundle.go:7899 +0x1e created by net/http.(*http2ClientConn).RoundTrip net/http/h2_bundle.go:7828 +0x30a

goroutine 6063 [select]: net/http.(http2ClientConn).RoundTrip(0xc000c60000, 0xc001206300) net/http/h2_bundle.go:7869 +0x451 net/http.(http2Transport).RoundTripOpt(0xc000634080, 0xc001206300, {0x40?}) net/http/h2_bundle.go:7213 +0x1b7 net/http.(http2Transport).RoundTrip(...) net/http/h2_bundle.go:7174 net/http.http2noDialH2RoundTripper.RoundTrip({0x1f60d40?}, 0xc001206300?) net/http/h2_bundle.go:9677 +0x1b net/http.(Transport).roundTrip(0x1f60d40, 0xc001206200) net/http/transport.go:539 +0x390 net/http.(Transport).RoundTrip(0xc001206200?, 0x161d3a0?) net/http/roundtrip.go:17 +0x19 net/http.send(0xc001206100, {0x161d3a0, 0x1f60d40}, {0x13e8220?, 0x4c0301?, 0x1f74320?}) net/http/client.go:252 +0x5d8 net/http.(Client).send(0xc000c22510, 0xc001206100, {0x203000?, 0xc000b4e0f0?, 0x1f74320?}) net/http/client.go:176 +0x9b net/http.(Client).do(0xc000c22510, 0xc001206100) net/http/client.go:725 +0x8f5 net/http.(Client).Do(...) net/http/client.go:593 github.com/honeycombio/libhoney-go/transmission.(batchAgg).fireBatch(0xc0003b3900, {0xc000a36000?, 0x29, 0x40}) github.com/honeycombio/libhoney-go@v1.15.8/transmission/transmission.go:406 +0xa28 github.com/honeycombio/libhoney-go/transmission.(batchAgg).Fire(0xc0003b3900, {0x7f0a91af3900?, 0xc00037de60?}) github.com/honeycombio/libhoney-go@v1.15.8/transmission/transmission.go:273 +0x109 created by github.com/facebookgo/muster.(*Client).worker.func1 github.com/facebookgo/muster@v0.0.0-20150708232844-fd3d7953fd52/muster.go:133 +0xfb Stream closed EOF for default/honeycomb-agent-agent-gg4r6 (honeycomb-agent)
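For what it's worth, the last trace (goroutine 6063) is the libhoney batch sender: a muster worker calling `Fire` → `fireBatch` → `net/http.(*Client).Do`. The sketch below is a hypothetical, stripped-down version of that call shape (the `fireBatch` name and the endpoint URL are stand-ins, not libhoney's actual code), just to illustrate how a batch POST made through an `http.Client` can leave the sending goroutine parked inside the HTTP round trip the way this one is.

```go
// Illustrative sketch only: fireBatch and batch are stand-ins and do not
// reflect libhoney-go's internal implementation.
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"time"
)

type batch struct {
	events [][]byte // pre-encoded JSON events
}

// fireBatch POSTs a newline-delimited batch. With a zero http.Client.Timeout,
// a stalled server or connection can keep this goroutine blocked inside
// client.Do, which is the kind of [select] state seen in the dump above.
func fireBatch(client *http.Client, url string, b batch) error {
	body := bytes.Join(b.events, []byte("\n"))
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	fmt.Println("batch response status:", resp.Status)
	return nil
}

func main() {
	// A client-side timeout bounds how long a single batch send can block.
	client := &http.Client{Timeout: 10 * time.Second}
	// Placeholder endpoint; the real agent targets the Honeycomb batch API.
	err := fireBatch(client, "https://example.com/1/batch/ingress", batch{
		events: [][]byte{[]byte(`{"status":200}`)},
	})
	fmt.Println("fireBatch returned:", err)
}
```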

MikeGoldsmith commented 2 years ago

Hi @to266 - thanks for reporting the issue and sorry for the delay in replying.

I think this may be an issue with the underlying libhoney client that's used to send events to Honeycomb, so I'm going to move this issue to that repo for tracking.
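For reference, the client in question is roughly the public libhoney-go API the agent sends events through. A minimal sketch of that usage (placeholder write key and dataset, not the agent's real configuration) looks like this:

```go
package main

import (
	"log"

	libhoney "github.com/honeycombio/libhoney-go"
)

func main() {
	// Placeholder values, not the agent's actual settings.
	if err := libhoney.Init(libhoney.Config{
		WriteKey: "YOUR_API_KEY",
		Dataset:  "ingress",
	}); err != nil {
		log.Fatal(err)
	}
	defer libhoney.Close() // flush any queued events before exit

	ev := libhoney.NewEvent()
	ev.AddField("status", 200)
	// Send enqueues the event; a background transmission worker batches and
	// POSTs it to Honeycomb (the code path visible in the goroutine dump).
	if err := ev.Send(); err != nil {
		log.Println("failed to enqueue event:", err)
	}
}
```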

MikeGoldsmith commented 2 years ago

PS: I'm also going to update both the honeytail and libhoney dependencies, as they are fairly out of date.

to266 commented 2 years ago

Awesome, thanks @MikeGoldsmith.

Is there some schedule you're following for cutting releases? Would be nice to get this out soonish :)

MikeGoldsmith commented 2 years ago

We can release the kubernetes agent quickly. Investigation into what might be causing the panic in libhoney may take longer to schedule 👍🏻

MikeGoldsmith commented 2 years ago

@to266 kubernetes agent release PR

MikeGoldsmith commented 1 year ago

Gonna close this issue for now. If this error continues, please reopen.