Closed leahnp closed 6 years ago
After fixing an upstream error by updating the version (https://github.com/samsung-cnct/k2-logging-fluent-bit-daemonset/pull/19), logs are getting printed to stdout. If we change the output to the kafka plugin, the following error occurs on the Pods:
```
[2017/09/19 16:13:10] [ info] [engine] started
Failed to start Sarama producer: kafka: client has run out of available brokers to talk to (Is your cluster reachable?)
[2017/09/19 16:13:11] [ info] [filter_kube] https=1 host=kubernetes.default.svc port=443
[2017/09/19 16:13:11] [ info] [filter_kube] local POD info OK
[2017/09/19 16:13:11] [ info] [filter_kube] testing connectivity with API server...
[2017/09/19 16:13:11] [ info] [filter_kube] API server connectivity OK
panic: interface conversion: interface is codec.RawExt, not uint64

goroutine 17 [running, locked to thread]:
panic(0x7f0239366ba0, 0xc820076500)
	/usr/lib/go-1.6/src/runtime/panic.go:481 +0x3ea
main.encode_as_json(0x7f023925e560, 0xc8201327e0, 0x0, 0x0, 0x0, 0x0, 0x0)
	/fluent-bit-kafka-output-plugin/out_kafka.go:119 +0x120
main.FLBPluginFlush(0x7f0234c40010, 0xc8001f4256, 0x1e5e080, 0x7f0238b22f40)
	/fluent-bit-kafka-output-plugin/out_kafka.go:64 +0x3ac
main._cgoexpwrap_0a4fe733c09b_FLBPluginFlush(0x7f0234c40010, 0x61647075001f4256, 0x1e5e080, 0x656e69225c3d796c)
	command-line-arguments/_obj/_cgo_gotypes.go:89 +0x35
```
Having spent a couple days digging into this, I think I have identified a few problems.
```
Failed to start Sarama producer: kafka: client has run out of available brokers to talk to (Is your cluster reachable?)
panic: interface conversion: interface is codec.RawExt, not uint64

goroutine 17 [running, locked to thread]:
panic(0x7f0ee15d7ba0, 0xc820066840)
	/usr/lib/go-1.6/src/runtime/panic.go:481 +0x3ea
main.encode_as_json(0x7f0ee14cf560, 0xc82013ea60, 0x0, 0x0, 0x0, 0x0, 0x0)
	/fluent-bit-kafka-output-plugin/out_kafka.go:119 +0x120
main.FLBPluginFlush(0x7f0ed8e44010, 0xc80017d7b5, 0x1b54960, 0x7f0ee0d93f40)
	/fluent-bit-kafka-output-plugin/out_kafka.go:64 +0x3ac
main._cgoexpwrap_0a4fe733c09b_FLBPluginFlush(0x7f0ed8e44010, 0x30755c5a0017d7b5, 0x1b54960, 0x5f726f7461727473)
	command-line-arguments/_obj/_cgo_gotypes.go:89 +0x35
```
It seems as though there is a golang error in the output plugin that should be fixed.
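To make the failure mode concrete, here is a minimal, hypothetical reproduction of the pattern that panics at `out_kafka.go:119`: a blind type assertion on a decoded msgpack value. With fluent-bit v0.12, the record timestamp decodes as `codec.RawExt` rather than `uint64`, so the assertion panics. The `rawExtLike` struct is a local stand-in for `codec.RawExt`; its fields are illustrative:

```go
package main

import "fmt"

// mustUint64 mirrors the failing pattern in out_kafka.go: a blind
// type assertion on a value decoded from msgpack.
func mustUint64(v interface{}) uint64 {
	return v.(uint64) // panics unless v is exactly uint64
}

// rawExtLike stands in for codec.RawExt (fields are illustrative).
type rawExtLike struct {
	Tag  int64
	Data []byte
}

func main() {
	// v0.11-style integer timestamp: the assertion succeeds.
	fmt.Println(mustUint64(uint64(1505837590)))

	defer func() {
		// v0.12-style extension timestamp: the assertion panics with
		// an interface-conversion error, which we recover here.
		fmt.Println("recovered:", recover())
	}()
	mustUint64(rawExtLike{Tag: 0})
}
```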
This error appears regardless of whether kafka is deployed as a service on the cluster or not, which leads me to believe the problem is in the plugin itself. Additionally, the `kafka-0` pod fails to schedule, and events show a scheduling error:
```
FirstSeen  LastSeen  Count  From               SubObjectPath  Type     Reason            Message
---------  --------  -----  ----               -------------  ----     ------            -------
4m         11s       20     default-scheduler                 Warning  FailedScheduling  PersistentVolumeClaim is not bound: "datadir-kafka-0" (repeated 15 times)
```
I have been looking into that error, and it appears to be related to needing a matching PersistentVolume somewhere on the cluster, which kafka fails to find. I tried deploying a logging-central-fluentd Deployment as given here, to no effect.
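For reference, the claim stays `Pending` until a PersistentVolume with matching capacity and access mode exists (or a dynamic provisioner creates one). A minimal, hypothetical hostPath volume for local testing might look like the sketch below; the name, size, and path are illustrative and would need to match whatever the kafka chart's `datadir` claim actually requests:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: kafka-datadir-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /mnt/data/kafka-0
```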
It is possible that I need to check if I am using the correct container images, which will be my next step.
Status report: hunted down the zookeeper chart and found a helm bug on versions < 1.7. Nodes were too small for the kafka pods to run, so the cluster needs to be upgraded to bigger worker nodes. Currently working on reliably spinning up kafka pods and getting the fluentbit-kafka-plugin to work.
Status report:
Per an inquiry on the fluent slack, the basic plugin template from here only supports fluent-bit v0.11.x. It does not support v0.12.x, which makes sense given the golang error that persists above. We want to be able to use the `systemd` plugin, which is new with v0.12.x.
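For context, the v0.12 `systemd` input is wired up in fluent-bit configuration roughly like this. This is a sketch using the built-in `stdout` output rather than our kafka plugin; the tag and flush values are illustrative:

```
[SERVICE]
    Flush        1
    Log_Level    info

[INPUT]
    Name         systemd
    Tag          host.*

[OUTPUT]
    Name         stdout
    Match        *
```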
Mocking a local dev environment has proven tricky, since both the `tail` and `systemd` plugins only work on Linux. Eduardo from fluent-bit was both apologetic and helpful in suggesting a workaround to mock systemd data on my machine.
Conclusion:
At this point, this plugin doesn't seem compatible with our current fluent-bit daemonset (both in this repo and in the new chart repo). One solution would be to rewrite the Go code in the output plugin to process the incoming data appropriately.
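A sketch of what that rewrite could look like: instead of asserting the decoded timestamp straight to `uint64` (the assertion that panics at `out_kafka.go:119`), switch on the concrete type. Here `rawExt` is a local stand-in for `codec.RawExt` from `github.com/ugorji/go/codec`, and the payload layout (4 bytes of big-endian seconds followed by 4 bytes of nanoseconds) is an assumption based on fluent-bit's v0.12 event-time format:

```go
package main

import "fmt"

// rawExt stands in for codec.RawExt, which is what the msgpack decoder
// yields for fluent-bit v0.12's event-time extension. Fields are
// illustrative.
type rawExt struct {
	Tag  int64
	Data []byte
}

// timestampOf replaces the blind ts.(uint64) assertion with a type
// switch that handles both the old and new timestamp encodings.
func timestampOf(ts interface{}) (uint64, error) {
	switch t := ts.(type) {
	case uint64:
		// fluent-bit <= v0.11: plain integer timestamp.
		return t, nil
	case rawExt:
		// fluent-bit >= v0.12 (assumed layout): big-endian seconds
		// (4 bytes) + nanoseconds (4 bytes); decode the seconds.
		if len(t.Data) < 4 {
			return 0, fmt.Errorf("short event-time payload: %d bytes", len(t.Data))
		}
		var sec uint64
		for _, b := range t.Data[:4] {
			sec = sec<<8 | uint64(b)
		}
		return sec, nil
	default:
		return 0, fmt.Errorf("unexpected timestamp type %T", ts)
	}
}

func main() {
	old, _ := timestampOf(uint64(1505837590))
	ext, _ := timestampOf(rawExt{Tag: 0, Data: []byte{0, 0, 0, 10, 0, 0, 0, 0}})
	fmt.Println(old, ext) // → 1505837590 10
}
```

With a guard like this, an unrecognized encoding becomes an error the plugin can log and skip instead of a panic that takes down the whole daemonset pod.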
Update: There is currently an open issue on the fluent-bit plugin template to support v0.12. Results of attempting to use the kafka output plugin:
With the `random` input plugin and the kafka output plugin, the kafka server displayed changes properly, whether configured through a config file or using command-line flags. With the `random`, `systemd`, or `tail` input plugins, stdout output would print to the terminal just fine. The `systemd` and `tail` plugins turned out to be difficult, as neither plugin supports MacOS. I have not tried running a container locally, since I am not sure why the output plugin works locally with test data from the `random` plugin but not up in the cloud.
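The local test described above can be reproduced roughly as sketched below, assuming the compiled Go plugin is loaded with fluent-bit's `-e` flag and registers an output named `kafka` (the registered name is an assumption). This is a command sketch, not runnable without a local fluent-bit build:

```
# config-file variant (test.conf):
#   [INPUT]
#       Name  random
#   [OUTPUT]
#       Name  kafka
#       Match *
fluent-bit -e ./out_kafka.so -c test.conf

# command-line-flag variant:
fluent-bit -e ./out_kafka.so -i random -o kafka -m '*'
```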
Conclusion:
As-is, the plugin is not compatible with the current state of the fluent-bit containers, either in this repo or in the newer, soon-to-be-used container-fluent-bit repo. New issue here.
I recommend we wait until we have updated the base image for the fluent-bit containers, which is currently being done. Perhaps by that time the fluent-bit template will have been updated as well.
Update: the fluent-bit plugin template has support for v0.12! Code here. It is still probably beneficial to ensure a more compatible base image as well.
This was a fun research spike of an issue. Other issues were created and added to the board.
Check the current state of the out_kafka plugin. Check its issues: are any breaking or P0s? Make sure none of the recent changes to the fluent-bit daemonset send unsupported encoded data.
Blocked by: https://github.com/samsung-cnct/k2-logging-fluent-bit-daemonset/issues/10