samsung-cnct / fluent-bit-kafka-output-plugin

Kafka Output Plugin for FluentBit

Double check plugin is compatible with current state of Fluent-bit Daemonset #5

Closed leahnp closed 6 years ago

leahnp commented 7 years ago

Check the current state of the out_kafka plugin and its open issues: are any of them breaking or P0s? Make sure none of the recent changes to the fluent-bit daemonset send data in an unsupported encoding.


Blocked by: https://github.com/samsung-cnct/k2-logging-fluent-bit-daemonset/issues/10

guineveresaenger commented 7 years ago

After fixing an upstream error by updating the version (https://github.com/samsung-cnct/k2-logging-fluent-bit-daemonset/pull/19), logs are now printed to stdout. If we change the output to the kafka plugin, the following error occurs on the Pods:

[2017/09/19 16:13:10] [ info] [engine] started
Failed to start Sarama producer: kafka: client has run out of available brokers to talk to (Is your cluster reachable?)
[2017/09/19 16:13:11] [ info] [filter_kube] https=1 host=kubernetes.default.svc port=443
[2017/09/19 16:13:11] [ info] [filter_kube] local POD info OK
[2017/09/19 16:13:11] [ info] [filter_kube] testing connectivity with API server...
[2017/09/19 16:13:11] [ info] [filter_kube] API server connectivity OK
panic: interface conversion: interface is codec.RawExt, not uint64

goroutine 17 [running, locked to thread]:
panic(0x7f0239366ba0, 0xc820076500)
    /usr/lib/go-1.6/src/runtime/panic.go:481 +0x3ea
main.encode_as_json(0x7f023925e560, 0xc8201327e0, 0x0, 0x0, 0x0, 0x0, 0x0)
    /fluent-bit-kafka-output-plugin/out_kafka.go:119 +0x120
main.FLBPluginFlush(0x7f0234c40010, 0xc8001f4256, 0x1e5e080, 0x7f0238b22f40)
    /fluent-bit-kafka-output-plugin/out_kafka.go:64 +0x3ac
main._cgoexpwrap_0a4fe733c09b_FLBPluginFlush(0x7f0234c40010, 0x61647075001f4256, 0x1e5e080, 0x656e69225c3d796c)
    command-line-arguments/_obj/_cgo_gotypes.go:89 +0x35

guineveresaenger commented 7 years ago

Having spent a couple days digging into this, I think I have identified a few problems.

  1. Using the daemonset with output set to out_kafka, the plugin panics with a Go runtime error:
    
    Failed to start Sarama producer: kafka: client has run out of available brokers to talk to (Is your cluster reachable?)
    panic: interface conversion: interface is codec.RawExt, not uint64

    goroutine 17 [running, locked to thread]:
    panic(0x7f0ee15d7ba0, 0xc820066840)
        /usr/lib/go-1.6/src/runtime/panic.go:481 +0x3ea
    main.encode_as_json(0x7f0ee14cf560, 0xc82013ea60, 0x0, 0x0, 0x0, 0x0, 0x0)
        /fluent-bit-kafka-output-plugin/out_kafka.go:119 +0x120
    main.FLBPluginFlush(0x7f0ed8e44010, 0xc80017d7b5, 0x1b54960, 0x7f0ee0d93f40)
        /fluent-bit-kafka-output-plugin/out_kafka.go:64 +0x3ac
    main._cgoexpwrap_0a4fe733c09b_FLBPluginFlush(0x7f0ed8e44010, 0x30755c5a0017d7b5, 0x1b54960, 0x5f726f7461727473)
        command-line-arguments/_obj/_cgo_gotypes.go:89 +0x35

The panic is a failed type assertion inside encode_as_json (out_kafka.go:119), so it seems there is a Go error in the output plugin that should be fixed.
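To illustrate the failure mode, here is a minimal sketch of why a single-value type assertion panics and how the comma-ok form avoids it. It assumes the value being asserted is the record timestamp, which the trace does not confirm; `rawExt`, `uncheckedTimestamp` and `checkedTimestamp` are hypothetical names, and `rawExt` is only a local stand-in for the `codec.RawExt` type named in the panic message.

```go
package main

import "fmt"

// rawExt is a hypothetical stand-in for the codec.RawExt value named in the
// panic message; the real type comes from the msgpack codec library.
type rawExt struct {
	Tag  uint64
	Data []byte
}

// uncheckedTimestamp mirrors the failing pattern: a single-value type
// assertion panics when the dynamic type is not uint64.
func uncheckedTimestamp(v interface{}) uint64 {
	return v.(uint64)
}

// checkedTimestamp uses the comma-ok form, so an unexpected type becomes an
// error the caller can handle instead of a crash.
func checkedTimestamp(v interface{}) (uint64, error) {
	ts, ok := v.(uint64)
	if !ok {
		return 0, fmt.Errorf("unexpected timestamp type %T", v)
	}
	return ts, nil
}

func main() {
	// A plain integer timestamp passes the check.
	ts, err := checkedTimestamp(uint64(1505837590))
	fmt.Println(ts, err)

	// An extension value is rejected with an error rather than a panic.
	_, err = checkedTimestamp(rawExt{Tag: 0, Data: make([]byte, 8)})
	fmt.Println(err)
}
```

With the checked form, a flush callback could skip or log a record it cannot decode rather than taking down the whole fluent-bit process.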

This error appears whether or not kafka is deployed as a service on the cluster, which leads me to believe that:

  2. The kafka service is also not fully operating, even with a central-logging-fluentd deployment on the same cluster. There is a "pending" pod called kafka-0, and its events show a scheduling error:
    
    FirstSeen   LastSeen   Count   From                SubObjectPath   Type      Reason             Message
    ---------   --------   -----   ----                -------------   ----      ------             -------
    4m          11s        20      default-scheduler                   Warning   FailedScheduling   PersistentVolumeClaim is not bound: "datadir-kafka-0" (repeated 15 times)

I have been looking into that error. It appears to be caused by the claim needing a matching PersistentVolume somewhere on the cluster, which kafka fails to find. I tried deploying a logging-central-fluentd Deployment as given here, to no effect.

It is possible that I need to check if I am using the correct container images, which will be my next step.
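Separately from the panic, the "client has run out of available brokers" message means the producer could not reach any broker in its list. A quick way to rule out basic connectivity is a plain TCP probe, sketched below. The broker address and the helper name `unreachableBrokers` are assumptions for illustration, not values taken from this cluster.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// unreachableBrokers dials each address over TCP and returns the ones that
// do not accept a connection within the timeout.
func unreachableBrokers(addrs []string, timeout time.Duration) []string {
	var failed []string
	for _, addr := range addrs {
		conn, err := net.DialTimeout("tcp", addr, timeout)
		if err != nil {
			failed = append(failed, addr)
			continue
		}
		conn.Close()
	}
	return failed
}

func main() {
	// Hypothetical in-cluster address; substitute the real broker list.
	brokers := []string{"kafka-0.kafka.default.svc.cluster.local:9092"}
	if failed := unreachableBrokers(brokers, 2*time.Second); len(failed) > 0 {
		fmt.Println("unreachable brokers:", failed)
		return
	}
	fmt.Println("all brokers accepted a TCP connection")
}
```

A successful dial only shows that something is listening on the port. It does not prove the broker is healthy, so a pending kafka-0 pod would still need to be fixed first.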

guineveresaenger commented 7 years ago

Status report: While hunting down the zookeeper chart, I found a helm bug on versions < 1.7. The nodes were too small for the kafka pods to run, so the cluster needs to be upgraded to bigger worker nodes. Currently working on reliably spinning up kafka pods and getting fluentbit-kafka-plugin to work.

guineveresaenger commented 7 years ago

Status report: Per an inquiry on the fluent slack, the basic plugin template from here only supports fluent-bit v0.11.x. It does not support v0.12.x, which makes sense given the Go error that persists above. We want to be able to use the systemd plugin, which is new in v0.12.x. Mocking a local dev environment has proven tricky, since both the tail and systemd plugins only work on Linux. Eduardo from fluent-bit was both apologetic and helpful in suggesting a workaround to mock systemd data on my machine.

Conclusion: At this point, this plugin doesn't seem compatible with our current fluent-bit daemonset (both on this repo and on the new chart repo). One solution would be to rewrite the Go code in the output plugin to process the incoming data appropriately.
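As a rough sketch of what "processing the incoming data appropriately" could look like, the function below accepts a timestamp in either form. It assumes that older records carry a plain uint64 of seconds and that newer ones carry an 8-byte extension payload with big-endian seconds followed by nanoseconds (the layout Fluentd uses for its EventTime type). Neither assumption is confirmed in this thread. `extValue` and `toTime` are hypothetical names, with `extValue` standing in for the decoded extension value.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"time"
)

// extValue is a hypothetical stand-in for a decoded msgpack extension value.
type extValue struct {
	Tag  uint64
	Data []byte
}

// toTime accepts either timestamp form and returns a UTC time.Time.
// Assumption: the extension payload is 4 bytes of big-endian seconds
// followed by 4 bytes of big-endian nanoseconds.
func toTime(v interface{}) (time.Time, error) {
	switch ts := v.(type) {
	case uint64:
		return time.Unix(int64(ts), 0).UTC(), nil
	case extValue:
		if len(ts.Data) != 8 {
			return time.Time{}, fmt.Errorf("ext timestamp: want 8 bytes, got %d", len(ts.Data))
		}
		sec := binary.BigEndian.Uint32(ts.Data[0:4])
		nsec := binary.BigEndian.Uint32(ts.Data[4:8])
		return time.Unix(int64(sec), int64(nsec)).UTC(), nil
	default:
		return time.Time{}, fmt.Errorf("unsupported timestamp type %T", v)
	}
}

func main() {
	// Build an extension payload by hand for the demonstration.
	payload := make([]byte, 8)
	binary.BigEndian.PutUint32(payload[0:4], 1505837590)
	binary.BigEndian.PutUint32(payload[4:8], 500)

	for _, v := range []interface{}{uint64(1505837590), extValue{Data: payload}, "bad"} {
		t, err := toTime(v)
		fmt.Println(t, err)
	}
}
```

A type switch keeps both record formats working in one code path and turns anything unexpected into an error instead of a panic.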

guineveresaenger commented 7 years ago

Update: There is currently an open issue on the fluent-bit plugin template to support v0.12. Results of attempting to use the kafka output plugin:

guineveresaenger commented 6 years ago

Update: the fluent-bit plugin has support for v0.12! Code here. It is still probably beneficial to ensure a more compatible base image as well.

coffeepac commented 6 years ago

this was a fun research spike of an issue. other issues were created and added to the board.