istio / istio

Connect, secure, control, and observe services.
https://istio.io
Apache License 2.0
35.88k stars 7.74k forks source link

When first connection to fluentd fails, fluentd adaptor is disabled. #6699

Closed bgokden closed 5 years ago

bgokden commented 6 years ago

Describe the bug When first connection to fluend fails, fluentd adaptor is disabled. Create a fluend adapter while fluentd service is not available. We observed this problem when fluentd is restarted or not ready yet.

In our environment, we have a mapping creation at start up, it requires to fluentd connect to Elasticsearch. Elasticsearch starts slowly and fluentd elasticsearch pligin restarts fluentd if connection to Elasticsearch fails. Mixer fluentd adapter disables itself if the fluent is not ready. This happens sporadically.

Expected behavior Adapter retires to connect to fluentd service.

Steps to reproduce the bug Create a fluentd adapter while fluentd is not available. Follow the demo of fluentd logging https://istio.io/docs/tasks/telemetry/fluentd/ Kill or don't start fluentd.

Version kubernetes: v1.9.7-gke.3 istio: 0.8

Is Istio Auth enabled or not? No

Environment GCP

bgokden commented 6 years ago

Our proposed solution is to change the line at https://github.com/istio/istio/blob/c0f1ea7471bbef8f9912e7cf54d2ffdb8fd452de/mixer/adapter/fluentd/fluentd.go#L78

To: han.logger, err = fluent.New(fluent.Config{FluentPort: p, FluentHost: h, Async: true})

It will enable async client, and fluentd client will handle the connection failures.

sdake commented 6 years ago

@douglas-reid looks like a simple change, can you take a look?

douglas-reid commented 6 years ago

assigning to @jeffmendoza who wrote the fluentd adapter.

jeffmendoza commented 6 years ago

Sounds great. We are vendored to v1.3.0, the latest release

https://github.com/istio/istio/blob/master/Gopkg.toml#L116 https://github.com/fluent/fluent-logger-golang/releases

I'm not sure this change is included in the latest release, it might be more recent. CC @tagomoris @alexlry if they can provide any insight. Thanks!

alexlry commented 6 years ago

Hi guys, the "Async: true" was not released in v.1.3.0 and is only available in master. You can instead use "AsyncConnect: true" with v1.3.0 and it should work just fine.

However, please note v1.3.0 can panic if your fluent service is down for a long time. This is solved with latest changes from PR fluent/fluent-logger-golang#56 along with performance improvements of asynchronous mode. I don't know what are @tagomoris plans regarding new release but I'm using fluent/fluent-logger-golang master on production environment with 1000+ logs / seconds with no problem.

jeffmendoza commented 6 years ago

@bgokden See comment above, we can update Gopkg.toml to a recent SHA including the latest behaviour and change to Async: true. Are you interested in submitting this PR?

bgokden commented 6 years ago

@jeffmendoza yes, I want to submit at as pr.

jeffmendoza commented 6 years ago

Sounds great!

stale[bot] commented 6 years ago

This issue has been automatically marked as stale because it has not had activity in the last 90 days. It will be closed in the next 30 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions.

stale[bot] commented 5 years ago

This issue has been automatically closed because it has not had activity in the last month and a half. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted". Thank you for your contributions.