grafana / loki

Like Prometheus, but for logs.
https://grafana.com/loki
GNU Affero General Public License v3.0

Feedback wanted: deprecate our Loki FluentBit plugin in favor of native FluentBit output. #4648

Open owen-d opened 3 years ago

owen-d commented 3 years ago

Hello! A long time ago we wrote a plugin for ingesting logs in Loki from fluentbit. These days, there's a native output option for Loki available in fluentbit itself courtesy of @edsiper. I'm opening this issue to solicit feedback from Loki users currently using either of these in hopeful preparation for deprecating our plugin in favor of theirs.

We hope the introduction of out of order support in Loki has helped make this feasible :)

cc @cyriltovena

edsiper commented 2 years ago

Hi Loki users,

As part of the Fluent Bit team, we want to bring a first-class citizen experience with Loki, and we would like to know which specific features are missing from our built-in connector.

Since the Golang connector will be deprecated, please let us know what is needed to prioritize on our side.

thanks.

stevehipwell commented 2 years ago

@owen-d I swapped over from the native Fluentd implementation to the native Fluent Bit implementation as soon as Loki v2.4 was released and I've been very happy with both parts.

@edsiper I think there are a few outstanding Loki issues on the Fluent Bit repo that need triaging against the latest versions? Off the top of my head the following areas need looking into, but I've not seen any of them since upgrading to the latest versions:

cyriltovena commented 2 years ago

@edsiper We have out of order now available, so we should be able to change the implementation of fluentbit to send batches in parallel.

ScarletTanager commented 2 years ago

I would very much like to see this plugin continue to be supported. We prefer having a golang output plugin available, as our team is significantly deeper in Go than C skills, and in addition to supporting the code, we intend to make a couple of small, local modifications.

I understand (correct me if I'm mistaken @edsiper ) that there are a couple of areas in which the native plugin needs to be brought up to par (e.g. support for batch compression), and while those are definitely good things to have, their addition to the C plugin doesn't help our specific case.

So I would ask the Loki team to put off deprecating the golang plugin for now, if possible.

patrick-stephens commented 2 years ago

I personally agree with deprecation: there is a fair bit of confusion with mismatches in configuration across the two plugins and so people follow a blog post/etc. for the Grafana one but use the Fluent one and then get failures. There is also the duplication of effort required: implement a feature in one then in the other (maybe slightly differently) plus the fragmentation of features. Having a single official plugin is much preferable for support, documentation, development and testing.

patrick-stephens commented 2 years ago

@owen-d what was the outcome of this? Just curious if the plan is to deprecate or not - and when if so?

sbocahu commented 2 years ago

First used the fluent-bit native plugin and then had to switch to Loki's one, as we had failures sending to Loki after a while (maybe after a disconnection / small network interruption).

krafcima commented 2 years ago

@edsiper Where is the custom labels functionality? In grafana/fluent-bit, LabelMapPath is available in the Loki output. It would be cool to use custom_label_map.json.

krafcima commented 2 years ago

> @edsiper Where is the custom labels functionality? In grafana/fluent-bit, LabelMapPath is available in the Loki output. It would be cool to use custom_label_map.json.

So, is it possible to implement it?

edsiper commented 2 years ago

@nokute78 can you implement the LabelMapPath feature, please? Ref:

https://grafana.com/docs/loki/latest/clients/fluentbit/#labelmappath

nokute78 commented 2 years ago

@edsiper I created a patch to support label_map_path https://github.com/fluent/fluent-bit/pull/6040

edsiper commented 2 years ago

awesome! thanks @nokute78 !

krafcima commented 2 years ago

> @edsiper I created a patch to support label_map_path fluent/fluent-bit#6040

Awesome! Many thanks @nokute78 @edsiper !

aleonsan commented 1 year ago

Hello! It's been almost 2 years since the creation of this issue. What is the situation now? Is there a feature roadmap to fill the gaps, if any, between the grafana-loki plugin and Fluent Bit's builtin Loki output?

AFAIK, these parameters are not supported by the builtin output:

| Parameter | Description | Default |
| --- | --- | --- |
| BatchWait | Time to wait before send a log batch to Loki, full or not. | 1s |
| BatchSize | Log batch size to send a log batch to Loki (unit: Bytes). | 10 KiB (10 * 1024 Bytes) |
| Timeout | Maximum time to wait for loki server to respond to a request. | 10s |
| MinBackoff | Initial backoff time between retries. | 500ms |
| MaxBackoff | Maximum backoff time between retries. | 5m |

And some others could be achieved using other non loki output specific FBit output parameters:

e.g. #1

| Parameter | Description | Default |
| --- | --- | --- |
| MaxRetries | Maximum number of retries when sending batches. Setting it to 0 will retry indefinitely. | 10 |

could be somehow achieved using Retry_Limit parameter

e.g. #2

| Parameter | Description | Default |
| --- | --- | --- |
| Buffer | Enable buffering mechanism | false |
| BufferType | Specify the buffering mechanism to use (currently only dque is implemented). | dque |
| DqueDir | Path to the directory for queued logs | /tmp/flb-storage/loki |
| DqueSegmentSize | Segment size in terms of number of records per segment | 500 |
| DqueSync | Whether to fsync each queue change. Specify no fsync with “normal”, and fsync with “full”. | “normal” |
| DqueName | Queue name, must be unique per output | dque |

Buffering could be implemented using FBit's own storage parameters and limiting the size.

Am I right? Are there any other important differences between the 2 implementations?
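To make the retry parameters above concrete, here is a minimal Python sketch of how a MinBackoff / MaxBackoff / MaxRetries combination could translate into wait times. It assumes plain doubling with no jitter, which is an illustrative assumption and not necessarily what either plugin does; the function name `backoff_schedule` is hypothetical.

```python
def backoff_schedule(min_backoff_s, max_backoff_s, max_retries):
    """Return the wait (in seconds) before each retry.

    Assumption: the delay doubles after every attempt and is capped at
    max_backoff_s. Real clients may add jitter or use another strategy.
    """
    delays = []
    delay = min_backoff_s
    for _ in range(max_retries):
        delays.append(min(delay, max_backoff_s))
        delay *= 2
    return delays


# Defaults quoted above: MinBackoff=500ms, MaxBackoff=5m, MaxRetries=10.
schedule = backoff_schedule(0.5, 300.0, 10)
print(schedule)
print(sum(schedule))  # total seconds spent waiting before giving up
```

Under these assumptions the ten retries wait 511.5 seconds in total, so a batch would be dropped after roughly eight and a half minutes of failures. That is the kind of behaviour to compare against when relying on Retry_Limit alone.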

patrick-stephens commented 1 year ago

@aleonsan thanks for the good analysis - any chance you can raise an issue on the OSS repo to track the new features we may need? https://github.com/fluent/fluent-bit

This is a Grafana repo so I cannot comment on their roadmap but from an OSS perspective I'd really like to make sure we have feature parity and a migration approach. We regularly get issues raised due to using the Grafana docs but the OSS image, plus the current Grafana image is now based on an unsupported 1.9 version of OSS - we're up to 2.1.7 as of today with a load of new features including OTEL compliance.

ksauzz commented 1 year ago

Hello. First of all, thank you for maintaining cool OSS products. I have 2 feedback items on the migration from the grafana-loki plugin to the loki plugin.

Compression

After the migration we observed 6-8x higher traffic on loki-gateway than before. According to my investigation, it seems that the grafana-loki plugin (actually the promtail client) uses application/x-protobuf with snappy compression, but the loki plugin uses application/json with no compression. It would be nice if the loki plugin also supported compression to reduce network traffic.


https://github.com/grafana/loki/blob/v2.9.0/clients/pkg/promtail/client/client.go#L442-L453
https://github.com/fluent/fluent-bit/blob/v2.1.8/plugins/out_loki/loki.c#L1566-L1569
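As a rough illustration of why an uncompressed JSON body costs more bandwidth, the Python sketch below builds a payload in the shape of Loki's JSON push format and compares its raw size with a compressed copy. It uses gzip from the standard library as a stand-in for snappy, and the label set, log lines, and helper name `build_push_payload` are made up for the example. The ratio it prints depends entirely on the data and says nothing about the 6-8x figure reported above.

```python
import gzip
import json
import time


def build_push_payload(labels, lines):
    """Build a JSON push body: one stream, each value is [timestamp_ns, line]."""
    now_ns = time.time_ns()
    # Timestamps are sent as strings holding nanoseconds since the epoch.
    values = [[str(now_ns + i), line] for i, line in enumerate(lines)]
    return {"streams": [{"stream": labels, "values": values}]}


# Hypothetical, highly repetitive log lines, as application logs often are.
lines = [
    f'level=info msg="request handled" path=/api/v1/items id={i}'
    for i in range(1000)
]
payload = build_push_payload({"job": "fluent-bit"}, lines)

raw = json.dumps(payload).encode("utf-8")
compressed = gzip.compress(raw)  # stand-in for snappy, which is not in the stdlib

print(len(raw), len(compressed))
```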

413 Request Entity Too Large by loki-gateway

The loki plugin sometimes sends large payloads over 1MB, which loki-gateway rejects by default. To accept such requests, we had to change client_max_body_size of loki-gateway to 3m. According to the fluentbit docs, a chunk is usually about 2MB, so we chose a 3MB client_max_body_size. Thus, it would be nice to set client_max_body_size to 3MB for loki-gateway by default in the helm chart.

https://nginx.org/en/docs/http/ngx_http_core_module.html#client_max_body_size
https://github.com/grafana/helm-charts/blob/loki-distributed-0.74.1/charts/loki-distributed/values.yaml#L1146
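The alternative to raising the gateway limit is capping the request size on the client, which is what a BatchSize-style setting does. The Python sketch below shows the idea under simplifying assumptions: it counts only the UTF-8 bytes of each log line (not JSON or protobuf overhead), and `split_into_batches` is a hypothetical helper, not part of either plugin.

```python
def split_into_batches(lines, max_bytes):
    """Group log lines into batches whose combined size stays under max_bytes.

    A single line larger than max_bytes still gets a batch of its own,
    so the caller must decide whether to drop or truncate it.
    """
    batches, current, current_size = [], [], 0
    for line in lines:
        size = len(line.encode("utf-8"))
        # Flush the current batch before it would exceed the limit.
        if current and current_size + size > max_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(line)
        current_size += size
    if current:
        batches.append(current)
    return batches


# Five 400 kB lines (about 2 MB in total) against a 1 MB request limit.
lines = ["x" * 400_000] * 5
batches = split_into_batches(lines, 1_000_000)
print([len(batch) for batch in batches])
```

A real client would also have to budget for the envelope around the lines, so the usable limit is somewhat lower than the gateway's client_max_body_size.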

patrick-stephens commented 1 year ago

@ksauzz any feedback on the OSS side needs to be fed back to the OSS repo rather than this Grafana one otherwise it won't be seen. https://github.com/fluent/fluent-bit

Turkish commented 1 year ago

I started with native and had to switch to grafana-loki for this reason:

The native Fluent-bit loki plugin does not support a custom URI: you can only set the Host and the Port, but you have no control over the URI (the path). With the grafana-loki plugin, you can set a full Url.

Now I'm struggling to configure TLS with the grafana-loki plugin. I don't see in the documentation that it's possible; if anyone has an idea, please help.

edsiper commented 1 year ago

@Turkish thanks for your feedback. I have submitted a PR to implement that feature in Fluent Bit:

https://github.com/fluent/fluent-bit/pull/8040

edsiper commented 1 year ago

hey folks, just wanted to check what else is needed to complete the transition. The last two missing pieces, compression and a configurable URI, have been addressed. Please report anything that is still missing here.

ptr1120 commented 11 months ago

> hey folks, just wanted to check what else is needed to complete the transition, last two missing pieces around compression and configurable URI has been addressed. Please report any missing thing here.

It would be interesting to find a solution for how to push structured metadata to Loki using the fluent-bit Loki output.

patrick-stephens commented 11 months ago

OSS Fluent Bit does include an additional optional metadata section in every record now, primarily to support some of the OTEL requirements I believe. This potentially could be used.
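For context on what pushing structured metadata would mean on the wire, here is a small Python sketch. It assumes, as I understand the Loki HTTP push API, that each entry in `values` may carry an optional third element holding a flat object of metadata key/value pairs. The helper `entry`, the labels, and the `trace_id` value are hypothetical, and nothing here shows how the Fluent Bit output itself would fill this in.

```python
import json


def entry(ts_ns, line, metadata=None):
    """Build one log entry: [timestamp_ns, line] plus optional metadata."""
    item = [str(ts_ns), line]
    if metadata:
        # Assumption: structured metadata travels as a third element.
        item.append(metadata)
    return item


payload = {
    "streams": [
        {
            "stream": {"job": "fluent-bit"},
            "values": [
                entry(
                    1700000000000000000,
                    "user logged in",
                    {"trace_id": "abc123"},
                )
            ],
        }
    ]
}

print(json.dumps(payload, indent=2))
```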

bgarcial commented 10 months ago

Hey guys, I just saw on this website that the fluentbit grafana helm chart is now deprecated and that using the official helm chart is recommended. Is it only the helm chart, or is the grafana fluentbit implementation also deprecated? I ask because the official Grafana documentation for their fluentbit implementation is still there.

edsiper commented 5 months ago

Hi Folks, regarding the initial requirements around batching, is this still highly necessary? I would like to understand how urgent this is.