Closed: robertodauria closed this issue 4 years ago
@robertodauria: It appears that fluent-bit v1.3.2 was released just a few weeks ago. Is it possible that this release (or anything in v1.3.x) resolves the issues you encountered with v1.2.x?
I've checked the latest release (v1.3.3) and the previous one, v1.3.2. Both of them segfault with a minimal configuration that didn't cause any issue on the older v1.2.x. Also, there are several open issues related to features we will certainly use:
I think we should wait another 5-6 months.
Hey guys, I came across this issue and thought I'd humbly recommend Vector as an option as well. I think you'll find it to be superior to both Fluentd and Fluent Bit. We have extensive experience with the fluent* projects at Timber and found them to be rather unreliable in many ways. You can see our test harness here, which runs a variety of performance and correctness tests across these tools.
Happy to answer any questions as well!
@binarylogic thank you for mentioning Vector as an option!
Sadly, our main reason to stick to the Fluent family is that we need support for exporting logs to Google Stackdriver. Is there any plan of adding a Stackdriver sink to Vector in future?
Hi @robertodauria, no problem. And yes, we're actually working through a GCP milestone now; https://github.com/timberio/vector/issues/572 is the specific issue for Stackdriver. We should have a pull request up next week, if that works for your timeline.
@binarylogic That sounds great, and I'd be happy to give Vector a try. It's not too urgent for us right now as the setup we have (with fluentd) works - although it's a bit too memory-hungry - so it might take a while before we get to test it.
Hi @robertodauria
This is Eduardo, core maintainer of Fluent Bit. While looking around for topics related to Fluent Bit, I came across this ticket.
Since you are planning to migrate to Fluent Bit and are running into some issues, I would like to offer some comments:
1 https://github.com/fluent/fluent-bit/issues/1777 : this is mostly a setup issue. Our primary interface for monitoring file modifications is inotify(7). The good thing is that it is highly performant; the downside is that it requires an extra file descriptor per monitored file. The workaround at the moment is to increase the kernel limits for watched files here:
/proc/sys/fs/inotify/max_user_watches
/proc/sys/fs/inotify/max_user_instances
For environments where this modification is not an option, we will offer the backend based on stat(2). The good thing is that it doesn't require an extra file descriptor as inotify(7) does; the downside is that it is more expensive, since it relies on a costlier system call that is invoked multiple times.
Both mechanisms already exist in Fluent Bit. The catch is that you currently have to choose one or the other at build time; the improvement will be to let the user decide which one to use.
If the workaround above is not suitable, let me know, since we are prioritizing this anyway.
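As a rough illustration of the workaround, the current limits can be read from the two /proc files listed above and compared against the number of files to be tailed. This is a minimal sketch and not part of Fluent Bit: the function names, the base_dir parameter and the headroom factor are assumptions made for illustration only.

```python
from pathlib import Path

# Location of the inotify limits on Linux (the two files named above).
INOTIFY_DIR = Path("/proc/sys/fs/inotify")


def read_inotify_limits(base_dir=INOTIFY_DIR):
    """Return the inotify limits found under base_dir as a dict of ints.

    base_dir is a parameter only so this hypothetical helper can be
    pointed at another directory, for example in a test.
    """
    limits = {}
    for name in ("max_user_watches", "max_user_instances"):
        path = Path(base_dir) / name
        if path.exists():
            limits[name] = int(path.read_text().strip())
    return limits


def watches_sufficient(limits, monitored_files, headroom=2):
    """Heuristic check: does max_user_watches cover the monitored files?

    The headroom factor is an arbitrary assumption, not a recommended value.
    """
    max_watches = limits.get("max_user_watches")
    if max_watches is None:
        return False  # limit unknown, e.g. not running on Linux
    return max_watches >= monitored_files * headroom


if __name__ == "__main__":
    current = read_inotify_limits()
    print(current)
    print(watches_sufficient(current, monitored_files=1000))
```

If the check fails, the limit is usually raised as root with sysctl (the fs.inotify.max_user_watches key); the right value depends on how many files the node actually tails.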
2 https://github.com/fluent/fluent-bit/issues/1768 : setup issue. In high-load environments, the rate at which records or events can be delivered to the destination databases or cloud providers is likely slower than the rate of data ingestion, hence your system faces backpressure. This is a common issue that is easily solved by configuring the input plugins with a memory limit; you can read more about this here:
Note: when your input is based on a network service like tcp, syslog, mqtt or other, this memory limit plus backpressure might lead to incoming records being discarded in order to survive. The workaround is to enable the file system buffering mechanism, so you don't lose data and can continue processing and delivering records:
3 https://github.com/fluent/fluent-bit/issues/1755 : bug already fixed in the previous version, Fluent Bit v1.3.4.
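The memory limit and file system buffering described in point 2 can be pictured with a small toy model. This is not Fluent Bit's implementation or its configuration syntax; the class name, the byte limit and the in-process list standing in for file system buffering are all assumptions made for illustration. It only shows the trade-off: once the limit is reached, a record is either dropped or diverted to secondary storage.

```python
from collections import deque


class BoundedInputBuffer:
    """Toy input buffer with a memory limit (illustrative only)."""

    def __init__(self, mem_limit_bytes, spill_to_disk=False):
        self.mem_limit_bytes = mem_limit_bytes
        self.spill_to_disk = spill_to_disk
        self.in_memory = deque()
        self.mem_used = 0
        self.spilled = []  # stand-in for file system buffering
        self.dropped = 0

    def ingest(self, record: bytes) -> str:
        """Accept a record and report where it ended up."""
        if self.mem_used + len(record) <= self.mem_limit_bytes:
            self.in_memory.append(record)
            self.mem_used += len(record)
            return "memory"
        if self.spill_to_disk:
            # Over the limit, but secondary storage is available.
            self.spilled.append(record)
            return "disk"
        # Over the limit with nowhere to put the record: it is lost.
        self.dropped += 1
        return "dropped"

    def flush_one(self):
        """Deliver the oldest in-memory record and free its memory."""
        if not self.in_memory:
            return None
        record = self.in_memory.popleft()
        self.mem_used -= len(record)
        return record
```

With a 10-byte limit and no spill, two 5-byte records fit and a third is counted as dropped until flush_one frees space; with spill_to_disk=True the third record lands in the spilled list instead. The toy does not attempt to preserve ordering between memory and spilled records.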
We are now at Fluent Bit v1.3.5, and I would encourage you to give it a try. If you face any issue, let me know; we can follow up on our GitHub repo or through a call, as we do with most users.
Fluent Bit is deployed a few million times every month and several companies contribute to it. If you have any questions about its adoption and enterprise-grade usage, I am happy to discuss them :)
best,
@robertodauria: I think it may be time to revisit this issue. Fluent Bit is now at v1.4.5, and Vector from timber.io now seems to support exporting to Stackdriver. Either one could, at this point, be a viable option for us. What do you think?
Shiny new feature! Yup, check out our stackdriver docs: https://vector.dev/docs/reference/sinks/gcp_stackdriver_logs/
Here's a sample of all the knobs:
[sinks.my_sink_id]
# General
type = "gcp_stackdriver_logs" # required
inputs = ["my-source-id"] # required
billing_account_id = "012345-6789AB-CDEF01" # optional, no default
credentials_path = "/path/to/credentials.json" # optional, no default
folder_id = "My Folder" # optional, no default
healthcheck = true # optional, default
log_id = "vector-logs" # required
organization_id = "622418129737" # optional, no default
project_id = "vector-123456" # required
# Batch
batch.max_size = 5242880 # optional, default, bytes
batch.timeout_secs = 1 # optional, default, seconds
# Buffer
buffer.type = "memory" # optional, default
buffer.max_events = 500 # optional, default, events, relevant when type = "memory"
buffer.max_size = 104900000 # required, bytes, required when type = "disk"
buffer.when_full = "block" # optional, default
# Encoding
encoding.except_fields = ["timestamp", "message", "host"] # optional, no default
encoding.only_fields = ["timestamp", "message", "host"] # optional, no default
encoding.timestamp_format = "rfc3339" # optional, default
# Request
request.in_flight_limit = 5 # optional, default, requests
request.rate_limit_duration_secs = 1 # optional, default, seconds
request.rate_limit_num = 1000 # optional, default
request.retry_attempts = -1 # optional, default
request.retry_initial_backoff_secs = 1 # optional, default, seconds
request.retry_max_duration_secs = 10 # optional, default, seconds
request.timeout_secs = 60 # optional, default, seconds
# Resource
resource.type = "global" # required
resource.projectId = "vector-123456" # example
resource.zone = "Twilight" # example
# TLS
tls.ca_path = "/path/to/certificate_authority.crt" # optional, no default
tls.crt_path = "/path/to/host_certificate.crt" # optional, no default
tls.key_pass = "${KEY_PASS_ENV_VAR}" # optional, no default
tls.key_path = "/path/to/host_certificate.key" # optional, no default
tls.verify_certificate = true # optional, default
tls.verify_hostname = true # optional, default
You may also be interested in our upcoming Kubernetes Integration!
Let me know if I can help anyone with a test setup. :) We'd be happy to set up a chat etc.
We replaced Fluentd with Vector. Closing.
Fluentd is a big and memory-hungry daemon doing much more than what our current needs are.
Fluent-bit, by the same company, is a much smaller version of it, written in C, that does just log ingestion/parsing/forwarding to Stackdriver and has native Prometheus metrics. However, while testing it, I discovered that none of the latest versions does everything we need out of the box. In particular:
uptime endpoint returns an empty response
I didn't test even older versions, but I don't feel too comfortable using a product that's not very mature yet. However, development is very active and it's likely fluent-bit will become a more viable solution in the coming months. This issue is to remind myself to revisit it in, say, three months from now.