[Rancher2] Logging v2 ClusterFlow, ClusterOutput, Flow, Output Examples and Best Practices

oxr463 commented 3 years ago

Request Summary:

Several tweaks need to be applied to get Logging v2 working properly for most of our customers who are migrating from Logging v1. I would like to get some of these into the docs on this page:

https://rancher.com/docs/rancher/v2.5/en/logging

Details:

Here is an example Output for Elastic that has been battle-tested and proven by the support team (Credit: @dbason):

apiVersion: logging.banzaicloud.io/v1beta1
kind: Output
metadata:
  name: efk
  namespace: cattle-logging-system
spec:
  elasticsearch:
    buffer:
      flush_interval: 30s
      flush_mode: interval
      flush_thread_count: 4
      queued_chunks_limit_size: 300
      type: file
    flatten_hashes: true
    host: elastic-test.rancher.local
    include_timestamp: true
    index_name: test
    log_es_400_reason: true
    password:
      valueFrom:
        secretKeyRef:
          key: elastic
          name: lramage-test-es-elastic-user
    port: 443
    reconnect_on_error: true
    scheme: https
    suppress_type_name: true
    user: elastic
    with_transporter_log: true
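
An Output does nothing until a Flow routes logs to it. As a minimal sketch (not part of the tested configuration above), a Flow in the same namespace could reference the `efk` Output by name. The Flow name `efk-flow` is hypothetical, and the empty `select` is an assumption meaning "match everything in this namespace"; narrow it with labels as needed.

```yaml
apiVersion: logging.banzaicloud.io/v1beta1
kind: Flow
metadata:
  name: efk-flow                     # hypothetical name, for illustration only
  namespace: cattle-logging-system   # must match the namespace of the Output
spec:
  match:
    - select: {}                     # assumption: select all logs in this namespace
  localOutputRefs:
    - efk                            # the Output defined above
```

A Flow only sees logs from its own namespace, so collecting cluster-wide logs would call for a ClusterFlow and ClusterOutput instead.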

Under the troubleshooting section (https://rancher.com/docs/rancher/v2.5/en/logging/#troubleshooting), we should include this command:

kubectl exec -n cattle-logging-system rancher-logging-fluentd-0 -- cat /fluentd/log/out

Those logs are not captured by the log collector, nor are they forwarded to whichever Output has been configured (e.g., Elasticsearch or Splunk), so they have to be read directly from the Fluentd pod.

Here are some issues and their workarounds:

Fluentd process dies inside the workload:

[2020/10/31 05:52:51] [error] [io] connection #272 failed to: rancher-logging-fluentd.cattle-logging-system.svc:24240
[2020/10/31 05:52:51] [error] [output:forward:forward.0] no upstream connections available

The solution is to upgrade the deployment with higher Fluentd resource limits and more replicas:

resources:
  limits:
    cpu: "2"
    memory: 2Gi
  requests:
    cpu: "1"
    memory: 1Gi
scaling:
  replicas: 2

Source: https://github.com/rancher/rancher/issues/29879#issuecomment-731732589
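
For context on where those keys might live: in the Logging operator, Fluentd settings sit under `spec.fluentd` of the `Logging` resource. The fragment below is a sketch of that nesting only. The resource name `rancher-logging` is an assumption inferred from the pod name above, and the Rancher chart may expose these settings under different Helm values, so check the chart before applying anything.

```yaml
apiVersion: logging.banzaicloud.io/v1beta1
kind: Logging
metadata:
  name: rancher-logging              # assumption: inferred from the pod name rancher-logging-fluentd-0
spec:
  controlNamespace: cattle-logging-system
  fluentd:
    resources:
      limits:
        cpu: "2"
        memory: 2Gi
      requests:
        cpu: "1"
        memory: 1Gi
    scaling:
      replicas: 2
```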

Fluentbit cannot write forward header:

[2021/06/24 19:47:15] [error] [output:forward:forward.0] could not write forward header
[2021/06/24 19:47:15] [ warn] [engine] failed to flush chunk '1-1624564034.224231239.flb', retry in 9 seconds: task_id=0, input=tail.0 > output=forward.0
[2021/06/24 19:47:24] [ info] [engine] flush chunk '1-1624564034.224231239.flb' succeeded at retry 1: task_id=1, input=tail.0 > output=forward.0

The solution is to increase the buffer sizes on the forward input:

[INPUT]
    Name              forward
    Listen            0.0.0.0
    Port              24224
    Buffer_Chunk_Size 1MB
    Buffer_Max_Size   5MB

Source: https://github.com/fluent/fluent-bit/issues/2972#issuecomment-767753098

papanito commented 3 years ago

Where exactly do you configure the buffer part?

papanito commented 3 years ago

Never mind, I found it in the chart:

fluentbit:
  inputTail:
    Buffer_Chunk_Size: ""
    Buffer_Max_Size: ""
    Mem_Buf_Limit: ""
    Multiline_Flush: ""
    Skip_Long_Lines: ""
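
As a sketch of how those chart values might be filled in, the sizes from the forward input example earlier in this issue can be carried over. Whether raising the tail input buffers resolves the forward header error is an assumption that has not been confirmed in this thread, so treat the numbers as a starting point.

```yaml
fluentbit:
  inputTail:
    Buffer_Chunk_Size: "1MB"   # size taken from the forward input example above
    Buffer_Max_Size: "5MB"     # size taken from the forward input example above
```

Keys left out here keep their chart defaults.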

Tejeev commented 2 years ago

This comes up a lot.

rancher / rancher-docs

[Rancher2] Logging v2 ClusterFlow, ClusterOutput, Flow, Output Examples and Best Practices #90