kube-logging / logging-operator

Logging operator for Kubernetes
https://kube-logging.dev
Apache License 2.0

Any new version from 4.5.1 doesn't attach extraVolumes in fluentd #1728

Closed frit0-rb closed 2 months ago

frit0-rb commented 4 months ago


Describe the bug:

I have logging-operator deployed on an RKE2 Kubernetes cluster with version 4.5.1. When I try to update to a newer version such as 4.5.3 or 4.5.6, the logs stored in fluentd are never sent to Splunk.

Expected behaviour:

The logs are sent to Splunk.



Environment details:

/kind bug

frit0-rb commented 4 months ago

More info:

failed to flush the buffer. retry_times=3 next_retry_time=2024-04-26 14:57:56 +0000 chunk="61700e0f9e5790e5efb53ae6d92b1e5f" error_class=OpenSSL::SSL::SSLError error="SSL_CTX_load_verify_file: system lib"

frit0-rb commented 4 months ago

I tried to update from 4.5.2 to 4.5.6; once the update is done, I see this error in the fluentd pod logs:

error_class=OpenSSL::SSL::SSLError error="SSL_CTX_load_verify_file: system lib"

This is the log:

```
2024-04-30 09:49:49 +0000 [warn]: #0 [flow:gitlab:gitlab-to-splunk:output:gitlab:splunk-gitlab-dev] failed to flush the buffer. retry_times=9 next_retry_time=2024-04-30 09:58:20 +0000 chunk="6174d053bd6f5921236fadd5329cdb94" error_class=OpenSSL::SSL::SSLError error="SSL_CTX_load_verify_file: system lib"
2024-04-30 09:49:49 +0000 [warn]: #0 /usr/local/lib/ruby/3.2.0/net/http.rb:1666:in `initialize'
2024-04-30 09:49:49 +0000 [warn]: #0 /usr/local/lib/ruby/3.2.0/net/http.rb:1666:in `new'
2024-04-30 09:49:49 +0000 [warn]: #0 /usr/local/lib/ruby/3.2.0/net/http.rb:1666:in `connect'
2024-04-30 09:49:49 +0000 [warn]: #0 /usr/local/lib/ruby/3.2.0/net/http.rb:1580:in `do_start'
2024-04-30 09:49:49 +0000 [warn]: #0 /usr/local/lib/ruby/3.2.0/net/http.rb:1575:in `start'
2024-04-30 09:49:49 +0000 [warn]: #0 /usr/local/bundle/gems/net-http-persistent-4.0.2/lib/net/http/persistent.rb:662:in `start'
2024-04-30 09:49:49 +0000 [warn]: #0 /usr/local/bundle/gems/net-http-persistent-4.0.2/lib/net/http/persistent.rb:602:in `connection_for'
2024-04-30 09:49:49 +0000 [warn]: #0 /usr/local/bundle/gems/net-http-persistent-4.0.2/lib/net/http/persistent.rb:892:in `request'
2024-04-30 09:49:49 +0000 [warn]: #0 /usr/local/bundle/gems/fluent-plugin-splunk-hec-1.3.3/lib/fluent/plugin/out_splunk_hec.rb:351:in `write_to_splunk'
2024-04-30 09:49:49 +0000 [warn]: #0 /usr/local/bundle/gems/fluent-plugin-splunk-hec-1.3.3/lib/fluent/plugin/out_splunk.rb:103:in `block in write'
2024-04-30 09:49:49 +0000 [warn]: #0 /usr/local/lib/ruby/3.2.0/benchmark.rb:311:in `realtime'
2024-04-30 09:49:49 +0000 [warn]: #0 /usr/local/bundle/gems/fluent-plugin-splunk-hec-1.3.3/lib/fluent/plugin/out_splunk.rb:102:in `write'
2024-04-30 09:49:49 +0000 [warn]: #0 /usr/local/bundle/gems/fluent-plugin-splunk-hec-1.3.3/lib/fluent/plugin/out_splunk_hec.rb:154:in `write'
2024-04-30 09:49:49 +0000 [warn]: #0 /usr/local/bundle/gems/fluentd-1.16.3/lib/fluent/plugin/output.rb:1225:in `try_flush'
2024-04-30 09:49:49 +0000 [warn]: #0 /usr/local/bundle/gems/fluentd-1.16.3/lib/fluent/plugin/output.rb:1538:in `flush_thread_run'
2024-04-30 09:49:49 +0000 [warn]: #0 /usr/local/bundle/gems/fluentd-1.16.3/lib/fluent/plugin/output.rb:510:in `block (2 levels) in start'
2024-04-30 09:49:49 +0000 [warn]: #0 /usr/local/bundle/gems/fluentd-1.16.3/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
```

I checked the release notes, but no change appears to affect SSL or anything related.
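Editor's note: the `SSL_CTX_load_verify_file: system lib` message is OpenSSL reporting that it could not load the CA material it was pointed at, which fits a CA volume that never got mounted. A minimal sketch of the same failure mode in Python (the path is hypothetical, matching the mount point from the report):

```python
# Asking an SSL context to load a CA file that is not mounted fails
# immediately, analogous to fluentd's SSL_CTX_load_verify_file error.
import ssl

ctx = ssl.create_default_context()
try:
    # /home/fluent/certs is where the extraVolume was supposed to be mounted
    ctx.load_verify_locations(cafile="/home/fluent/certs/missing-ca.crt")
    err_name = None
except (FileNotFoundError, ssl.SSLError) as exc:
    err_name = type(exc).__name__

print(err_name)
```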

frit0-rb commented 4 months ago

Good morning,

I found the main problem.

In the current definition of the Logging I have an extraVolumes entry created to mount CAs from the worker node hosts:

```yaml
apiVersion: logging.banzaicloud.io/v1beta1
kind: Logging
metadata:
  name: &logging-app-dev gitlab-logging-dev
  namespace: cattle-logging-system
spec:
  loggingRef: *logging-app-dev
  fluentbit:
    security:
      roleBasedAccessControlCreate: true
  fluentd:
    security:
      roleBasedAccessControlCreate: true
      podSecurityContext:
        runAsNonRoot: false
    scaling:
      replicas: 3
    bufferStorageVolume:
      pvc:
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 1Gi
    extraVolumes:
      - volumeName: trusted-cas-volume
        path: /home/fluent/certs
        containerName: fluentd
        volume:
          hostPath:
            path: /etc/pki/ca-trust/source/anchors
  controlNamespace: cattle-logging-system
  watchNamespaces:
    - gitlab
```

But when the Logging is created, this extra volume is never created inside the fluentd pods.

With the same config in 4.5.1 the extraVolume was created correctly.
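Editor's note: one quick way to confirm whether the volume made it into the rendered workload is to inspect the fluentd StatefulSet directly. The commands below are a sketch; the StatefulSet name assumes the operator's usual `<loggingName>-fluentd` naming for the Logging above, so adjust names and namespace to your cluster:

```shell
# List the volumes the operator rendered into the fluentd StatefulSet
kubectl -n cattle-logging-system get statefulset gitlab-logging-dev-fluentd \
  -o jsonpath='{.spec.template.spec.volumes[*].name}'

# List the mount paths of the fluentd container; /home/fluent/certs
# should appear here if the extraVolume was applied
kubectl -n cattle-logging-system get statefulset gitlab-logging-dev-fluentd \
  -o jsonpath='{.spec.template.spec.containers[?(@.name=="fluentd")].volumeMounts[*].mountPath}'
```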

frit0-rb commented 4 months ago

I tried adding a hostPath or a Secret via extraVolumes in a FluentdConfig and got the same problem.
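Editor's note: for readers unfamiliar with the FluentdConfig variant, a sketch of such a resource carrying the same extraVolumes field with a Secret source might look like the following (resource and secret names are illustrative; per the report, this path hit the same problem):

```yaml
apiVersion: logging.banzaicloud.io/v1beta1
kind: FluentdConfig
metadata:
  name: gitlab-fluentd
  namespace: cattle-logging-system
spec:
  extraVolumes:
    - volumeName: trusted-cas-volume
      path: /home/fluent/certs
      containerName: fluentd
      volume:
        secret:
          secretName: trusted-cas   # hypothetical secret holding the CA bundle
```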

pepov commented 4 months ago

@frit0-rb can you please use fenced code blocks so that we can see whitespaces as well?

frit0-rb commented 4 months ago

> @frit0-rb can you please use fenced code blocks so that we can see whitespaces as well?

Sorry @pepov, I've added the fences.

pepov commented 3 months ago

thx, I've started to look into this, but I have some conflicting priorities, I have to ask for your patience

frit0-rb commented 3 months ago

> thx, I've started to look into this, but I have some conflicting priorities, I have to ask for your patience

No problem @pepov, we are not far away from the last stable update, so take it easy.

frit0-rb commented 3 months ago

Hello @pepov, there is a new CVE in Fluent Bit: https://thehackernews.com/2024/05/linguistic-lumberjack-vulnerability.html So I need to resolve this problem as soon as possible, because I need to update to 4.6.0.

pepov commented 3 months ago

You can use the latest fluentbit anytime, without upgrading the logging operator, by setting the fluentbit image version explicitly.
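Editor's note: pinning the Fluent Bit image is done in the Logging resource's `fluentbit.image` field. A sketch, reusing the names from the report; the tag shown is an assumption, so verify which Fluent Bit release actually carries the CVE fix before using it:

```yaml
apiVersion: logging.banzaicloud.io/v1beta1
kind: Logging
metadata:
  name: gitlab-logging-dev
  namespace: cattle-logging-system
spec:
  controlNamespace: cattle-logging-system
  fluentbit:
    image:
      repository: fluent/fluent-bit
      tag: "3.0.4"   # hypothetical patched tag; check the Fluent Bit advisory
```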

mgalesloot commented 2 months ago

Looking at the code of statefulset.go, it seems both Volume and PersistentVolumeClaim must be specified. Not sure why. Also, mounting secrets or configmaps is not supported at all. See https://github.com/kube-logging/logging-operator/blob/61e6eb05c56c393cd929d96e66e4c39f346c4882/pkg/resources/fluentd/statefulset.go#L53 These lines were changed 5 months ago.
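Editor's note: to make the reported guard concrete, here is a minimal Go sketch of the kind of condition described above. The types and function are hypothetical stand-ins, not the actual statefulset.go source; they only illustrate why a hostPath-backed extra volume would be silently dropped while a PVC-backed one passes:

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the operator's volume types.
type pvcSource struct{ ClaimName string }
type hostPathSource struct{ Path string }

type kubernetesVolume struct {
	HostPath              *hostPathSource
	PersistentVolumeClaim *pvcSource
}

type extraVolume struct {
	VolumeName string
	Volume     *kubernetesVolume
}

// attached mimics the overly strict guard: the extra volume is only
// wired into the StatefulSet when BOTH the volume definition AND a
// PersistentVolumeClaim are present, so hostPath (and secret/configMap)
// sources are silently skipped.
func attached(v extraVolume) bool {
	return v.Volume != nil && v.Volume.PersistentVolumeClaim != nil
}

var (
	// The CA volume from the bug report: hostPath only, no PVC.
	hostPathVol = extraVolume{
		VolumeName: "trusted-cas-volume",
		Volume:     &kubernetesVolume{HostPath: &hostPathSource{Path: "/etc/pki/ca-trust/source/anchors"}},
	}
	// A PVC-backed volume, which satisfies the guard.
	pvcVol = extraVolume{
		VolumeName: "fluentd-buffer",
		Volume:     &kubernetesVolume{PersistentVolumeClaim: &pvcSource{ClaimName: "fluentd-buffer-pvc"}},
	}
)

func main() {
	fmt.Println(attached(hostPathVol)) // the hostPath CA volume is dropped
	fmt.Println(attached(pvcVol))      // the PVC-backed volume is kept
}
```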

pepov commented 2 months ago

thx @mgalesloot ! can someone help me verify this fixes the issue? https://github.com/kube-logging/logging-operator/pull/1765

Also this one from @nak0f (coming soon) will extend the support for configmaps: https://github.com/cisco-open/operator-tools/pull/251

pepov commented 2 months ago

fyi I've updated the above PR with a sample that seems to fix this issue as I would expect

frit0-rb commented 2 months ago

Hi @pepov, this issue being closed means it is solved in which version? Which version do I need to update to in order to use extraVolumes?

pepov commented 2 months ago

In the next upcoming version, which is going to be 4.8.