kube-logging / fluentd-images

Custom-built Fluentd images for the Logging operator
Apache License 2.0

broken elasticsearch output on v1.15.3 #28

Closed: genofire closed this issue 1 year ago

genofire commented 1 year ago
2023-03-30 15:09:51 +0000 [info]: starting fluentd-1.15.3 as dry run mode ruby="3.1.3"
/usr/lib/ruby/3.1.0/rubygems/specification.rb:2288:in `raise_if_conflicts': Unable to activate fluent-plugin-elasticsearch-5.2.5, because faraday-2.7.4 conflicts with faraday (~> 1.10) (Gem::ConflictError)

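The `~> 1.10` in that message is a RubyGems pessimistic constraint, equivalent to `>= 1.10, < 2.0`, so no faraday 2.x release can satisfy fluent-plugin-elasticsearch 5.2.5. A minimal sketch using RubyGems' own `Gem::Requirement` class; only 2.7.4 comes from the log, the other versions are illustrative samples:

```ruby
require "rubygems"

# Constraint quoted in the error: fluent-plugin-elasticsearch 5.2.5 needs faraday ~> 1.10.
requirement = Gem::Requirement.new("~> 1.10")

# "~> 1.10" expands to ">= 1.10, < 2.0". Only 2.7.4 comes from the log; the rest are samples.
%w[1.9.3 1.10.0 1.10.3 2.7.4].each do |v|
  verdict = requirement.satisfied_by?(Gem::Version.new(v)) ? "satisfies" : "conflicts with"
  puts "faraday #{v} #{verdict} ~> 1.10"
end
```

It prints `conflicts with` for 1.9.3 and 2.7.4 and `satisfies` for the two 1.10.x samples: the faraday 2.7.4 present in the image falls outside the range the plugin accepts.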
pepov commented 1 year ago

which image was this exactly?

sebastiangaiser commented 1 year ago

Should be the same as https://github.com/kube-logging/logging-operator/issues/1251 and https://github.com/kube-logging/fluentd-images/issues/22

pepov commented 1 year ago

yeah, but that one is already resolved, so this should be too

genofire commented 1 year ago

still happening in v1.15.3-build.67, please reopen

pepov commented 1 year ago

with the latest Logging operator chart and the following YAML:

apiVersion: logging.banzaicloud.io/v1beta1
kind: Logging
metadata:
  name: logging
  namespace: logging
spec:
  fluentd:
    image:
      tag: v1.15.3-build.70
  fluentbit: {}
  controlNamespace: logging
---
apiVersion: logging.banzaicloud.io/v1beta1
kind: ClusterFlow
metadata:
  name: opensearch
  namespace: logging
spec:
  match:
    - select: {}
  globalOutputRefs:
    - opensearch
---
apiVersion: logging.banzaicloud.io/v1beta1
kind: ClusterOutput
metadata:
  name: opensearch
  namespace: logging
spec:
  opensearch:
    buffer:
      flush_interval: 10s
      flush_mode: interval
    hosts: hosts
    include_timestamp: true
    log_os_400_reason: true
    logstash_dateformat: '%Y%m%d'
    logstash_format: true
    logstash_prefix: some-prefix
    password:
      value: asd
    port: 443
    reload_connections: false
    request_timeout: 20s
    scheme: https
    suppress_type_name: true
    user: user

I see this:

kubectl exec -ti logging-fluentd-0 -- cat /fluentd/log/out
Defaulted container "fluentd" out of: fluentd, config-reloader
# Logfile created on 2023-05-18 13:08:13 +0000 by logger.rb/v1.4.2
2023-05-18 13:08:13 +0000 [info]: init supervisor logger path="/fluentd/log/out" rotate_age=10 rotate_size=10485760
2023-05-18 13:08:13 +0000 [info]: parsing config file is succeeded path="/fluentd/etc/fluent.conf"
2023-05-18 13:08:13 +0000 [info]: gem 'fluent-mixin-config-placeholders' version '0.4.0'
2023-05-18 13:08:13 +0000 [info]: gem 'fluent-plugin-aws-elasticsearch-service' version '2.4.1'
2023-05-18 13:08:13 +0000 [info]: gem 'fluent-plugin-azure-storage-append-blob' version '0.2.1'
2023-05-18 13:08:13 +0000 [info]: gem 'fluent-plugin-cloudwatch-logs' version '0.14.3'
2023-05-18 13:08:13 +0000 [info]: gem 'fluent-plugin-concat' version '2.5.0'
2023-05-18 13:08:13 +0000 [info]: gem 'fluent-plugin-datadog' version '0.14.2'
2023-05-18 13:08:13 +0000 [info]: gem 'fluent-plugin-dedot_filter' version '1.0.0'
2023-05-18 13:08:13 +0000 [info]: gem 'fluent-plugin-detect-exceptions' version '0.0.14'
2023-05-18 13:08:13 +0000 [info]: gem 'fluent-plugin-elasticsearch' version '5.3.0'
2023-05-18 13:08:13 +0000 [info]: gem 'fluent-plugin-enhance-k8s-metadata' version '2.0.0'
2023-05-18 13:08:13 +0000 [info]: gem 'fluent-plugin-gcs' version '0.4.0'
2023-05-18 13:08:13 +0000 [info]: gem 'fluent-plugin-gelf-hs' version '1.0.8'
2023-05-18 13:08:13 +0000 [info]: gem 'fluent-plugin-geoip' version '1.3.2'
2023-05-18 13:08:13 +0000 [info]: gem 'fluent-plugin-grafana-loki' version '1.2.20'
2023-05-18 13:08:13 +0000 [info]: gem 'fluent-plugin-grok-parser' version '2.6.2'
2023-05-18 13:08:13 +0000 [info]: gem 'fluent-plugin-kafka' version '0.19.0'
2023-05-18 13:08:13 +0000 [info]: gem 'fluent-plugin-kinesis' version '3.4.2'
2023-05-18 13:08:13 +0000 [info]: gem 'fluent-plugin-kube-events-timestamp' version '0.1.3'
2023-05-18 13:08:13 +0000 [info]: gem 'fluent-plugin-kubernetes-metadata-filter' version '2.5.3'
2023-05-18 13:08:13 +0000 [info]: gem 'fluent-plugin-kubernetes-sumologic' version '2.0.0'
2023-05-18 13:08:13 +0000 [info]: gem 'fluent-plugin-label-router' version '0.2.10'
2023-05-18 13:08:13 +0000 [info]: gem 'fluent-plugin-logdna' version '0.4.0'
2023-05-18 13:08:13 +0000 [info]: gem 'fluent-plugin-logzio' version '0.0.21'
2023-05-18 13:08:13 +0000 [info]: gem 'fluent-plugin-mattermost' version '0.2.2'
2023-05-18 13:08:13 +0000 [info]: gem 'fluent-plugin-multi-format-parser' version '1.0.0'
2023-05-18 13:08:13 +0000 [info]: gem 'fluent-plugin-mysqlslowquery' version '0.0.9'
2023-05-18 13:08:13 +0000 [info]: gem 'fluent-plugin-newrelic' version '1.2.2'
2023-05-18 13:08:13 +0000 [info]: gem 'fluent-plugin-opensearch' version '1.1.0'
2023-05-18 13:08:13 +0000 [info]: gem 'fluent-plugin-oss' version '0.0.2'
2023-05-18 13:08:13 +0000 [info]: gem 'fluent-plugin-parser-logfmt' version '0.0.0'
2023-05-18 13:08:13 +0000 [info]: gem 'fluent-plugin-prometheus' version '2.0.3'
2023-05-18 13:08:13 +0000 [info]: gem 'fluent-plugin-record-modifier' version '2.1.1'
2023-05-18 13:08:13 +0000 [info]: gem 'fluent-plugin-redis' version '0.3.5'
2023-05-18 13:08:13 +0000 [info]: gem 'fluent-plugin-remote-syslog' version '1.1'
2023-05-18 13:08:13 +0000 [info]: gem 'fluent-plugin-rewrite-tag-filter' version '2.4.0'
2023-05-18 13:08:13 +0000 [info]: gem 'fluent-plugin-s3' version '1.7.2'
2023-05-18 13:08:13 +0000 [info]: gem 'fluent-plugin-splunk-hec' version '1.3.2'
2023-05-18 13:08:13 +0000 [info]: gem 'fluent-plugin-sqs' version '3.0.0'
2023-05-18 13:08:13 +0000 [info]: gem 'fluent-plugin-sumologic_output' version '1.8.0'
2023-05-18 13:08:13 +0000 [info]: gem 'fluent-plugin-syslog_rfc5424' version '0.9.0.rc.8'
2023-05-18 13:08:13 +0000 [info]: gem 'fluent-plugin-tag-normaliser' version '0.1.2'
2023-05-18 13:08:13 +0000 [info]: gem 'fluent-plugin-throttle' version '0.0.5'
2023-05-18 13:08:13 +0000 [info]: gem 'fluent-plugin-vmware-log-intelligence' version '2.0.6'
2023-05-18 13:08:13 +0000 [info]: gem 'fluent-plugin-vmware-loginsight' version '1.4.1'
2023-05-18 13:08:13 +0000 [info]: gem 'fluent-plugin-webhdfs' version '1.5.0'
2023-05-18 13:08:13 +0000 [info]: gem 'fluentd' version '1.15.3'
2023-05-18 13:08:14 +0000 [info]: using configuration file: <ROOT>
  <system>
    rpc_endpoint "127.0.0.1:24444"
    log_level info
    workers 1
  </system>
  <source>
    @type forward
    @id main_forward
    bind "0.0.0.0"
    port 24240
  </source>
  <match **>
    @type label_router
    @id main
    metrics false
    <route>
      @label "@2e33d43bf04b0c96c1b4b8afce91de0b"
      metrics_labels {"id":"clusterflow:logging:opensearch"}
      <match>
        negate false
      </match>
    </route>
  </match>
  <label @2e33d43bf04b0c96c1b4b8afce91de0b>
    <match **>
      @type opensearch
      @id clusterflow:logging:opensearch:clusteroutput:logging:opensearch
      catch_transport_exception_on_retry true
      emit_error_label_event true
      exception_backup true
      fail_on_detecting_os_version_retry_exceed true
      fail_on_putting_template_retry_exceed true
      hosts "hosts"
      http_backend_excon_nonblock true
      include_timestamp true
      log_os_400_reason true
      logstash_dateformat "%Y%m%d"
      logstash_format true
      logstash_prefix "some-prefix"
      password xxxxxx
      port 443
      reload_connections false
      request_timeout 20s
      scheme https
      ssl_verify true
      suppress_type_name true
      use_legacy_template true
      user "user"
      utc_index true
      verify_os_version_at_startup true
      <buffer tag,time>
        @type "file"
        chunk_limit_size 8MB
        flush_interval 10s
        flush_mode interval
        path "/buffers/clusterflow:logging:opensearch:clusteroutput:logging:opensearch.*.buffer"
        retry_forever true
        timekey 10m
        timekey_wait 1m
      </buffer>
    </match>
  </label>
  <label @ERROR>
    <match **>
      @type null
      @id main-fluentd-error
    </match>
  </label>
  <match **>
    @type null
    @id main-no-output
  </match>
  <label @FLUENT_LOG>
    <match fluent.*>
      @type null
      @id main-fluentd-log
    </match>
  </label>
</ROOT>
2023-05-18 13:08:14 +0000 [info]: starting fluentd-1.15.3 pid=7 ruby="2.7.8"
2023-05-18 13:08:14 +0000 [info]: spawn command to main:  cmdline=["/usr/bin/ruby", "-Eascii-8bit:ascii-8bit", "/usr/bin/fluentd", "-o", "/fluentd/log/out", "--log-rotate-age", "10", "--log-rotate-size", "10485760", "-c", "/fluentd/etc/fluent.conf", "-p", "/fluentd/plugins", "--under-supervisor"]
2023-05-18 13:08:14 +0000 [info]: init supervisor logger path="/fluentd/log/out" rotate_age=10 rotate_size=10485760
2023-05-18 13:08:14 +0000 [info]: #0 init worker0 logger path="/fluentd/log/out" rotate_age=10 rotate_size=10485760
2023-05-18 13:08:14 +0000 [info]: adding match in @2e33d43bf04b0c96c1b4b8afce91de0b pattern="**" type="opensearch"
2023-05-18 13:08:18 +0000 [warn]: #0 [clusterflow:logging:opensearch:clusteroutput:logging:opensearch] Could not communicate to OpenSearch, resetting connection and trying again. no address for hosts (Resolv::ResolvError)
2023-05-18 13:08:18 +0000 [warn]: #0 [clusterflow:logging:opensearch:clusteroutput:logging:opensearch] Remaining retry: 14. Retry to communicate after 2 second(s).
2023-05-18 13:08:22 +0000 [warn]: #0 [clusterflow:logging:opensearch:clusteroutput:logging:opensearch] Could not communicate to OpenSearch, resetting connection and trying again. no address for hosts (Resolv::ResolvError)
2023-05-18 13:08:22 +0000 [warn]: #0 [clusterflow:logging:opensearch:clusteroutput:logging:opensearch] Remaining retry: 13. Retry to communicate after 4 second(s).
2023-05-18 13:08:30 +0000 [warn]: #0 [clusterflow:logging:opensearch:clusteroutput:logging:opensearch] Could not communicate to OpenSearch, resetting connection and trying again. no address for hosts (Resolv::ResolvError)
2023-05-18 13:08:30 +0000 [warn]: #0 [clusterflow:logging:opensearch:clusteroutput:logging:opensearch] Remaining retry: 12. Retry to communicate after 8 second(s).
2023-05-18 13:08:46 +0000 [warn]: #0 [clusterflow:logging:opensearch:clusteroutput:logging:opensearch] Could not communicate to OpenSearch, resetting connection and trying again. no address for hosts (Resolv::ResolvError)
2023-05-18 13:08:46 +0000 [warn]: #0 [clusterflow:logging:opensearch:clusteroutput:logging:opensearch] Remaining retry: 11. Retry to communicate after 16 second(s).
2023-05-18 13:09:18 +0000 [warn]: #0 [clusterflow:logging:opensearch:clusteroutput:logging:opensearch] Could not communicate to OpenSearch, resetting connection and trying again. no address for hosts (Resolv::ResolvError)
2023-05-18 13:09:18 +0000 [warn]: #0 [clusterflow:logging:opensearch:clusteroutput:logging:opensearch] Remaining retry: 10. Retry to communicate after 32 second(s).

@genofire can you give me a reproducible example?

pepov commented 1 year ago

I also tried with the elasticsearch plugin and got a similar result. However, the issue does seem to appear in the v1.15-staging image; I'm checking that.

pepov commented 1 year ago

upgrading fluent-plugin-elasticsearch from 5.2.0 to 5.3.0 seems to fix it for the staging image as well: https://github.com/kube-logging/fluentd-images/pull/40
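To check which elasticsearch plugin a given image actually ships, the `gem 'NAME' version 'VERSION'` lines fluentd prints at startup can be scanned. A rough sketch, assuming that log format (taken from the output earlier in this thread) and treating 5.3.0 as the minimum fixed version per the comment above:

```ruby
require "rubygems"

# Two lines copied from the fluentd startup log shown earlier in this thread.
STARTUP_LOG = <<~LOG
  2023-05-18 13:08:13 +0000 [info]: gem 'fluent-plugin-elasticsearch' version '5.3.0'
  2023-05-18 13:08:13 +0000 [info]: gem 'fluentd' version '1.15.3'
LOG

# Collect "gem 'NAME' version 'VERSION'" entries into a name => Gem::Version hash.
def gem_versions(log)
  log.scan(/gem '([^']+)' version '([^']+)'/).map { |name, version| [name, Gem::Version.new(version)] }.to_h
end

# Assumed minimum: the plugin version reported above to resolve the conflict.
FIXED_IN = Gem::Version.new("5.3.0")

versions = gem_versions(STARTUP_LOG)
es = versions["fluent-plugin-elasticsearch"]
puts "fluent-plugin-elasticsearch #{es}: #{es >= FIXED_IN ? 'includes' : 'predates'} the 5.3.0 upgrade"
```

Piping the real log (for example the output of the `kubectl exec ... cat /fluentd/log/out` command above) into the same scan would cover every plugin at once.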

pepov commented 1 year ago

looking at the original comment, the reported image cannot be the v1.15 image: that one still uses Ruby 2.7, while the original log shows ruby="3.1.3". Something might be off with the builds.

2023-03-30 15:09:51 +0000 [info]: starting fluentd-1.15.3 as dry run mode ruby="3.1.3"
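The Ruby version in the startup line is an easy way to tell the images apart. A minimal sketch, assuming (as stated above) that the v1.15 image runs Ruby 2.7, applied to the two startup lines quoted in this thread:

```ruby
# Startup lines quoted in this thread: the original report and the v1.15.3-build.70 test.
REPORTED = '2023-03-30 15:09:51 +0000 [info]: starting fluentd-1.15.3 as dry run mode ruby="3.1.3"'
TESTED   = '2023-05-18 13:08:14 +0000 [info]: starting fluentd-1.15.3 pid=7 ruby="2.7.8"'

# Pull the Ruby version out of a fluentd "starting fluentd-..." line (nil if absent).
def ruby_version(line)
  line[/ruby="([^"]+)"/, 1]
end

# Assumption: a 2.7.x runtime is consistent with the v1.15 image; anything else is not.
[REPORTED, TESTED].each do |line|
  version = ruby_version(line)
  label = version.start_with?("2.7.") ? "consistent with the v1.15 image" : "not the v1.15 image"
  puts "ruby #{version}: #{label}"
end
```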

pepov commented 1 year ago

it was indeed an error; it just wasn't apparent until we introduced v1.15-staging alongside v1.15, and both produced the same image tags 🤦

https://github.com/kube-logging/fluentd-images/pull/41
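A tag collision like this can be caught by checking that no tag is produced by more than one image variant. A sketch with hypothetical tag lists; the real tagging scheme and the change in PR #41 are not shown in this thread:

```ruby
# Hypothetical tags per image variant; the overlapping entry mimics the collision described above.
tags = {
  "v1.15"         => ["v1.15.3-build.70", "v1.15.3-build.71"],
  "v1.15-staging" => ["v1.15.3-build.71", "v1.15.3-build.72"]
}

# Return each tag claimed by more than one variant, mapped to the variants that claim it.
def colliding_tags(tags_by_variant)
  owners = Hash.new { |h, k| h[k] = [] }
  tags_by_variant.each { |variant, list| list.each { |t| owners[t] << variant } }
  owners.select { |_, variants| variants.size > 1 }
end

colliding_tags(tags).each do |tag, variants|
  puts "#{tag} is produced by #{variants.join(' and ')}"
end
```

Running a check like this before pushing would make the second variant's build fail instead of silently overwriting the first one's tag.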

pepov commented 1 year ago

builds are now fixed, and the elasticsearch plugin upgrade seems to resolve the issue reported here.

the last build is still running; please use build 86 or above.

let me know if you still have the issue with these images
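To check whether a pinned tag (such as the `v1.15.3-build.70` used in the Logging resource above) meets that requirement, the build number can be compared against 86. A minimal sketch, assuming tags keep the `vX.Y.Z-build.N` shape seen in this thread; `v1.15.3-build.86` is a constructed example, and build numbers from before the tagging fix may not be trustworthy:

```ruby
# Build number from tags shaped like the ones in this thread, e.g. "v1.15.3-build.70" (nil otherwise).
def build_number(tag)
  match = tag.match(/-build\.(\d+)\z/)
  match && match[1].to_i
end

# Per the comment above, builds 86 and above carry the fixed elasticsearch plugin.
MIN_FIXED_BUILD = 86

def fixed_build?(tag)
  n = build_number(tag)
  !n.nil? && n >= MIN_FIXED_BUILD
end

%w[v1.15.3-build.67 v1.15.3-build.70 v1.15.3-build.86].each do |tag|
  puts "#{tag}: #{fixed_build?(tag) ? 'ok' : 'needs upgrade'}"
end
```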