grafana / loki

Like Prometheus, but for logs.
https://grafana.com/loki
GNU Affero General Public License v3.0

fluent-bit collecting logs to Loki causes fatal error: morestack on g0 #13060

Open leo198706 opened 4 months ago

leo198706 commented 4 months ago

Describe the bug: I use the out-grafana-loki plugin to ship logs from fluent-bit to Loki. After running for a while, a fatal error appears and fluent-bit restarts. The error is intermittent: only about half of the pods hit it in my online grayscale (canary) test.

error.log
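Since only some pods are affected, it can help to check each pod's log for the error string from the issue title. A minimal Python sketch, assuming the logs have already been collected into a dictionary; the pod names, log lines, and the `crashed_pods` helper are hypothetical and not taken from the attached error.log.

```python
# Error string as it appears in the issue title.
FATAL_SIGNATURE = "fatal error: morestack on g0"


def crashed_pods(logs_by_pod):
    """Return the sorted names of pods whose log text contains the signature."""
    return sorted(
        pod for pod, text in logs_by_pod.items() if FATAL_SIGNATURE in text
    )


# Hypothetical sample data, invented for illustration only.
logs_by_pod = {
    "fluent-bit-aaaaa": "level=info msg=started\nfatal error: morestack on g0\n",
    "fluent-bit-bbbbb": "level=info msg=started\n",
}

print(crashed_pods(logs_by_pod))
```

Comparing the affected pods against the unaffected ones (node, log volume, uptime) may narrow down what triggers the crash.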

Here is my conf:

apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: logging
  labels:
    k8s-app: fluent-bit
data:
  fluent-bit.conf: |
    [SERVICE]
        Flush         1
        Log_Level     debug
        Daemon        off
        HTTP_Server   On
        HTTP_Listen   0.0.0.0
        HTTP_Port     2020

    @INCLUDE output-pod.conf
    @INCLUDE input-systemd.conf
    @INCLUDE filter-systemd.conf

  filter-systemd.conf: |
    [FILTER]
       Name modify
       Match systemd.*
       Copy  SYSLOG_IDENTIFIER systemd

  input-systemd.conf: |
    [INPUT]
      Name              systemd
      Tag               systemd.*
      Path              /var/log/journal
      DB                /var/log/flb_kube.db
      Systemd_Filter    _SYSTEMD_UNIT=containerd.service
      Systemd_Filter    _SYSTEMD_UNIT=kubelet.service
      Mem_Buf_Limit     50MB

  output-pod.conf: |
    [OUTPUT]
      Name grafana-loki
      Match_Regex   (systemd).*
      Url http://localhost:3100/api/prom/push
      TenantID ""
      BatchWait 2s
      BatchSize 1048576
      Labels {job="fluent-bit"}
      RemoveKeys kubernetes,stream,logtag,pod_name,pod_ip,java
      AutoKubernetesLabels false
      LabelMapPath /fluent-bit/etc/labelmap.json
      LineFormat json
      LogLevel warn
      Buffer true
      DqueSegmentSize 8096
      DqueDir /tmp/flb-storage/buffer
      DqueName loki.0

  labelmap.json: |
    {
      "kubernetes": {
        "container_name": "container",
        "labels": {
          "derbysoft.com/app.name": "app",
          "derbysoft.com/app.service": "service"
        },
        "pod_name": "pod_name"
      },
      "stream": "stream",
      "systemd": "systemd",
      "java":"java",
      "pod_ip":"pod_ip",
      "pod_name":"pod_name"
    }
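For readers unfamiliar with LabelMapPath: the label map mirrors the shape of the log record, and each string leaf names the Loki label that receives the value found at that path. The following is a minimal Python sketch of that idea, not the plugin's actual Go code; `map_labels` and the sample record are hypothetical, and only part of the label map above is reproduced.

```python
import json


def map_labels(record, mapping):
    """Walk `mapping` and `record` in parallel and collect label -> value."""
    labels = {}
    for key, target in mapping.items():
        if key not in record:
            continue  # Nothing in the record at this path.
        value = record[key]
        if isinstance(target, dict):
            # Nested mapping: descend only if the record is nested here too.
            if isinstance(value, dict):
                labels.update(map_labels(value, target))
        else:
            # Leaf: `target` is the label name to emit.
            labels[target] = str(value)
    return labels


# Subset of the labelmap.json shown in the issue.
label_map = json.loads("""
{
  "kubernetes": {
    "container_name": "container",
    "labels": {"derbysoft.com/app.name": "app"},
    "pod_name": "pod_name"
  },
  "stream": "stream",
  "systemd": "systemd"
}
""")

# Hypothetical record, invented for illustration only.
record = {
    "kubernetes": {
        "container_name": "web",
        "labels": {"derbysoft.com/app.name": "booking"},
        "pod_name": "web-0",
    },
    "stream": "stdout",
    "log": "hello",
}

print(map_labels(record, label_map))
```

Keys that are absent from the record (here `systemd`) simply produce no label, and record fields with no entry in the map (here `log`) are ignored by the mapping.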

Here is the Dockerfile I use to build the plugin:

FROM golang:1.22.2-bullseye as plugin-builder

ENV LOKI_VERSION=3.0.0
ENV GIT_HASH=b4f7181

ARG LOKI_TARBALL=https://github.com/grafana/loki/archive/v$LOKI_VERSION.tar.gz

ENV LOKI_SOURCE $LOKI_TARBALL

RUN curl -L -o "loki.tar.gz" ${LOKI_SOURCE} \
    && mkdir -p /src/loki \
    && tar zxfv loki.tar.gz -C  /src/loki --strip-components=1 \
    && cd /src/loki

WORKDIR /src/loki

RUN make fluent-bit-plugin

FROM fluent/fluent-bit:2.2.3

COPY --from=plugin-builder /src/loki/clients/cmd/fluent-bit/out_grafana_loki.so /fluent-bit/bin
COPY --from=plugin-builder /src/loki/clients/cmd/fluent-bit/fluent-bit.conf /fluent-bit/etc/fluent-bit.conf

EXPOSE 2020

CMD ["/fluent-bit/bin/fluent-bit", "-e","/fluent-bit/bin/out_grafana_loki.so", "-c", "/fluent-bit/etc/fluent-bit.conf"]

To Reproduce: Use an input plugin to collect logs and send them to Loki, then let it run for several hours.

Expected behavior: fluent-bit runs normally, with no errors in the log.

Environment:

leo198706 commented 1 week ago

Update: I rebuilt the plugin with the latest golang image, and it now runs fine in the test environment:

FROM golang:1.23.1-bullseye as plugin-builder

ENV LOKI_VERSION=3.2.0
ENV GIT_HASH=b4f7181

ARG LOKI_TARBALL=https://github.com/grafana/loki/archive/v$LOKI_VERSION.tar.gz

ENV LOKI_SOURCE $LOKI_TARBALL

RUN curl -L -o "loki.tar.gz" ${LOKI_SOURCE} \
    && mkdir -p /src/loki \
    && tar zxfv loki.tar.gz -C  /src/loki --strip-components=1 \
    && cd /src/loki

WORKDIR /src/loki

RUN make BUILD_IN_CONTAINER=false GIT_REVISION=$GIT_HASH GIT_BRANCH=v$LOKI_VERSION IMAGE_TAG=$LOKI_VERSION fluent-bit-plugin

FROM fluent/fluent-bit:2.2.3

COPY --from=plugin-builder /src/loki/clients/cmd/fluent-bit/out_grafana_loki.so /fluent-bit/bin
COPY --from=plugin-builder /src/loki/clients/cmd/fluent-bit/fluent-bit.conf /fluent-bit/etc/fluent-bit.conf

EXPOSE 2020

CMD ["/fluent-bit/bin/fluent-bit", "-e","/fluent-bit/bin/out_grafana_loki.so", "-c", "/fluent-bit/etc/fluent-bit.conf"]