grafana / agent

Vendor-neutral programmable observability pipelines.
https://grafana.com/docs/agent/
Apache License 2.0

Grafana Agent not exporting tail logs to loki for k8s namespaces other than `monitoring` #5963

Closed: rafi-gh closed this issue 6 months ago

rafi-gh commented 8 months ago

What's wrong?

I installed Grafana Agent and configured it to tail Kubernetes pod logs and write them to Loki for all pods. However, I am not seeing any logs from namespaces other than `monitoring`, and the grafana-agent daemonset pods emit the warnings shown in the Logs section below.

The recurring error is `http2: response body closed`. The full Grafana Agent Helm configuration is attached in the Configuration section below.
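
For reference, the pipeline in the configuration below forwards every discovered pod target straight into `loki.source.kubernetes`. A minimal sketch (assuming the standard Flow `discovery.relabel` component and the component names from the configuration below) of how namespace and pod labels could be attached to each stream, which would make it easy to confirm in Loki which namespaces are actually delivering logs:

discovery.relabel "pods" {
  targets = discovery.kubernetes.pods.targets

  // Copy well-known Kubernetes meta labels onto each target so the
  // resulting Loki streams are queryable per namespace and pod.
  rule {
    source_labels = ["__meta_kubernetes_namespace"]
    target_label  = "namespace"
  }

  rule {
    source_labels = ["__meta_kubernetes_pod_name"]
    target_label  = "pod"
  }
}

loki.source.kubernetes "pods" {
  // Tail the relabeled targets instead of the raw discovery output.
  targets    = discovery.relabel.pods.output
  forward_to = [loki.write.default.receiver]
}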

Steps to reproduce

Install Grafana Agent and Loki in the `monitoring` namespace with the configurations below (install commands are sketched after this paragraph), then observe the logs and state of the grafana-agent daemonset.
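
A rough sketch of the install steps (chart names come from the public grafana Helm repository; `agent-values.yaml` and `loki-values.yaml` are placeholder file names for the two configurations pasted below):

helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

# Install both charts into the monitoring namespace with the values below.
helm install grafana-agent grafana/grafana-agent -n monitoring --create-namespace -f agent-values.yaml
helm install loki grafana/loki -n monitoring -f loki-values.yaml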

System information

Linux 6.1.0-15-amd64

Software version

Grafana Agent v0.38.1, Loki 2.9.3

Configuration

# Grafana Agent Helm Values
# 
# 
# -- Overrides the chart's name. Used to change the infix in the resource names.
nameOverride: null

# -- Overrides the chart's computed fullname. Used to change the full prefix of
# resource names.
fullnameOverride: null

## Global properties for image pulling override the values defined under `image.registry` and `configReloader.image.registry`.
## If you want to override only one image registry, use the specific fields; to override them all, use `global.image.registry`.
global:
  image:
    # -- Global image registry to use if it needs to be overridden for some specific use cases (e.g. local registries, custom images, ...)
    registry: ""

    # -- Optional set of global image pull secrets.
    pullSecrets: []

  # -- Security context to apply to the Grafana Agent pod.
  podSecurityContext: {}

crds:
  # -- Whether to install CRDs for monitoring.
  create: false

# Various agent settings.
agent:
  # -- Mode to run Grafana Agent in. Can be "flow" or "static".
  mode: 'flow'
  configMap:
    # -- Create a new ConfigMap for the config file.
    create: true
    # -- Content to assign to the new ConfigMap.  This is passed into `tpl` allowing for templating from values.
    content: |
      logging {
        level  = "info"
        format = "logfmt"
      }

      discovery.kubernetes "pods" {
        role = "pod"
      }

      discovery.kubernetes "nodes" {
        role = "node"
      }

      discovery.kubernetes "services" {
        role = "service"
      }

      discovery.kubernetes "endpoints" {
        role = "endpoints"
      }

      discovery.kubernetes "endpointslices" {
        role = "endpointslice"
      }

      discovery.kubernetes "ingresses" {
        role = "ingress"
      }

      loki.source.kubernetes "pods" {
        targets    = discovery.kubernetes.pods.targets
        forward_to = [loki.write.default.receiver]
      }

      loki.write "default" {
        endpoint {
          url = "http://loki-gateway/loki/api/v1/push"
        }
      } 

    # -- Name of existing ConfigMap to use. Used when create is false.
    name: null
    # -- Key in ConfigMap to get config from.
    key: null

  clustering:
    # -- Deploy agents in a cluster to allow for load distribution. Only
    # applies when agent.mode=flow.
    enabled: false

  # -- Path to where Grafana Agent stores data (for example, the Write-Ahead Log).
  # By default, data is lost between reboots.
  storagePath: /tmp/agent

  # -- Address to listen for traffic on. 0.0.0.0 exposes the UI to other
  # containers.
  listenAddr: 0.0.0.0

  # -- Port to listen for traffic on.
  listenPort: 80

  # --  Base path where the UI is exposed.
  uiPathPrefix: /

  # -- Enables sending Grafana Labs anonymous usage stats to help improve Grafana
  # Agent.
  enableReporting: true

  # -- Extra environment variables to pass to the agent container.
  extraEnv: []

  # -- Maps all the keys on a ConfigMap or Secret as environment variables. https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.24/#envfromsource-v1-core
  envFrom: []

  # -- Extra args to pass to `agent run`: https://grafana.com/docs/agent/latest/flow/reference/cli/run/
  extraArgs: []

  # -- Extra ports to expose on the Agent
  extraPorts: []
  # - name: "faro"
  #   port: 12347
  #   targetPort: 12347
  #   protocol: "TCP"

  mounts:
    # -- Mount /var/log from the host into the container for log collection.
    varlog: false
    # -- Mount /var/lib/docker/containers from the host into the container for log
    # collection.
    dockercontainers: false

    # -- Extra volume mounts to add into the Grafana Agent container. Does not
    # affect the watch container.
    extra: []

  # -- Security context to apply to the Grafana Agent container.
  securityContext: {}

  # -- Resource requests and limits to apply to the Grafana Agent container.
  resources: {}

image:
  # -- Grafana Agent image registry (defaults to docker.io)
  registry: "docker.io"
  # -- Grafana Agent image repository.
  repository: grafana/agent
  # -- (string) Grafana Agent image tag. When empty, the Chart's appVersion is
  # used.
  tag: null
  # -- Grafana Agent image's SHA256 digest (either in format "sha256:XYZ" or "XYZ"). When set, will override `image.tag`.
  digest: null
  # -- Grafana Agent image pull policy.
  pullPolicy: IfNotPresent
  # -- Optional set of image pull secrets.
  pullSecrets: []

rbac:
  # -- Whether to create RBAC resources for the agent.
  create: true

serviceAccount:
  # -- Whether to create a service account for the Grafana Agent deployment.
  create: true
  # -- Annotations to add to the created service account.
  annotations: {}
  # -- The name of the existing service account to use when
  # serviceAccount.create is false.
  name: null

# Options for the extra controller used for config reloading.
configReloader:
  # -- Enables automatically reloading when the agent config changes.
  enabled: true
  image:
    # -- Config reloader image registry (defaults to docker.io)
    registry: "docker.io"
    # -- Repository to get config reloader image from.
    repository: jimmidyson/configmap-reload
    # -- Tag of image to use for config reloading.
    tag: v0.8.0
    # -- SHA256 digest of image to use for config reloading (either in format "sha256:XYZ" or "XYZ"). When set, will override `configReloader.image.tag`
    digest: ""
  # -- Override the args passed to the container.
  customArgs: []
  # -- Resource requests and limits to apply to the config reloader container.
  resources:
    requests:
      cpu: "1m"
      memory: "5Mi"
  # -- Security context to apply to the Grafana configReloader container.
  securityContext: {}

controller:
  # -- Type of controller to use for deploying Grafana Agent in the cluster.
  # Must be one of 'daemonset', 'deployment', or 'statefulset'.
  type: 'daemonset'

  # -- Number of pods to deploy. Ignored when controller.type is 'daemonset'.
  replicas: 1

  # -- Annotations to add to controller.
  extraAnnotations: {}

  # -- Whether to deploy pods in parallel. Only used when controller.type is
  # 'statefulset'.
  parallelRollout: true

  # -- Configures Pods to use the host network. When set to true, the ports that will be used must be specified.
  hostNetwork: false

  # -- Configures Pods to use the host PID namespace.
  hostPID: false

  # -- Configures the DNS policy for the pod. https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#pod-s-dns-policy
  dnsPolicy: ClusterFirst

  # -- Update strategy for updating deployed Pods.
  updateStrategy: {}

  # -- nodeSelector to apply to Grafana Agent pods.
  nodeSelector: {}

  # -- Tolerations to apply to Grafana Agent pods.
  tolerations: []

  # -- priorityClassName to apply to Grafana Agent pods.
  priorityClassName: ''

  # -- Extra pod annotations to add.
  podAnnotations: {}

  # -- Extra pod labels to add.
  podLabels: {}

  # -- Whether to enable automatic deletion of stale PVCs due to a scale down operation, when controller.type is 'statefulset'.
  enableStatefulSetAutoDeletePVC: false

  autoscaling:
    # -- Creates a HorizontalPodAutoscaler for controller type deployment.
    enabled: false
    # -- The lower limit for the number of replicas to which the autoscaler can scale down.
    minReplicas: 1
    # -- The upper limit for the number of replicas to which the autoscaler can scale up.
    maxReplicas: 5
    # -- Average CPU utilization across all relevant pods, a percentage of the requested value of the resource for the pods. Setting `targetCPUUtilizationPercentage` to 0 will disable CPU scaling.
    targetCPUUtilizationPercentage: 0
    # -- Average Memory utilization across all relevant pods, a percentage of the requested value of the resource for the pods. Setting `targetMemoryUtilizationPercentage` to 0 will disable Memory scaling.
    targetMemoryUtilizationPercentage: 80

  # -- Affinity configuration for pods.
  affinity: {}

  volumes:
    # -- Extra volumes to add to the Grafana Agent pod.
    extra: []

  # -- volumeClaimTemplates to add when controller.type is 'statefulset'.
  volumeClaimTemplates: []

  ## -- Additional init containers to run.
  ## ref: https://kubernetes.io/docs/concepts/workloads/pods/init-containers/
  ##
  initContainers: []

service:
  # -- Creates a Service for the controller's pods.
  enabled: true
  # -- Service type
  type: ClusterIP
  # -- Cluster IP, can be set to None, empty "" or an IP address
  clusterIP: ''
  annotations: {}
    # cloud.google.com/load-balancer-type: Internal

serviceMonitor:
  enabled: false
  # -- Additional labels for the service monitor.
  additionalLabels: {}
  # -- Scrape interval. If not set, the Prometheus default scrape interval is used.
  interval: ""
  # -- MetricRelabelConfigs to apply to samples after scraping, but before ingestion.
  # ref: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#relabelconfig
  metricRelabelings: []
  # - action: keep
  #   regex: 'kube_(daemonset|deployment|pod|namespace|node|statefulset).+'
  #   sourceLabels: [__name__]

  # -- RelabelConfigs to apply to samples before scraping
  # ref: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#relabelconfig
  relabelings: []
  # - sourceLabels: [__meta_kubernetes_pod_node_name]
  #   separator: ;
  #   regex: ^(.*)$
  #   targetLabel: nodename
  #   replacement: $1
  #   action: replace

ingress:
  # -- Enables ingress for the agent (faro port)
  enabled: false
  # For Kubernetes >= 1.18 you should specify the ingress-controller via the field ingressClassName
  # See https://kubernetes.io/blog/2020/04/02/improvements-to-the-ingress-api-in-kubernetes-1.18/#specifying-the-class-of-an-ingress
  # ingressClassName: nginx
  # Values can be templated
  annotations:
    {}
    # kubernetes.io/ingress.class: nginx
    # kubernetes.io/tls-acme: "true"
  labels: {}
  path: /
  faroPort: 12347

  # pathType is only for k8s >= 1.18
  pathType: Prefix

  hosts:
    - chart-example.local
  ## Extra paths to prepend to every host configuration. This is useful when working with annotation based services.
  extraPaths: []
  # - path: /*
  #   backend:
  #     serviceName: ssl-redirect
  #     servicePort: use-annotation
  ## Or for k8s > 1.19
  # - path: /*
  #   pathType: Prefix
  #   backend:
  #     service:
  #       name: ssl-redirect
  #       port:
  #         name: use-annotation

  tls: []
  #  - secretName: chart-example-tls
  #    hosts:
  #      - chart-example.local
---
# Loki Helm Values
# 
# 
global:
  image:
    # -- Overrides the Docker registry globally for all images
    registry: null
  # -- Overrides the priorityClassName for all pods
  priorityClassName: null
  # -- configures cluster domain ("cluster.local" by default)
  clusterDomain: "cluster.local"
  # -- configures DNS service name
  dnsService: "kube-dns"
  # -- configures DNS service namespace
  dnsNamespace: "kube-system"
# -- Overrides the chart's name
nameOverride: null
# -- Overrides the chart's computed fullname
fullnameOverride: null
# -- Overrides the chart's cluster label
clusterLabelOverride: null
# -- Image pull secrets for Docker images
imagePullSecrets: []
kubectlImage:
  # -- The Docker registry
  registry: docker.io
  # -- Docker image repository
  repository: bitnami/kubectl
  # -- Overrides the image tag whose default is the chart's appVersion
  tag: null
  # -- Overrides the image tag with an image digest
  digest: null
  # -- Docker image pull policy
  pullPolicy: IfNotPresent
loki:
  # Configures the readiness probe for all of the Loki pods
  readinessProbe:
    httpGet:
      path: /ready
      port: http-metrics
    initialDelaySeconds: 30
    timeoutSeconds: 1
  image:
    # -- The Docker registry
    registry: docker.io
    # -- Docker image repository
    repository: grafana/loki
    # -- Overrides the image tag whose default is the chart's appVersion
    # TODO: needed for 3rd target backend functionality
    # revert to null or latest once this behavior is released
    tag: null
    # -- Overrides the image tag with an image digest
    digest: null
    # -- Docker image pull policy
    pullPolicy: IfNotPresent
  # -- Common annotations for all deployments/StatefulSets
  annotations: {}
  # -- Common annotations for all pods
  podAnnotations: {}
  # -- Common labels for all pods
  podLabels: {}
  # -- Common annotations for all services
  serviceAnnotations: {}
  # -- Common labels for all services
  serviceLabels: {}
  # -- The number of old ReplicaSets to retain to allow rollback
  revisionHistoryLimit: 10
  # -- The SecurityContext for Loki pods
  podSecurityContext:
    fsGroup: 10001
    runAsGroup: 10001
    runAsNonRoot: true
    runAsUser: 10001
  # -- The SecurityContext for Loki containers
  containerSecurityContext:
    readOnlyRootFilesystem: true
    capabilities:
      drop:
        - ALL
    allowPrivilegeEscalation: false
  # -- Should enableServiceLinks be enabled. Default to enable
  enableServiceLinks: true
  # -- Specify an existing secret containing loki configuration. If non-empty, overrides `loki.config`
  existingSecretForConfig: ""
  # -- Defines what kind of object stores the configuration, a ConfigMap or a Secret.
  # In order to move sensitive information (such as credentials) from the ConfigMap/Secret to a more secure location (e.g. vault), it is possible to use [environment variables in the configuration](https://grafana.com/docs/loki/latest/configuration/#use-environment-variables-in-the-configuration).
  # Such environment variables can be then stored in a separate Secret and injected via the global.extraEnvFrom value. For details about environment injection from a Secret please see [Secrets](https://kubernetes.io/docs/concepts/configuration/secret/#use-case-as-container-environment-variables).
  configStorageType: ConfigMap
  # -- Name of the Secret or ConfigMap that contains the configuration (used for naming even if config is internal).
  externalConfigSecretName: '{{ include "loki.name" . }}'
  # -- Config file contents for Loki
  # @default -- See values.yaml
  config: |
    {{- if .Values.enterprise.enabled}}
    {{- tpl .Values.enterprise.config . }}
    {{- else }}
    auth_enabled: {{ .Values.loki.auth_enabled }}
    {{- end }}

    {{- with .Values.loki.server }}
    server:
      {{- toYaml . | nindent 2}}
    {{- end}}

    memberlist:
    {{- if .Values.loki.memberlistConfig }}
      {{- toYaml .Values.loki.memberlistConfig | nindent 2 }}
    {{- else }}
    {{- if .Values.loki.extraMemberlistConfig}}
    {{- toYaml .Values.loki.extraMemberlistConfig | nindent 2}}
    {{- end }}
      join_members:
        - {{ include "loki.memberlist" . }}
        {{- with .Values.migrate.fromDistributed }}
        {{- if .enabled }}
        - {{ .memberlistService }}
        {{- end }}
        {{- end }}
    {{- end }}

    {{- with .Values.loki.ingester }}
    ingester:
      {{- tpl (. | toYaml) $ | nindent 4 }}
    {{- end }}

    {{- if .Values.loki.commonConfig}}
    common:
    {{- toYaml .Values.loki.commonConfig | nindent 2}}
      storage:
      {{- include "loki.commonStorageConfig" . | nindent 4}}
    {{- end}}

    {{- with .Values.loki.limits_config }}
    limits_config:
      {{- tpl (. | toYaml) $ | nindent 4 }}
    {{- end }}

    runtime_config:
      file: /etc/loki/runtime-config/runtime-config.yaml

    {{- with .Values.loki.memcached.chunk_cache }}
    {{- if and .enabled (or .host .addresses) }}
    chunk_store_config:
      chunk_cache_config:
        memcached:
          batch_size: {{ .batch_size }}
          parallelism: {{ .parallelism }}
        memcached_client:
          {{- if .host }}
          host: {{ .host }}
          {{- end }}
          {{- if .addresses }}
          addresses: {{ .addresses }}
          {{- end }}
          service: {{ .service }}
    {{- end }}
    {{- end }}

    {{- if .Values.loki.schemaConfig }}
    schema_config:
    {{- toYaml .Values.loki.schemaConfig | nindent 2}}
    {{- else }}
    schema_config:
      configs:
        - from: 2022-01-11
          store: boltdb-shipper
          object_store: {{ .Values.loki.storage.type }}
          schema: v12
          index:
            prefix: loki_index_
            period: 24h
    {{- end }}

    {{ include "loki.rulerConfig" . }}

    {{- if or .Values.tableManager.retention_deletes_enabled .Values.tableManager.retention_period }}
    table_manager:
      retention_deletes_enabled: {{ .Values.tableManager.retention_deletes_enabled }}
      retention_period: {{ .Values.tableManager.retention_period }}
    {{- end }}

    {{- with .Values.loki.memcached.results_cache }}
    query_range:
      align_queries_with_step: true
      {{- if and .enabled (or .host .addresses) }}
      cache_results: {{ .enabled }}
      results_cache:
        cache:
          default_validity: {{ .default_validity }}
          memcached_client:
            {{- if .host }}
            host: {{ .host }}
            {{- end }}
            {{- if .addresses }}
            addresses: {{ .addresses }}
            {{- end }}
            service: {{ .service }}
            timeout: {{ .timeout }}
      {{- end }}
    {{- end }}

    {{- with .Values.loki.storage_config }}
    storage_config:
      {{- tpl (. | toYaml) $ | nindent 4 }}
    {{- end }}

    {{- with .Values.loki.query_scheduler }}
    query_scheduler:
      {{- tpl (. | toYaml) $ | nindent 4 }}
    {{- end }}

    {{- with .Values.loki.compactor }}
    compactor:
      {{- tpl (. | toYaml) $ | nindent 4 }}
    {{- end }}

    {{- with .Values.loki.analytics }}
    analytics:
      {{- tpl (. | toYaml) $ | nindent 4 }}
    {{- end }}

    {{- with .Values.loki.querier }}
    querier:
      {{- tpl (. | toYaml) $ | nindent 4 }}
    {{- end }}

    {{- with .Values.loki.index_gateway }}
    index_gateway:
      {{- tpl (. | toYaml) $ | nindent 4 }}
    {{- end }}

    {{- with .Values.loki.frontend }}
    frontend:
      {{- tpl (. | toYaml) $ | nindent 4 }}
    {{- end }}

    {{- with .Values.loki.frontend_worker }}
    frontend_worker:
      {{- tpl (. | toYaml) $ | nindent 4 }}
    {{- end }}

    {{- with .Values.loki.distributor }}
    distributor:
      {{- tpl (. | toYaml) $ | nindent 4 }}
    {{- end }}

    tracing:
      enabled: {{ .Values.loki.tracing.enabled }}
  # Should authentication be enabled
  auth_enabled: false
  # -- memberlist configuration (overrides embedded default)
  memberlistConfig: {}
  # -- Extra memberlist configuration
  extraMemberlistConfig: {}
  # -- Tenants list to be created on nginx htpasswd file, with name and password keys
  tenants: []
  # -- Check https://grafana.com/docs/loki/latest/configuration/#server for more info on the server configuration.
  server:
    http_listen_port: 3100
    grpc_listen_port: 9095
  # -- Limits config
  limits_config:
    reject_old_samples: true
    reject_old_samples_max_age: 168h
    max_cache_freshness_per_query: 10m
    split_queries_by_interval: 15m
  # -- Provides a reloadable runtime configuration file for some specific configuration
  runtimeConfig: {}
  # -- Check https://grafana.com/docs/loki/latest/configuration/#common_config for more info on how to provide a common configuration
  commonConfig:
    path_prefix: /var/loki
    replication_factor: 1
    compactor_address: '{{ include "loki.compactorAddress" . }}'
  # -- Configure memcached as an external cache for chunk and results cache. Disabled by default;
  # you must enable and specify a host for each cache you would like to use.
  memcached:
    chunk_cache:
      enabled: false
      host: ""
      service: "memcached-client"
      batch_size: 256
      parallelism: 10
    results_cache:
      enabled: false
      host: ""
      service: "memcached-client"
      timeout: "500ms"
      default_validity: "12h"
  # -- Check https://grafana.com/docs/loki/latest/configuration/#schema_config for more info on how to configure schemas
  schemaConfig: {}
  # -- Check https://grafana.com/docs/loki/latest/configuration/#ruler for more info on configuring ruler
  rulerConfig: {}
  # -- Structured loki configuration, takes precedence over `loki.config`, `loki.schemaConfig`, `loki.storageConfig`
  structuredConfig: {}
  # -- Additional query scheduler config
  query_scheduler: {}
  # -- Additional storage config
  storage_config:
    hedging:
      at: "250ms"
      max_per_second: 20
      up_to: 3
  # --  Optional compactor configuration
  compactor: {}
  # --  Optional analytics configuration
  analytics: {}
  # --  Optional querier configuration
  querier: {}
  # --  Optional ingester configuration
  ingester: {}
  # --  Optional index gateway configuration
  index_gateway:
    mode: ring
  frontend:
    scheduler_address: '{{ include "loki.querySchedulerAddress" . }}'
  frontend_worker:
    scheduler_address: '{{ include "loki.querySchedulerAddress" . }}'
  # -- Optional distributor configuration
  distributor: {}
  # -- Enable tracing
  tracing:
    enabled: false
# Monitoring section determines which monitoring features to enable
monitoring:
  # Dashboards for monitoring Loki
  dashboards:
    # -- If enabled, create configmap with dashboards for monitoring Loki
    enabled: false
    # -- Alternative namespace to create dashboards ConfigMap in
    namespace: null
    # -- Additional annotations for the dashboards ConfigMap
    annotations: {}
    # -- Labels for the dashboards ConfigMap
    labels:
      grafana_dashboard: "1"
  # Recording rules for monitoring Loki, required for some dashboards
  rules:
    # -- If enabled, create PrometheusRule resource with Loki recording rules
    enabled: true
    # -- Include alerting rules
    alerting: true
    # -- Alternative namespace to create PrometheusRule resources in
    namespace: null
    # -- Additional annotations for the rules PrometheusRule resource
    annotations: {}
    # -- Additional labels for the rules PrometheusRule resource
    labels: {}
    # -- Additional labels for PrometheusRule alerts
    additionalRuleLabels: {}
    # -- Additional groups to add to the rules file
    additionalGroups: []
    # - name: additional-loki-rules
    #   rules:
    #     - record: job:loki_request_duration_seconds_bucket:sum_rate
    #       expr: sum(rate(loki_request_duration_seconds_bucket[1m])) by (le, job)
    #     - record: job_route:loki_request_duration_seconds_bucket:sum_rate
    #       expr: sum(rate(loki_request_duration_seconds_bucket[1m])) by (le, job, route)
    #     - record: node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate
    #       expr: sum(rate(container_cpu_usage_seconds_total[1m])) by (node, namespace, pod, container)
  # ServiceMonitor configuration
  serviceMonitor:
    # -- If enabled, ServiceMonitor resources for Prometheus Operator are created
    enabled: true
    # -- Namespace selector for ServiceMonitor resources
    namespaceSelector: {}
    # -- ServiceMonitor annotations
    annotations: {}
    # -- Additional ServiceMonitor labels
    labels: {}
    # -- ServiceMonitor scrape interval
    # Default is 15s because included recording rules use a 1m rate, and scrape interval needs to be at
    # least 1/4 rate interval.
    interval: 15s
    # -- ServiceMonitor scrape timeout in Go duration format (e.g. 15s)
    scrapeTimeout: null
    # -- ServiceMonitor relabel configs to apply to samples before scraping
    # https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/api.md#relabelconfig
    relabelings: []
    # -- ServiceMonitor metric relabel configs to apply to samples before ingestion
    # https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#endpoint
    metricRelabelings: []
    # -- ServiceMonitor will use http by default, but you can pick https as well
    scheme: http
    # -- ServiceMonitor will use these tlsConfig settings to make the health check requests
    tlsConfig: null
    # -- If defined, will create a MetricsInstance for the Grafana Agent Operator.
    metricsInstance:
      # -- If enabled, MetricsInstance resources for Grafana Agent Operator are created
      enabled: true
      # -- MetricsInstance annotations
      annotations: {}
      # -- Additional MetricsInstance labels
      labels: {}
      # -- If defined a MetricsInstance will be created to remote write metrics.
      remoteWrite: null
  # Self monitoring determines whether Loki should scrape its own logs.
  # This feature currently relies on the Grafana Agent Operator being installed,
  # which is installed by default using the grafana-agent-operator sub-chart.
  # It will create custom resources for GrafanaAgent, LogsInstance, and PodLogs to configure
  # scrape configs to scrape its own logs with the labels expected by the included dashboards.
  selfMonitoring:
    enabled: true
    # -- Tenant to use for self monitoring
    tenant:
      # -- Name of the tenant
      name: "self-monitoring"
      # -- Namespace to create additional tenant token secret in. Useful if your Grafana instance
      # is in a separate namespace. Token will still be created in the canary namespace.
      secretNamespace: "{{ .Release.Namespace }}"
    # PodLogs configuration
    podLogs:
      # -- PodLogs version
      apiVersion: monitoring.grafana.com/v1alpha1
      # -- PodLogs annotations
      annotations: {}
      # -- Additional PodLogs labels
      labels: {}
      # -- PodLogs relabel configs to apply to samples before scraping
      # https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/api.md#relabelconfig
      relabelings: []
    # Grafana Agent configuration
    grafanaAgent:
      # -- Controls whether to install the Grafana Agent Operator and its CRDs.
      # Note that helm will not install CRDs if this flag is enabled during an upgrade.
      # In that case install the CRDs manually from https://github.com/grafana/agent/tree/main/production/operator/crds
      installOperator: false
      # -- Grafana Agent annotations
      annotations: {}
      # -- Additional Grafana Agent labels
      labels: {}
      # -- Enable the config read api on port 8080 of the agent
      enableConfigReadAPI: false
      # -- The name of the PriorityClass for GrafanaAgent pods
      priorityClassName: null
      # -- Tolerations for GrafanaAgent pods
      tolerations: []
    # LogsInstance configuration
    logsInstance:
      # -- LogsInstance annotations
      annotations: {}
      # -- Additional LogsInstance labels
      labels: {}
      # -- Additional clients for remote write
      clients: null
  # The Loki canary pushes logs to and queries from this loki installation to test
  # that it's working correctly
  lokiCanary:
    enabled: true
    # -- The name of the label to look for at loki when doing the checks.
    labelname: pod
    # -- Additional annotations for the `loki-canary` Daemonset
    annotations: {}
    # -- Additional labels for each `loki-canary` pod
    podLabels: {}
    service:
      # -- Annotations for loki-canary Service
      annotations: {}
      # -- Additional labels for loki-canary Service
      labels: {}
    # -- Additional CLI arguments for the `loki-canary` command
    extraArgs: []
    # -- Environment variables to add to the canary pods
    extraEnv: []
    # -- Environment variables from secrets or configmaps to add to the canary pods
    extraEnvFrom: []
    # -- Resource requests and limits for the canary
    resources: {}
    # -- DNS config for canary pods
    dnsConfig: {}
    # -- Node selector for canary pods
    nodeSelector: {}
    # -- Tolerations for canary pods
    tolerations: []
    # -- The name of the PriorityClass for loki-canary pods
    priorityClassName: null
    # -- Image to use for loki canary
    image:
      # -- The Docker registry
      registry: docker.io
      # -- Docker image repository
      repository: grafana/loki-canary
      # -- Overrides the image tag whose default is the chart's appVersion
      tag: null
      # -- Overrides the image tag with an image digest
      digest: null
      # -- Docker image pull policy
      pullPolicy: IfNotPresent
    # -- Update strategy for the `loki-canary` Daemonset pods
    updateStrategy:
      type: RollingUpdate
      rollingUpdate:
        maxUnavailable: 1
# Configuration for the write pod(s)
write:
  # -- Number of replicas for the write
  replicas: 1
# Configuration for the read pod(s)
read:
  # -- Number of replicas for the read
  replicas: 1
# Configuration for the backend pod(s)
backend:
  # -- Number of replicas for the backend
  replicas: 1
# -------------------------------------
# Configuration for `minio` child chart
# -------------------------------------
minio:
  enabled: true
  replicas: 1
  # Minio requires 2 to 16 drives for erasure code (drivesPerNode * replicas)
  # https://docs.min.io/docs/minio-erasure-code-quickstart-guide
  # Since we only have 1 replica, that means 2 drives must be used.
  drivesPerNode: 2
  rootUser: enterprise-logs
  rootPassword: supersecret
  buckets:
    - name: chunks
      policy: none
      purge: false
    - name: ruler
      policy: none
      purge: false
    - name: admin
      policy: none
      purge: false
  persistence:
    size: 5Gi
  resources:
    requests:
      cpu: 100m
      memory: 128Mi
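
One way to observe the daemonset's state beyond its logs: the Agent serves a built-in UI on the listen port configured above (`listenPort: 80`). A hedged sketch, with a placeholder pod name and a label selector that may differ between chart versions:

# List the agent pods, then forward the UI port of one of them.
kubectl -n monitoring get pods -l app.kubernetes.io/name=grafana-agent
kubectl -n monitoring port-forward pod/<agent-pod> 8080:80
# Browse http://localhost:8080 to inspect the health and discovered
# targets of the loki.source.kubernetes component.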

Logs

grafana-agent ts=2023-12-12T14:13:23.305252716Z level=info msg="have not seen a log line in 3x average time between lines, closing and re-opening tailer" target=monitoring/grafana-agent-4v2xf:grafana-agent component=loki.source.kubernetes.pods rolling_average=9.288625795s time_since_last=28.632474812s
grafana-agent ts=2023-12-12T14:13:23.305376111Z level=warn msg="tailer stopped; will retry" target=monitoring/grafana-agent-4v2xf:grafana-agent component=loki.source.kubernetes.pods err="http2: response body closed"
grafana-agent ts=2023-12-12T14:13:23.331804691Z level=info msg="opened log stream" target=monitoring/grafana-agent-4v2xf:grafana-agent component=loki.source.kubernetes.pods "start time"=2023-12-12T14:13:23.305Z
grafana-agent ts=2023-12-12T14:13:25.516560405Z level=info msg="have not seen a log line in 3x average time between lines, closing and re-opening tailer" target=networking/hubble-exporter:hubble-exporter component=loki.source.kubernetes.pods rolling_average=2s time_since_last=6.634156439s
grafana-agent ts=2023-12-12T14:13:25.516769364Z level=warn msg="tailer stopped; will retry" target=networking/hubble-exporter:hubble-exporter component=loki.source.kubernetes.pods err="http2: response body closed"
grafana-agent ts=2023-12-12T14:13:25.540247028Z level=info msg="opened log stream" target=networking/hubble-exporter:hubble-exporter component=loki.source.kubernetes.pods "start time"=2023-12-12T14:13:25.516Z
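
To check from the Loki side which namespaces are actually receiving streams, a sketch assuming logcli is installed, the chart's `loki-gateway` Service, and that a `namespace` label is present on the streams (for example via the relabel sketch in the first section):

kubectl -n monitoring port-forward svc/loki-gateway 3100:80
# In another shell; auth_enabled is false above, so no tenant header is needed.
LOKI_ADDR=http://localhost:3100 logcli query --limit=10 '{namespace="kube-system"}'
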
rafi-gh commented 8 months ago

I'm not certain why, but after restarting the cluster the issue resolved itself.
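
If it recurs, restarting only the agent pods might be a lighter-weight alternative to a full cluster restart (untested; assumes the chart's default daemonset name):

kubectl -n monitoring rollout restart daemonset/grafana-agent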

github-actions[bot] commented 6 months ago

This issue has not had any activity in the past 30 days, so the needs-attention label has been added to it. If the opened issue is a bug, check to see if a newer release fixed your issue. If it is no longer relevant, please feel free to close this issue. The needs-attention label signals to maintainers that something has fallen through the cracks. No action is needed by you; your issue will be kept open and you do not have to respond to this comment. The label will be removed the next time this job runs if there is new activity. Thank you for your contributions!

tpaschalis commented 6 months ago

Closing this issue as completed, per the original author's comment.