grafana / agent

Vendor-neutral programmable observability pipelines.
https://grafana.com/docs/agent/
Apache License 2.0

Fatal error: unexpected fault address 0x1fb2e7c: segmentation violation code=0x2 addr=0x1fb2e7c pc=0x4f6b0c #381

Closed · wallrj closed this issue 3 years ago

wallrj commented 3 years ago

Grafana Agent crashed with this traceback: panic.txt

Config:

server:
  http_listen_address: 127.0.0.1
  http_listen_port: 8081
  grpc_listen_address: 127.0.0.1
  grpc_listen_port: 8082
  log_level: info

prometheus:
  wal_directory: ${STATE_DIRECTORY}/grafana-agent-wal
  global:
    scrape_interval: 15s
  configs: null

integrations:
  node_exporter:
    enabled: true
    disable_collectors:
      - timex # See https://github.com/prometheus/node_exporter/issues/1934

  prometheus_remote_write:
  - url: https://prometheus-us-central1.grafana.net/api/prom/push
    basic_auth:

loki:
  positions:
    filename: ${STATE_DIRECTORY}/positions.yaml

  clients:
  - url: https://logs-prod-us-central1.grafana.net/api/prom/push
    basic_auth:

  scrape_configs:
  - job_name: journal
    journal:
      max_age: 12h
      labels:
        job: systemd-journal
    relabel_configs:
      - source_labels: ['__journal__systemd_unit']
        target_label: 'unit'
      - source_labels: ['__journal__hostname']
        target_label: 'hostname'
# grafana-agent --version
agent, version v0.11.0 (branch: HEAD, revision: 48c0762)
  build user:       runner@fv-az184-545
  build date:       2021-01-20T16:32:37Z
  go version:       go1.14.4
  platform:         linux/arm

Raspbian GNU/Linux 10 (buster) Raspberry Pi 3 Model B Plus Rev 1.3

rfratto commented 3 years ago

Hey, thanks for reporting this! I take it this is the armv7 build and not arm64?

wallrj commented 3 years ago

Yes, armv7. No problem.

# arch 
armv7l
rfratto commented 3 years ago

Thanks! Two more follow up questions:

  1. When you run the Agent, are you running it with -config.expand-env to evaluate environment variables?
  2. What is the value of $STATE_DIRECTORY? Is it pointing to tmpfs? Network mount? SD card?
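For anyone unfamiliar with the flag: `-config.expand-env` substitutes `${VAR}` references in the config file with environment values before the YAML is parsed. A minimal sketch of that behaviour using Go's `os.Expand` (the function and variable names below are illustrative, not the agent's actual internals):

```go
package main

import (
	"fmt"
	"os"
)

// expandWith substitutes ${VAR} references in raw config text using the
// supplied lookup function. os.Expand is the standard-library mechanism
// for this kind of substitution; the agent's real code path may differ.
func expandWith(raw string, lookup func(string) string) string {
	return os.Expand(raw, lookup)
}

func main() {
	// Illustrative environment, mirroring the systemd-provided variable.
	env := map[string]string{"STATE_DIRECTORY": "/var/lib/grafana-agent"}
	lookup := func(k string) string { return env[k] }

	fmt.Println(expandWith("wal_directory: ${STATE_DIRECTORY}/grafana-agent-wal", lookup))
	// Prints: wal_directory: /var/lib/grafana-agent/grafana-agent-wal
}
```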
wallrj commented 3 years ago

Yes, I'm using `-config.expand-env`, and `$STATE_DIRECTORY` is on tmpfs. This is running on a Raspberry Pi with overlay-root, which overlays a tmpfs on top of a squashfs rootfs.

systemd creates that `STATE_DIRECTORY` dynamically, in the tmpfs rootfs.

# xargs -0 -n1 < /proc/$(pidof grafana-agent)/environ
LANG=en_GB.UTF-8
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
LOGNAME=grafana-agent
USER=grafana-agent
INVOCATION_ID=e81f51bdbdc74f8fa4ca4e1483ec1196
JOURNAL_STREAM=9:13089
RUNTIME_DIRECTORY=/run/grafana-agent
STATE_DIRECTORY=/var/lib/grafana-agent
CREDENTIALS_DIRECTORY=/run/credentials/grafana-agent.service
# cat /etc/systemd/system/grafana-agent.service 
[Unit]
Description=Grafana Cloud Agent

[Service]
DynamicUser=true
SupplementaryGroups=systemd-journal
LoadCredential=agent_config:/etc/%N/agent-config.yaml
ExecStart=/usr/local/bin/grafana-agent --config.expand-env --config.file=${CREDENTIALS_DIRECTORY}/agent_config
Restart=always
RuntimeDirectory=%N
RuntimeDirectoryPreserve=restart
StateDirectory=%N
CapabilityBoundingSet=
LockPersonality=true
MemoryDenyWriteExecute=true
NoNewPrivileges=yes
PrivateTmp=yes
PrivateUsers=true
ProtectControlGroups=true
ProtectHome=yes
ProtectKernelModules=true
ProtectKernelTunables=true
ProtectSystem=strict
RestrictAddressFamilies=AF_UNIX AF_INET AF_INET6
RestrictNamespaces=true
RestrictRealtime=true
SystemCallErrorNumber=EPERM
SystemCallFilter=@system-service
SystemCallFilter=~@privileged
SystemCallFilter=~@resources
UMask=0077

[Install]
WantedBy=multi-user.target
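If the crash recurs, a systemd drop-in along these lines (the path is hypothetical) would make the Go runtime abort into a core dump instead of only printing the traceback, giving more to debug with. `GOTRACEBACK=crash` is the documented Go runtime setting for this; note that `MemoryDenyWriteExecute=true` and the restrictive syscall filters in the unit above may need loosening for the dump to be written:

```ini
# /etc/systemd/system/grafana-agent.service.d/debug.conf (hypothetical drop-in)
[Service]
# Make the Go runtime raise SIGABRT after printing the traceback,
# so the kernel can produce a core dump.
Environment=GOTRACEBACK=crash
# Allow core dumps of unlimited size.
LimitCORE=infinity
```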
rfratto commented 3 years ago

I've looked into this a bit and I don't have anything to share yet. It seems like it crashed when trying to write to stderr, which is extremely strange.

Have you seen this more than once?

wallrj commented 3 years ago

Thanks for looking into it.

It has only happened once since I began sending logs to Grafana Cloud on 2021-01-25, and only on one of three identical Raspberry Pi devices with very similar workloads.

Maybe it was a hardware glitch.

I'll let you know if it happens again.

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had any activity in the past 30 days. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.