parca-dev / parca-agent

eBPF based always-on profiler auto-discovering targets in Kubernetes and systemd, zero code changes or restarts needed!
https://parca.dev/
Apache License 2.0
535 stars, 66 forks

Failure to create BPF map / load program (resulting in segfault) #1573

Open Arau opened 1 year ago

Arau commented 1 year ago

It has the same symptom as https://github.com/parca-dev/parca-agent/issues/1543, but I think it is a different issue.

Describe the bug: When running the agent, it crashes, apparently because the vdso files are missing.

level=error name=parca-agent ts=2023-04-19T10:17:37.860523265Z caller=main.go:473 msg="failed to initialize vdso cache" err="failed to open elf file:/usr/lib/modules/6.1.10-x86_64-linode159/vdso/vdso.so, err:open /usr/lib/modules/6.1.10-x86_64-linode159/vdso/vdso.so: no such file or directory; failed to open elf file:/usr/lib/modules/6.1.10-x86_64-linode159/vdso/vdso64.so, err:open /usr/lib/modules/6.1.10-x86_64-linode159/vdso/vdso64.so: no such file or directory"
level=info name=parca-agent ts=2023-04-19T10:17:37.868314663Z caller=cpu.go:264 msg="Attempting to create unwind shards" count=50
level=error name=parca-agent ts=2023-04-19T10:17:37.942721677Z caller=cpu.go:288 msg="Could not create unwind info shards"
panic: runtime error: invalid memory address or nil pointer dereference
        panic: runtime error: invalid memory address or nil pointer dereference

To Reproduce: Run the agent on Ubuntu 20.04 on Linode. The kernel is built by Linode, and it seems that all the modules are built in.

Environment (please complete the following information):

root@rgw1:~# uname -r
6.1.10-x86_64-linode159
root@rgw1:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.6 LTS
Release:        20.04
Codename:       focal

The same error occurs using both the binary and the container.

Additional context: The Linode base kernel has all the modules built in, and the vdso files are not there. I can see them in the directory for the 5.x generic kernel, but not in the one for the 6.1 Linode kernel.

root@rgw1:/usr/lib/modules# pwd
/usr/lib/modules
root@rgw1:/usr/lib/modules# ls 5.4.0-146-generic/vdso/
vdso32.so  vdso64.so  vdsox32.so
root@rgw1:/usr/lib/modules# ls 6.1.10-x86_64-linode159/
modules.alias      modules.builtin            modules.builtin.bin  modules.dep.bin  modules.order    modules.symbols
modules.alias.bin  modules.builtin.alias.bin  modules.dep          modules.devname  modules.softdep  modules.symbols.bin
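
The manual check above can be scripted. The following is a minimal sketch, not the agent's actual lookup logic: the function name `find_vdso_files` is hypothetical, and the candidate file names are simply the two that appear in the error message, looked up under `/usr/lib/modules/<kernel release>/vdso/`.

```python
from pathlib import Path

# Candidate file names copied from the "failed to initialize vdso cache" error.
# Assumption: these are the only names worth checking; the agent may try others.
VDSO_CANDIDATES = ("vdso.so", "vdso64.so")


def find_vdso_files(kernel_release, modules_root="/usr/lib/modules"):
    """Return the candidate vDSO files that exist on disk for a kernel release."""
    vdso_dir = Path(modules_root) / kernel_release / "vdso"
    # A missing vdso/ directory simply yields an empty list.
    return [vdso_dir / name for name in VDSO_CANDIDATES if (vdso_dir / name).is_file()]
```

Calling `find_vdso_files(os.uname().release)` (after `import os`) checks the running kernel. An empty list corresponds to the "no such file or directory" errors in the log.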

Looking at the kernel config, it seems that the lib is supposed to be in, but I don't know how to manage this issue.

root@rgw1:/usr/lib/modules# zcat /proc/config.gz |grep -i vdso
# CONFIG_COMPAT_VDSO is not set
CONFIG_HAVE_GENERIC_VDSO=y
CONFIG_GENERIC_VDSO_TIME_NS=y

javierhonduco commented 1 year ago

I think the vdso error in the log is a red herring. The panic is most likely caused by this line:

level=error name=parca-agent ts=2023-04-19T10:17:37.942721677Z caller=cpu.go:288 msg="Could not create unwind info shards"

This is a fatal error and we will hard exit when we encounter it. It could happen for a variety of reasons, but the most typical one is very little memory being available on the host. How much memory do you have available in the system? The output of free -h would be useful here. Are you running the Agent in any cgroups or with any memory or resource limits?
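
For anyone who wants to gather both answers programmatically, here is a minimal sketch. It assumes a Linux host where `/proc/meminfo` is readable and, for the limit check, a cgroup v2 `memory.max` file. The helper names `mem_available_kib` and `cgroup_memory_limit` are hypothetical, and the location of `memory.max` depends on the process's cgroup, so the caller has to supply the file contents.

```python
def mem_available_kib(meminfo_text):
    """Return MemAvailable in KiB from /proc/meminfo-style text, or None if absent."""
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if key.strip() == "MemAvailable":
            # The value looks like "   2621440 kB"; the first token is the number.
            return int(rest.split()[0])
    return None


def cgroup_memory_limit(memory_max_text):
    """Parse cgroup v2 memory.max contents: 'max' means unlimited (None), else bytes."""
    value = memory_max_text.strip()
    return None if value == "max" else int(value)
```

For example, `mem_available_kib(open("/proc/meminfo").read())` gives the figure that `free` reports in its "available" column, expressed in KiB.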

Arau commented 1 year ago

The memory usage:

~# free -h
              total        used        free      shared  buff/cache   available
Mem:          3.8Gi       1.1Gi       2.0Gi       1.0Mi       759Mi       2.5Gi

There are no OOM killer events on the box, either.

I'm running the binary as a plain process, so no cgroups are limiting its resources. Do you think it needs more memory?

javierhonduco commented 1 year ago

This looks reasonable; the Agent should work fine in this case. Could I ask you to run the Agent under GDB with gdb --args <full command to start Parca Agent>, run it with r, and then share the stack trace from bt?

Just to confirm, you are running the published v0.17.2 with no custom patches, right?

javierhonduco commented 1 year ago

https://github.com/parca-dev/parca-agent/pull/1620 prints the error so we can have more clarity. A container image will be built once this commit is merged; if you can give it a try, that would be great!