lxc / lxcfs

FUSE filesystem for LXC
https://linuxcontainers.org/lxcfs
Other
1.02k stars 246 forks source link

lxcfs: handle NULL in lxcfs_read (segfault at 0, code=killed, status=11/SEGV) #635

Open 81981266 opened 2 months ago

81981266 commented 2 months ago

os: ubuntu 5.15.0-52 lxcfs version: 4.0.11

lxcfs is killed by 11/SEGV signal, the syslog is as below:

Apr 18 22:54:43 10-169-58-129 kernel: [10829437.241075] lxcfs[197879]: segfault at 0 ip 00007f101184cf81 sp 00007f0fe9ffa790 error 6 in libc-2.31.so[7f10117e3000+178000]
Apr 18 22:54:43 10-169-58-129 systemd[1]: lxcfs.service: Main process exited, code=killed, status=11/SEGV
Apr 18 22:54:43 10-169-58-129 systemd[1]: var-lib-lxcfs.mount: Succeeded.
Apr 18 22:54:43 10-169-58-129 systemd[1]: lxcfs.service: Failed with result 'signal'.
Apr 18 22:54:43 10-169-58-129 systemd[1]: lxcfs.service: Consumed 2d 41min 37.091s CPU time.
Apr 18 22:54:43 10-169-58-129 systemd[1]: lxcfs.service: Scheduled restart job, restart counter is at 1.
Apr 18 22:54:43 10-169-58-129 systemd[1]: lxcfs.service: Consumed 2d 41min 37.091s CPU time.

the core dump explained by gbd from /var/crash folder is as below:

SeaTalk_IMG_20240424_171648

The code of lxcfs.c:778 is here: https://github.com/lxc/lxcfs/blob/lxcfs-4.0.11/src/lxcfs.c#L778

A similar issue about 'NULL path in lxcfs_releasedir/lxcfs_release' fix: https://github.com/lxc/lxcfs/pull/577

81981266 commented 2 months ago

@deleriux @mihalicyn @brauner Could you help take a look? Thank you in advance.

anooprac commented 2 months ago

Hello, my partner and I are UT Austin students and would like to work on this problem for a class project. Could we know more about how this issue can be recreated so we can try debugging it?

DevonSchwartz commented 2 months ago

I am working with anooprac I think that the solution could be to use the macros created for lxcfs_release() instead of the strcmp method.

81981266 commented 2 months ago

Hi @anooprac , it only happened on 2 nodes within a cluster with thousands of machines. I still cannot reproduce it from my side manually.

mihalicyn commented 2 months ago

Hi @81981266

thanks for your report!

This is a very interesting issues, because as I can see from callstack proc_read was called from do_sys_read which should never happen. And we obviously don't have such a calls in the LXCFS code.

My theory is that it can be a very tricky bug in dynamic symbol resolution (the dlsym function). It can be racy and return a wrong pointer to the function in some circumstances.

Another good question is that even if proc_read was called instead of sys_read for sysfs file then how this code path reached proc_loadavg_read. We do have fi->type checks everywhere.

Can you provide me with your crash dump file and your LXCFS binary so I can go through the crash-dump and analyze it?

81981266 commented 2 months ago

Hello @mihalicyn , thanks for your comment. I'm sorry to say that the core dump file was rotated/deleted when I tried to copy it just now. Then I attached the lxcfs binary. I also attached my lxcfs code repo because we did some revamp based on v4.0.11 to meet some internal requirements. Hope this can help you well. Thank you very much.

OS VERSION="20.04.6 LTS (Focal Fossa)"

lxcfs binary.zip lxcfs_v4.0.11_revamp.zip

mihalicyn commented 2 months ago

I'm sorry to say that the core dump file was rotated/deleted when I tried to copy it just now

It is sad news.

Then I attached the lxcfs binary. I also attached my lxcfs code repo because we did some revamp based on v4.0.11 to meet some internal requirements. Hope this can help you well. Thank you very much.

In general I can't see any issues with your version of the code.

Let's then wait for the next crash reproduction and crash-dump file. Also, I would strongly recommend updating to the recent LXCFS version from 4.0.11.

I mark this issue as "incomplete" as we don't have enough information to debug this right now.

DevonSchwartz commented 2 months ago

Is it still useful to update the repo to use the macros instead of strcmp for the paths?

mihalicyn commented 2 months ago

Is it still useful to update the repo to use the macros instead of strcmp for the paths?

I think it is. It also makes sense to do this for all fuse callbacks (except open/opendir of course), not only the "read" one. But let's start from read.

DevonSchwartz commented 2 months ago

Is it still useful to update the repo to use the macros instead of strcmp for the paths?

I think it is. It also makes sense to do this for all fuse callbacks (except open/opendir of course), not only the "read" one. But let's start from read.

Sounds good. I'll add the fixes for the rest of the FUSE callbacks once the lxcfs_read() changes pass.