Closed 81981266 closed 1 month ago
@deleriux @mihalicyn @brauner Could you help take a look? Thank you in advance.
Hello, my partner and I are UT Austin students and would like to work on this problem for a class project. Could we know more about how this issue can be recreated so we can try debugging it?
I am working with @anooprac. I think the solution could be to use the macros created for lxcfs_release() instead of the strcmp method.
Hi @anooprac, it only happened on 2 nodes within a cluster of thousands of machines. I still cannot reproduce it manually on my side.
Hi @81981266
thanks for your report!
This is a very interesting issue, because as I can see from the call stack, `proc_read` was called from `do_sys_read`, which should never happen. And we obviously don't have such calls in the LXCFS code.
My theory is that it can be a very tricky bug in dynamic symbol resolution (the `dlsym` function). It can be racy and return a wrong pointer to the function in some circumstances.
Another good question: even if `proc_read` was called instead of `sys_read` for a sysfs file, how did this code path reach `proc_loadavg_read`? We do have `fi->type` checks everywhere.
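To make the kind of `fi->type` check mentioned above concrete, here is a minimal sketch of a proc-level read handler that dispatches only on types it owns and rejects everything else. All names here (`demo_type`, `demo_open_file`, `demo_proc_read`, and the helper readers) are hypothetical stand-ins for illustration, not the actual LXCFS definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical file types; stand-ins, not LXCFS's real enum. */
enum demo_type {
    DEMO_PROC_MEMINFO = 1,
    DEMO_PROC_LOADAVG,
    DEMO_SYS_CPU_ONLINE,
};

/* Hypothetical per-open state, playing the role of the data behind fi->fh. */
struct demo_open_file {
    enum demo_type type;
};

/* Copy at most `size` bytes of `text` into `buf`; return the byte count. */
static ssize_t copy_out(char *buf, size_t size, const char *text)
{
    size_t len = strlen(text);

    assert(buf != NULL);
    if (len > size)
        len = size;
    memcpy(buf, text, len);
    return (ssize_t)len;
}

static ssize_t demo_meminfo_read(char *buf, size_t size)
{
    return copy_out(buf, size, "MemTotal: 0 kB\n");
}

static ssize_t demo_loadavg_read(char *buf, size_t size)
{
    return copy_out(buf, size, "0.00 0.00 0.00\n");
}

/*
 * Proc-level read: a sysfs-typed (or otherwise unknown) file can never
 * reach the loadavg reader, because the switch has no case for it.
 */
ssize_t demo_proc_read(const struct demo_open_file *f, char *buf, size_t size)
{
    if (f == NULL)
        return -EINVAL;

    switch (f->type) {
    case DEMO_PROC_MEMINFO:
        return demo_meminfo_read(buf, size);
    case DEMO_PROC_LOADAVG:
        return demo_loadavg_read(buf, size);
    default:
        return -EINVAL;
    }
}
```

With a check of this shape in place, a file opened with a sysfs type would get `-EINVAL` from the proc handler rather than being routed to the loadavg reader, which is why reaching `proc_loadavg_read` for a sysfs file looks so surprising.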
Can you provide me with your crash dump file and your LXCFS binary so I can go through the crash-dump and analyze it?
Hello @mihalicyn, thanks for your comment. I'm sorry to say that the core dump file had already been rotated/deleted when I tried to copy it just now. I have attached the lxcfs binary. I have also attached my lxcfs code repo, because we reworked some parts on top of v4.0.11 to meet internal requirements. I hope this helps. Thank you very much.
OS VERSION="20.04.6 LTS (Focal Fossa)"
> I'm sorry to say that the core dump file was rotated/deleted when I tried to copy it just now
It is sad news.
> Then I attached the lxcfs binary. I also attached my lxcfs code repo because we did some revamp based on v4.0.11 to meet some internal requirements. Hope this can help you well. Thank you very much.
In general I can't see any issues with your version of the code.
Let's then wait for the next crash reproduction and crash-dump file. Also, I would strongly recommend updating to the recent LXCFS version from 4.0.11.
I mark this issue as "incomplete" as we don't have enough information to debug this right now.
Is it still useful to update the repo to use the macros instead of strcmp for the paths?
> Is it still useful to update the repo to use the macros instead of strcmp for the paths?

I think it is. It also makes sense to do this for all FUSE callbacks (except `open`/`opendir`, of course), not only the "read" one. But let's start with `read`.
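As a rough illustration of the proposal, the sketch below picks a backend from the type stored with the open file instead of comparing path prefixes with `strcmp`/`strncmp`. The enum values, the `DEMO_TYPE_*` macros, and `demo_pick_backend` are hypothetical names made up for this example; the real LXCFS macros and type values may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type values; each category occupies a contiguous range. */
enum demo_virt_type {
    DEMO_CG_DIR = 1,
    DEMO_CG_FILE,
    DEMO_PROC_MEMINFO,
    DEMO_PROC_LOADAVG,
    DEMO_SYS_CPU_ONLINE,
    DEMO_TYPE_MAX,
};

static_assert(DEMO_PROC_MEMINFO < DEMO_PROC_LOADAVG,
              "proc types must form an ascending range");

/* Hypothetical category macros, in the spirit of the ones discussed here. */
#define DEMO_TYPE_CGROUP(t) ((t) == DEMO_CG_DIR || (t) == DEMO_CG_FILE)
#define DEMO_TYPE_PROC(t)   ((t) >= DEMO_PROC_MEMINFO && (t) <= DEMO_PROC_LOADAVG)
#define DEMO_TYPE_SYS(t)    ((t) == DEMO_SYS_CPU_ONLINE)

enum demo_backend {
    DEMO_BACKEND_NONE,
    DEMO_BACKEND_CGROUP,
    DEMO_BACKEND_PROC,
    DEMO_BACKEND_SYS,
};

/* Hypothetical per-open state set at open()/opendir() time. */
struct demo_handle {
    int type;
};

/*
 * Choose the backend from the stored type. The path is deliberately
 * ignored, so a NULL path cannot cause a crash in this function.
 */
enum demo_backend demo_pick_backend(const char *path, const struct demo_handle *h)
{
    (void)path;

    if (h == NULL)
        return DEMO_BACKEND_NONE;
    if (DEMO_TYPE_CGROUP(h->type))
        return DEMO_BACKEND_CGROUP;
    if (DEMO_TYPE_PROC(h->type))
        return DEMO_BACKEND_PROC;
    if (DEMO_TYPE_SYS(h->type))
        return DEMO_BACKEND_SYS;
    return DEMO_BACKEND_NONE;
}
```

`open`/`opendir` are the exception because they are where the type is first established from the path, so a path comparison is still needed there; every later callback can rely on the stored type.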
Sounds good. I'll add the fixes for the rest of the FUSE callbacks once the lxcfs_read() changes pass.
os: ubuntu 5.15.0-52
lxcfs version: 4.0.11
lxcfs is killed by an `11/SEGV` signal. The `syslog` is as below:
The core dump explained by `gdb` from the `/var/crash` folder is as below:
The code of `lxcfs.c:778` is here: https://github.com/lxc/lxcfs/blob/lxcfs-4.0.11/src/lxcfs.c#L778
A similar issue about 'NULL path in lxcfs_releasedir/lxcfs_release' fix: https://github.com/lxc/lxcfs/pull/577