libbpf / blazesym

blazesym is a library for address symbolization and related tasks
BSD 3-Clause "New" or "Revised" License
104 stars 24 forks source link

Reduce number of (failed) file lookups #511

Closed r1viollet closed 7 months ago

r1viollet commented 7 months ago

Description

When running blazesym as a symbolization library, I noticed that we have repeated attempts of opening non existing files. Here is an extract of some of the (sorted) strace logs, while running for a little amount of time.

...
[pid 448580] openat(AT_FDCWD, "/usr/lib/debug/usr/share/spotify/libcef.so.debug", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 448580] openat(AT_FDCWD, "/usr/lib/debug/usr/share/spotify/libcef.so.debug", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 448580] openat(AT_FDCWD, "/usr/lib/debug/usr/share/spotify/libcef.so.debug", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 448580] openat(AT_FDCWD, "/usr/lib/debug/usr/share/spotify/libcef.so.debug", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 448580] openat(AT_FDCWD, "/usr/lib/debug/usr/share/spotify/libcef.so.debug", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 448580] openat(AT_FDCWD, "/usr/lib/debug/usr/share/spotify/libcef.so.debug", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 448580] openat(AT_FDCWD, "/usr/lib/debug/usr/share/spotify/libcef.so.debug", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 448580] openat(AT_FDCWD, "/usr/lib/debug/usr/share/spotify/libcef.so.debug", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 448580] openat(AT_FDCWD, "/usr/lib/debug/usr/share/spotify/libcef.so", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 448580] openat(AT_FDCWD, "/usr/lib/debug/usr/share/spotify/libcef.so", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 448580] openat(AT_FDCWD, "/usr/lib/debug/usr/share/spotify/libcef.so", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 448580] openat(AT_FDCWD, "/usr/lib/debug/usr/share/spotify/libcef.so", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 448580] openat(AT_FDCWD, "/usr/lib/debug/usr/share/spotify/libcef.so", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 448580] openat(AT_FDCWD, "/usr/lib/debug/usr/share/spotify/libcef.so", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 448580] openat(AT_FDCWD, "/usr/lib/debug/usr/share/spotify/libcef.so", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 448580] openat(AT_FDCWD, "/usr/lib/debug/usr/share/spotify/libcef.so", O_RDONLY) = -1 ENOENT (No such file or directory)
...

Motivation

Reduce symbolization cost. Most of the cost of symbolizing is related to the repeated file lookups. image

r1viollet commented 7 months ago

And thanks for addressing my previous issue! Things look much better.

d-e-s-o commented 7 months ago

Thanks for the report! Interesting. Are you using process symbolization? The stack trace you show seems to indicate that you are symbolizing using the ELF source. At this point we don't attempt to open split debug information or the like, so I kind of question that the library would open something like libcef.so.debug unless explicitly asked for (say, using the ELF source and providing the path to that file).

We do funnel pretty much everything through a "file cache", which basically opens a file to check whether we are still working with an up-to-date copy (we could just stat it, but that would have TOCTOU issues). If not, we want to reload contents. I suppose that some may have no interest in this logic and we could make it optional. That would be a legitimate source of open calls.

That being said, it's entirely possible we have some duplicate opens that could be avoided. I found one already, but it's in the process symbolization logic.

r1viollet commented 7 months ago

Sorry for not being very precise in my initial report. Here is more context. I am trying to benchmark blazesym as a symbolization library for a profiling use case. I use the elf symbolizer code path. My assumption is indeed that the file is not being changed under my feet. I have recurring calls to the symbolization library at every sample (10 ms per CPU). I could move these calls at larger intervals, however we then have a risk of loosing access to temporary files.

Even in the successful code paths, I indeed have a lot of syscalls to openat. edit: Here are some logs from strace (with a sort | uniq -c), where I profile a single toy application. So we have thousands of calls to openat.

    427 [pid 1328875] openat(AT_FDCWD, "/proc/1328872/root/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30", O_RDONLY|O_CLOEXEC) = 45
   2594 [pid 1328875] openat(AT_FDCWD, "/proc/1328872/root/usr/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 45
   4671 [pid 1328875] openat(AT_FDCWD, "/proc/1328872/root/usr/local/bin/BadBoggleSolver_run", O_RDONLY|O_CLOEXEC) = 45

Happy to re-evaluate my usage of the library.

d-e-s-o commented 7 months ago

Just to clarify, the paths that you see being opened are the files that you pass in as the ELF source, is that correct? The problem is just that they are repeatedly opened? Tomorrow I can add an option to circumvent the file cache and then you can re-check.

r1viollet commented 7 months ago

That is correct. I am seeing repeated opens on the same files (which are given as the elf source). Happy to try things out :+1:

I can understand the need for an option. Some folks might want there to be always a check on whether the cache is still valid.

r1viollet commented 7 months ago

Thank you for the proposal. This solves my use case.