wolfpld / tracy

Frame profiler
https://tracy.nereid.pl/

Tracy exhausts file descriptors: Too many open files #512

Open qcolombet opened 1 year ago

qcolombet commented 1 year ago

I'm running into an issue where tracy cannot be used on some machines because it exhausts the limit on file descriptors that can be open at the same time. The issue stems from tracy's use of perf_event_open for PERF_TYPE_TRACEPOINT. Essentially, tracy creates one such descriptor per CPU for each event, and when your machine has lots of CPUs (128 in my case, lucky me!) you quickly run out of file descriptors and your program fails any time it tries to open a file on its own.

Is there a workaround for this?

Here is a snippet of the strace of my program:

# Event set for each CPU
# At this point of the trace, CPU id 123 and onward
perf_event_open({type=PERF_TYPE_TRACEPOINT, size=0x /* PERF_ATTR_SIZE_??? */, config=317, ...}, -1, 123, -1, PERF_FLAG_FD_CLOEXEC) = 1022
mmap(NULL, 266240, PROT_READ|PROT_WRITE, MAP_SHARED, 1022, 0) = 0x
perf_event_open({type=PERF_TYPE_TRACEPOINT, size=0x /* PERF_ATTR_SIZE_??? */, config=317, ...}, -1, 124, -1, PERF_FLAG_FD_CLOEXEC) = 1023
mmap(NULL, 266240, PROT_READ|PROT_WRITE, MAP_SHARED, 1023, 0) = 0x
# We run out of file descriptors
perf_event_open({type=PERF_TYPE_TRACEPOINT, size=0x /* PERF_ATTR_SIZE_??? */, config=317, ...}, -1, 125, -1, PERF_FLAG_FD_CLOEXEC) = -1 EMFILE (Too many open files)
perf_event_open({type=PERF_TYPE_TRACEPOINT, size=0x /* PERF_ATTR_SIZE_??? */, config=317, ...}, -1, 126, -1, PERF_FLAG_FD_CLOEXEC) = -1 EMFILE (Too many open files)
perf_event_open({type=PERF_TYPE_TRACEPOINT, size=0x /* PERF_ATTR_SIZE_??? */, config=317, ...}, -1, 127, -1, PERF_FLAG_FD_CLOEXEC) = -1 EMFILE (Too many open files)
# We set some more events on all the CPUs, none will work at this point
perf_event_open({type=PERF_TYPE_TRACEPOINT, size=0x /* PERF_ATTR_SIZE_??? */, config=287, ...}, -1, 0, -1, PERF_FLAG_FD_CLOEXEC) = -1 EMFILE (Too many open files)
perf_event_open({type=PERF_TYPE_TRACEPOINT, size=0x /* PERF_ATTR_SIZE_??? */, config=287, ...}, -1, 1, -1, PERF_FLAG_FD_CLOEXEC) = -1 EMFILE (Too many open files)
perf_event_open({type=PERF_TYPE_TRACEPOINT, size=0x /* PERF_ATTR_SIZE_??? */, config=287, ...}, -1, 2, -1, PERF_FLAG_FD_CLOEXEC) = -1 EMFILE (Too many open files)
perf_event_open({type=PERF_TYPE_TRACEPOINT, size=0x /* PERF_ATTR_SIZE_??? */, config=287, ...}, -1, 3, -1, PERF_FLAG_FD_CLOEXEC) = -1 EMFILE (Too many open files)
...
# At this point if the program tries to open any file, it will fail.
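
The descriptor numbers in the trace above can be checked with a little arithmetic. The sketch below is only an illustration under stated assumptions: every perf event consumes exactly one descriptor, descriptors are handed out in order, the process starts with just stdin, stdout and stderr open (`base_fds = 3`), and seven event sets were opened on all 128 CPUs before the `config=317` one. That last count is inferred from the numbers, not stated in the trace. The names `expected_fd` and `first_failing_cpu` are hypothetical helpers.

```python
SOFT_LIMIT = 1024  # the `ulimit -n` value reported in this thread


def expected_fd(event_set: int, cpu: int, cpu_count: int = 128, base_fds: int = 3) -> int:
    """Descriptor number expected for `cpu` in the `event_set`-th batch (0-based).

    Assumes one descriptor per (event set, CPU) pair, allocated in order,
    on top of `base_fds` descriptors that were already open.
    """
    return base_fds + event_set * cpu_count + cpu


def first_failing_cpu(event_set: int, cpu_count: int = 128, base_fds: int = 3,
                      soft_limit: int = SOFT_LIMIT):
    """First CPU whose descriptor number would reach the soft limit, or None."""
    for cpu in range(cpu_count):
        if expected_fd(event_set, cpu, cpu_count, base_fds) >= soft_limit:
            return cpu
    return None


# Eighth event set (index 7): CPU 123 lands on descriptor 1022, CPU 124 on 1023,
# and CPU 125 is the first one that no longer fits under a limit of 1024.
print(expected_fd(7, 123), expected_fd(7, 124), first_failing_cpu(7))
```

Under these assumptions the numbers line up with the trace: descriptors 1022 and 1023 for CPUs 123 and 124, then EMFILE from CPU 125 onward.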

I guess one way to reproduce this would be to write a program that opens a few files after tracy has set up its monitoring, run it on a machine with a lot of cores, and see it burn.
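
The failure mode itself can be shown without tracy or a 128-core machine. The following is a minimal sketch, not a tracy reproducer: it temporarily lowers the soft `RLIMIT_NOFILE` limit and holds descriptors on `/dev/null` as a stand-in for the perf event descriptors, until the next open fails with EMFILE. The function name `open_until_emfile` and the limit of 64 are arbitrary choices for illustration.

```python
import errno
import os
import resource


def open_until_emfile(soft_limit: int = 64) -> int:
    """Lower the soft descriptor limit, then open files until EMFILE.

    Returns how many descriptors could be opened before the limit was hit.
    The original soft limit is restored before returning.
    """
    old_soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, hard))
    held = []
    try:
        while True:
            # Each successful open takes the lowest free descriptor number.
            held.append(os.open(os.devnull, os.O_RDONLY))
    except OSError as exc:
        if exc.errno != errno.EMFILE:
            raise
        return len(held)
    finally:
        for fd in held:
            os.close(fd)
        resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, hard))


if __name__ == "__main__":
    print("opens before EMFILE:", open_until_emfile())
```

Once the limit is reached, every further open in the process fails the same way, which is what happens to the instrumented program's own file accesses.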

pzread commented 1 year ago

(Not a maintainer of tracy) Can you check ulimit -n on your machine? It looks like the program was only allowed to open 1024 files (the default soft limit on most Linux distributions). A workaround could be to increase the limit.

qcolombet commented 1 year ago

You're right, when I run the program with sudo, the limit is 1024. I can collect what I need if I do: sudo sh -c "ulimit -n <bigNum> && <myTracyInstrumentedProgram>"
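
The same workaround can be sketched as a small launcher that raises the soft limit and then executes the program, roughly what `ulimit -n <bigNum> && <myTracyInstrumentedProgram>` does in the shell. This is an illustration only: the helper names `raise_nofile_limit` and `run_with_limit` and the value 65536 are assumptions, and the sketch never goes above the hard limit, which an unprivileged process cannot raise.

```python
import os
import resource


def raise_nofile_limit(wanted: int) -> int:
    """Raise the soft RLIMIT_NOFILE towards `wanted`; never lower it.

    The request is capped at the hard limit. Returns the soft limit in
    effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft  # already unlimited, nothing to do
    target = wanted if hard == resource.RLIM_INFINITY else min(wanted, hard)
    if target > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


def run_with_limit(argv, wanted: int = 65536) -> None:
    """Raise the limit, then replace this process with `argv`.

    The new program inherits the raised limit. This call does not return
    on success.
    """
    raise_nofile_limit(wanted)
    os.execvp(argv[0], argv)
```

For example, `run_with_limit(["./myTracyInstrumentedProgram"])` would start the program (a placeholder name here) with the higher limit already in place, so the limit is raised before tracy opens its perf events.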

qcolombet commented 1 year ago

Thanks @pzread for the workaround.

wolfpld commented 1 year ago

The workaround is a proper solution here.

KoolJBlack commented 7 months ago

I spent a significant amount of time tracking down this same issue, which showed up as failing sudo runs with tracy in our project. I was going to file an issue and then found this one. I'm also running on a machine with 128 CPUs.

Perhaps it would help to surface this information/workaround more visibly in tracy's docs? Running with elevated permissions is explicitly recommended for tracy to have full access to all kernel facilities.