simsong / bulk_extractor

This is the development tree. Production downloads are at:
https://github.com/simsong/bulk_extractor/releases

Add google perftools libprofiler to build #295

Closed jonstewart closed 2 years ago

jonstewart commented 2 years ago

libprofiler can be linked with an executable to provide source-level profiling data. It does not require changing any code. A profiling run is enabled by setting an environment variable (the library checks it in a static initializer), and there's no overhead when profiling is disabled. Profiling runs are still pretty fast because the lib samples the call stack and hardware performance counters every 10 ms (the sampling period is configurable).

I will submit a PR soon. It's just a change to configure.ac that adds the library if it is present. If it isn't, the build proceeds as normal.

I've gotten the profiler to work on macOS previously with lightgrep, but had some wonky results that make me think it's not quite accurate there. So this may only be useful on Linux. Still, it's a low-effort way to gain good insight into full runs. We may find some cheap wins due to unnecessary mallocs/copies, or better understand any locking problems.

simsong commented 2 years ago

I'm okay with that, but have you seen the profiling information that we get for free in the DFXML files?

jonstewart commented 2 years ago

Yes, the scan times. Those are nice.

I'm closing this because I've learned it's superfluous. You can use Google's libprofiler without changing anything about the build (though obviously you'll want to build with symbols). Here's how:

$ LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libprofiler.so CPUPROFILE=prof.out src/bulk_extractor -Z -o ubnist1 ~/Downloads/ubnist1.gen1.E01

Setting LD_PRELOAD ensures libprofiler is loaded, and then CPUPROFILE turns on a profiling run. Even when I was telling autoconf to link against libprofiler, the library wasn't actually being linked in, because bulk_extractor didn't use any of its symbols.
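
For repeated runs, the command above could be wrapped in a small helper. This is only a sketch: `run_profiled` and the `LIBPROFILER` variable are hypothetical names, not part of bulk_extractor or gperftools, and the default library path is simply the one from the command above, which will differ on other distributions.

```shell
#!/bin/sh
# Hypothetical helper: run one command under gperftools' CPU profiler
# by preloading libprofiler, with no change to the build.
# The default path is an assumption taken from the command above.
LIBPROFILER="${LIBPROFILER:-/usr/lib/x86_64-linux-gnu/libprofiler.so}"

run_profiled() {
    # $1 = profile output file; remaining arguments = command to run
    out="$1"
    shift
    if [ ! -e "$LIBPROFILER" ]; then
        echo "libprofiler not found at $LIBPROFILER" >&2
        return 1
    fi
    # Set both variables for this one command only, so that nothing
    # else started from the same shell is profiled by accident.
    LD_PRELOAD="$LIBPROFILER" CPUPROFILE="$out" "$@"
}
```

Usage would look like `run_profiled prof.out src/bulk_extractor -Z -o ubnist1 ~/Downloads/ubnist1.gen1.E01`. The resulting file can then be summarized with gperftools' pprof, e.g. `pprof --text src/bulk_extractor prof.out` (some distributions install it as `google-pprof`). Exporting `CPUPROFILE_FREQUENCY` (samples per second, default 100) before the call changes the sampling period.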

When I can get some builds to run without deadlock, I will send you some profiling runs against some of your digitalcorpora images.