zyedidia / perforator

Record "perf" performance metrics for individual functions/regions of an ELF binary.
MIT License

Support custom range inner delimiter & demangled function names #11

Closed mkornaukhov03 closed 8 months ago

mkornaukhov03 commented 8 months ago

This closes https://github.com/zyedidia/perforator/issues/10 and https://github.com/zyedidia/perforator/issues/12. I also added an exclude-clones flag to track the original functions rather than their clone versions (func_name.cold, for example).

zyedidia commented 8 months ago

Thanks, could you explain the purpose of the new exclude-clones flag?

mkornaukhov03 commented 8 months ago

> Thanks, could you explain the purpose of the new exclude-clones flag?

Yes, sure. Here is an example. I want to see some perf stats for my C++ function EvalXgbNoCat_KmlFinal::predict_input, so I would like to run perforator like this:

perforator -r EvalXgbNoCat_KmlFinal::predict_input ./executable

But there are multiple matching symbols:

func-lookup: Multiple matches:
_ZNK21EvalXgbNoCat_KmlFinal13predict_inputERK9InputFileRSt6vectorIdSaIdEE
_ZNK21EvalXgbNoCat_KmlFinal13predict_inputERK9InputFileRSt6vectorIdSaIdEE.cold

If I demangle the last one, I see EvalXgbNoCat_KmlFinal::predict_input(InputFile const&, std::vector<double, std::allocator<double> >&) const [clone .cold]. The compiler has simply split this function into several symbols while optimizing it. Without the exclude-clones flag, I would have to pass the mangled name to the -r option to disambiguate. In most cases we want to measure the non-clone version of a function, and I think the exclude-clones option helps with that.
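
To illustrate the idea, here is a minimal sketch of the filtering step. It is not the code from this PR. It assumes that a clone can be recognized by a "."-introduced suffix on the symbol name (such as .cold or .part.0), since plain C identifiers and Itanium-mangled C++ names do not otherwise contain a dot. The helper names is_clone and exclude_clones are hypothetical.

```python
def is_clone(symbol: str) -> bool:
    """Heuristic (assumption): a '.' in a symbol name marks a compiler-generated
    clone suffix such as '.cold' or '.part.0'."""
    return "." in symbol


def exclude_clones(symbols):
    """Keep only the symbols that do not look like compiler clones."""
    return [s for s in symbols if not is_clone(s)]


# The two candidates reported by func-lookup in the example above.
matches = [
    "_ZNK21EvalXgbNoCat_KmlFinal13predict_inputERK9InputFileRSt6vectorIdSaIdEE",
    "_ZNK21EvalXgbNoCat_KmlFinal13predict_inputERK9InputFileRSt6vectorIdSaIdEE.cold",
]

remaining = exclude_clones(matches)
print(remaining)  # only the symbol without the '.cold' suffix is left
```

With the clone filtered out, a single candidate remains, so the lookup by demangled name is no longer ambiguous.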

You can find more details about clones here: https://stackoverflow.com/questions/63648838/what-do-the-suffixes-part-and-cold-mean-in-the-result-of-linux-kernels-d

There was also a related bug in gdb: https://sourceware.org/bugzilla/show_bug.cgi?id=26096