green-kernel / powerletrics

Powermetrics for Linux
GNU Affero General Public License v3.0

Profiling with py-spy revelations #14

Open ArneTR opened 1 week ago

ArneTR commented 1 week ago

I profiled the powerletrics -i 1000 command with py-spy.

Once with RAPL and once where I segmented out the file open calls for reading memory and cmdline.

Without RAPL and with segmented file opens

  %Own   %Total  OwnTime  TotalTime  Function (filename)
 25.00%  25.00%   34.34s    34.34s   _shutdown (threading.py)
  5.00%   5.00%   13.21s    24.03s   get_data (powerletrics/powerletrics.py)
  0.00%   0.00%    3.23s     3.23s   next (bcc/table.py)
  0.00%   0.00%    1.63s     1.63s   __getitem__ (bcc/table.py)
  0.00%   0.00%    1.04s     1.04s   get_cmdline (powerletrics/powerletrics.py)
  0.00%   0.00%   0.980s    0.980s   __delitem__ (bcc/table.py)
  0.00%   0.00%   0.960s     2.92s   get_table (bcc/__init__.py)
  0.00%   0.00%   0.550s    0.820s   print_text (powerletrics/powerletrics.py)
  0.00%   0.00%   0.440s    0.470s   _assert_is_bytes (bcc/utils.py)
  0.00%   0.00%   0.420s    0.420s   __init__ (bcc/table.py)
  0.00%   0.00%   0.350s    0.460s   decode (json/decoder.py)
  0.00%   0.00%   0.230s    0.740s   Table (bcc/table.py)
  0.00%   0.00%   0.220s     1.13s   __contains__ (<frozen _collections_abc>)
  0.00%   0.00%   0.190s    0.190s   _decode_table_type (bcc/__init__.py)
  0.00%   0.00%   0.110s    0.110s   raw_decode (json/decoder.py)
  0.00%   0.00%   0.100s    0.560s   loads (json/__init__.py)
  0.00%   0.00%   0.100s    0.100s   <genexpr> (powerletrics/powerletrics.py)
  0.00%   0.00%   0.060s    0.090s   __eq__ (<frozen _collections_abc>)
  0.00%   0.00%   0.050s     3.94s   clear (bcc/table.py)
  0.00%   0.00%   0.030s    0.050s   energy_impact (powerletrics/powerletrics.py)
  0.00%   0.00%   0.030s    0.170s   cpu_utilization (powerletrics/powerletrics.py)
  0.00%   0.00%   0.030s    0.030s   __instancecheck__ (<frozen abc>)
  0.00%   0.00%   0.020s    0.050s   <lambda> (powerletrics/powerletrics.py)
  0.00%   0.00%   0.020s    0.020s   __bytes__ (bcc/utils.py)
  0.00%   0.00%   0.010s    0.010s   __init__ (bcc/utils.py)
  0.00%   0.00%   0.010s    0.010s   __iter__ (bcc/table.py)
  0.00%   0.00%   0.010s     3.24s   __next__ (bcc/table.py)
  0.00%   0.00%   0.000s    0.140s   get (<frozen _collections_abc>)
  0.00%   5.00%   0.000s    24.03s   run (threading.py)
  0.00%   5.00%   0.000s    24.03s   _bootstrap (threading.py)
  0.00%   0.00%   0.000s    0.010s   keys (bcc/table.py)
  0.00%   5.00%   0.000s    24.03s   _bootstrap_inner (threading.py)

What stands out is that threading either has a large overhead, or all of the eBPF work is attributed to it.

py-spy seems to have an issue with its percentage columns: they make little sense here, since they do not add up to 100%.

I am thus only looking at TotalTime.

In user space, get_data uses the most time. Surprisingly, the file open calls are not that costly.

Also interesting is that get_table takes a lot of time. Why? For a simple lookup / memory copy? How big are these tables? Can we maybe clear them more often?

The sorting also does not seem to be an issue.

So all in all: Looks fine, apart from the wild eBPF functions and threading overhead.

With RAPL data

  %Own   %Total  OwnTime  TotalTime  Function (filename)
 25.00%  25.00%   41.29s    41.29s   _shutdown (threading.py)
  9.00%   9.00%   16.04s    20.30s   get_data (powerletrics/powerletrics.py)
  7.00%   7.00%   12.54s    12.54s   rapl_metrics_provider_thread (powerletrics/powerletrics.py)
  0.00%   0.00%    1.20s     1.20s   next (bcc/table.py)
  0.00%   0.00%   0.850s    0.850s   __getitem__ (bcc/table.py)
  0.00%   0.00%   0.440s    0.610s   print_text (powerletrics/powerletrics.py)
  0.00%   0.00%   0.380s    0.380s   __delitem__ (bcc/table.py)
  0.00%   0.00%   0.330s     1.05s   get_table (bcc/__init__.py)
  0.00%   0.00%   0.180s    0.180s   __init__ (bcc/table.py)
  0.00%   0.00%   0.150s    0.150s   _assert_is_bytes (bcc/utils.py)
  0.00%   0.00%   0.130s    0.190s   decode (json/decoder.py)
  0.00%   0.00%   0.110s    0.350s   Table (bcc/table.py)
  0.00%   0.00%   0.090s    0.090s   <genexpr> (powerletrics/powerletrics.py)
  0.00%   0.00%   0.070s    0.620s   __contains__ (<frozen _collections_abc>)
  0.00%   0.00%   0.060s     1.55s   clear (bcc/table.py)
  0.00%   0.00%   0.060s    0.060s   raw_decode (json/decoder.py)
  0.00%   0.00%   0.050s     1.25s   __next__ (bcc/table.py)
  0.00%   0.00%   0.050s    0.060s   energy_impact (powerletrics/powerletrics.py)
  0.00%   0.00%   0.030s    0.060s   __eq__ (<frozen _collections_abc>)
  0.00%   0.00%   0.030s    0.030s   __instancecheck__ (<frozen abc>)
  0.00%   0.00%   0.020s    0.050s   cpu_utilization (powerletrics/powerletrics.py)
  0.00%   0.00%   0.020s    0.020s   _decode_table_type (bcc/__init__.py)
  0.00%   0.00%   0.010s    0.200s   loads (json/__init__.py)
  0.00%   0.00%   0.010s    0.010s   <garbled frame name>
  0.00%  16.00%   0.000s    32.84s   _bootstrap (threading.py)
  0.00%   0.00%   0.000s    0.120s   get (<frozen _collections_abc>)
  0.00%  16.00%   0.000s    32.84s   _bootstrap_inner (threading.py)
  0.00%  16.00%   0.000s    32.84s   run (threading.py)
  0.00%   0.00%   0.000s    0.020s   <lambda> (powerletrics/powerletrics.py)

The rapl_metrics_provider_thread costs a lot of CPU time. I believe something is wrong here ...

ArneTR commented 1 week ago

Just for completeness, here is how I produced these values: