mandiant / GoReSym

Go symbol recovery tool
MIT License
498 stars 62 forks source link

improve performance of pcln/moduledata scans #27

Closed williballenthin closed 11 months ago

williballenthin commented 11 months ago

GoReSym spends a lot of time copying data around during the scans for pcln and moduledata. Each scan is invoked multiple time, and within each scan, there are multiple searches against section data. In other words, there's a pretty hot path here.

For example, when analyzing a kubectl binary, GoReSym spends more than 1.3s (!) in the DataAfterSection routine. This is surprising because the routine simply copies big data regions from sections into a single, contiguous array. Ideally, we should only spend a split-second copying section data around; spending a full second here means there's way too much extra work being done.

It would be nice to invoke the routine fewer times; however, GoReSym has to juggle a fairly complex data recovery algorithm, so its not a quick job.

Some things that worked for me:

  1. Concatenate all the second data once and pass this around
  2. Cache the results of DataAfterSection trading memory for CPU

Unfortunately, due to #23 I don't have GoReSym working in WSL, so I was doing my optimization against v2.3 and there were merge conflicts when cherry picking (1) into master. (2) still works for me an I may open a PR to propose this optimization.

williballenthin commented 11 months ago

example improvement (re-using concatenated section data):

before:

❯ time ./GoReSym 11d7cb5750c44c40e767c7c4fa0f388ed64f636cf6b97ceee7bf9ae77683a3c0 > /dev/null
2023/07/21 13:18:58 profile: cpu profiling enabled, cpu.pprof
GoReSym: profile: cpu profiling disabled, cpu.pprof

________________________________________________________
Executed in   15.20 secs    fish           external
   usr time   15.01 secs  151.00 micros   15.01 secs
   sys time    0.87 secs  136.00 micros    0.87 secs

❯ pprof -cum -top cpu.pprof
File: GoReSym
Type: cpu
Time: Jul 21, 2023 at 1:18pm (CEST)
Duration: 15.19s, Total samples = 15.42s (101.50%)
Showing nodes accounting for 14.99s, 97.21% of 15.42s total
Dropped 88 nodes (cum <= 0.08s)
      flat  flat%   sum%        cum   cum%
         0     0%     0%     14.67s 95.14%  main.main
         0     0%     0%     14.67s 95.14%  runtime.main
     0.01s 0.065% 0.065%     14.40s 93.39%  main.main_impl
         0     0% 0.065%     12.78s 82.88%  github.com/mandiant/GoReSym/objfile.(*Entry).PCLineTable
         0     0% 0.065%     12.78s 82.88%  github.com/mandiant/GoReSym/objfile.(*File).PCLineTable (inline)
         0     0% 0.065%     12.65s 82.04%  github.com/mandiant/GoReSym/objfile.(*peFile).pcln
     0.29s  1.88%  1.95%     12.65s 82.04%  github.com/mandiant/GoReSym/objfile.(*peFile).pcln_scan
     4.57s 29.64% 31.58%     12.14s 78.73%  github.com/mandiant/GoReSym/objfile.findAllOccurrences (inline)
     1.03s  6.68% 38.26%      7.57s 49.09%  bytes.Equal (inline)
     5.48s 35.54% 73.80%      5.48s 35.54%  memeqbody
         0     0% 73.80%      1.47s  9.53%  github.com/mandiant/GoReSym/objfile.(*Entry).ModuleDataTable
         0     0% 73.80%      1.47s  9.53%  github.com/mandiant/GoReSym/objfile.(*File).ModuleDataTable
         0     0% 73.80%      1.47s  9.53%  github.com/mandiant/GoReSym/objfile.(*peFile).moduledata_scan
         0     0% 73.80%      1.39s  9.01%  github.com/mandiant/GoReSym/debug/pe.(*File).DataAfterSection
     1.06s  6.87% 80.67%      1.06s  6.87%  runtime.memequal
     0.79s  5.12% 85.80%      0.79s  5.12%  runtime.memmove
         0     0% 85.80%      0.79s  5.12%  runtime.systemstack
         0     0% 85.80%      0.72s  4.67%  runtime.gcBgMarkWorker
         0     0% 85.80%      0.72s  4.67%  runtime.gcBgMarkWorker.func2
         0     0% 85.80%      0.72s  4.67%  runtime.gcDrain
         0     0% 85.80%      0.56s  3.63%  github.com/mandiant/GoReSym/debug/pe.(*Section).Data
         0     0% 85.80%      0.53s  3.44%  runtime.growslice

after:

❯ time ./GoReSym 11d7cb5750c44c40e767c7c4fa0f388ed64f636cf6b97ceee7bf9ae77683a3c0 > /dev/null
2023/07/21 13:21:06 profile: cpu profiling enabled, cpu.pprof
GoReSym: profile: cpu profiling disabled, cpu.pprof

________________________________________________________
Executed in    6.74 secs    fish           external
   usr time    6.46 secs  175.00 micros    6.46 secs
   sys time    0.39 secs  150.00 micros    0.39 secs

❯ pprof -cum -top cpu.pprof
File: GoReSym
Type: cpu
Time: Jul 21, 2023 at 1:21pm (CEST)
Duration: 6.73s, Total samples = 6.64s (98.67%)
Showing nodes accounting for 6.51s, 98.04% of 6.64s total
Dropped 46 nodes (cum <= 0.03s)
      flat  flat%   sum%        cum   cum%
         0     0%     0%      6.35s 95.63%  main.main
         0     0%     0%      6.35s 95.63%  runtime.main
         0     0%     0%      6.12s 92.17%  main.main_impl
     1.98s 29.82% 29.82%      5.34s 80.42%  github.com/mandiant/GoReSym/objfile.findAllOccurrences (inline)
         0     0% 29.82%      3.67s 55.27%  github.com/mandiant/GoReSym/objfile.(*Entry).PCLineTable
         0     0% 29.82%      3.67s 55.27%  github.com/mandiant/GoReSym/objfile.(*File).PCLineTable (inline)
         0     0% 29.82%      3.61s 54.37%  github.com/mandiant/GoReSym/objfile.(*peFile).pcln
     0.03s  0.45% 30.27%      3.61s 54.37%  github.com/mandiant/GoReSym/objfile.(*peFile).pcln_scan
     0.44s  6.63% 36.90%      3.36s 50.60%  bytes.Equal (inline)
     2.44s 36.75% 73.64%      2.44s 36.75%  memeqbody
         0     0% 73.64%      2.34s 35.24%  github.com/mandiant/GoReSym/objfile.(*Entry).ModuleDataTable
         0     0% 73.64%      2.34s 35.24%  github.com/mandiant/GoReSym/objfile.(*File).ModuleDataTable
     0.03s  0.45% 74.10%      2.34s 35.24%  github.com/mandiant/GoReSym/objfile.(*peFile).moduledata_scan
         0     0% 74.10%      0.55s  8.28%  github.com/mandiant/GoReSym/objfile.(*peFile).sectionData

and when paired with #26 the total runtime was down to 1.06 🔥

williballenthin commented 11 months ago

closed in #28