Closed williballenthin closed 11 months ago
In this hot routine, rely on the stdlib bytes.Index rather than hand-rolling a memfind routine. It's much faster: on the sample below, wall time drops from 15.20 s to 3.02 s.
before:
❯ time ./GoReSym 11d7cb5750c44c40e767c7c4fa0f388ed64f636cf6b97ceee7bf9ae77683a3c0 > /dev/null
2023/07/21 13:18:58 profile: cpu profiling enabled, cpu.pprof
GoReSym: profile: cpu profiling disabled, cpu.pprof
________________________________________________________
Executed in   15.20 secs    fish           external
   usr time   15.01 secs  151.00 micros   15.01 secs
   sys time    0.87 secs  136.00 micros    0.87 secs

❯ pprof -cum -top cpu.pprof
File: GoReSym
Type: cpu
Time: Jul 21, 2023 at 1:18pm (CEST)
Duration: 15.19s, Total samples = 15.42s (101.50%)
Showing nodes accounting for 14.99s, 97.21% of 15.42s total
Dropped 88 nodes (cum <= 0.08s)
      flat  flat%   sum%        cum   cum%
         0     0%     0%     14.67s 95.14%  main.main
         0     0%     0%     14.67s 95.14%  runtime.main
     0.01s 0.065% 0.065%     14.40s 93.39%  main.main_impl
         0     0% 0.065%     12.78s 82.88%  github.com/mandiant/GoReSym/objfile.(*Entry).PCLineTable
         0     0% 0.065%     12.78s 82.88%  github.com/mandiant/GoReSym/objfile.(*File).PCLineTable (inline)
         0     0% 0.065%     12.65s 82.04%  github.com/mandiant/GoReSym/objfile.(*peFile).pcln
     0.29s  1.88%  1.95%     12.65s 82.04%  github.com/mandiant/GoReSym/objfile.(*peFile).pcln_scan
     4.57s 29.64% 31.58%     12.14s 78.73%  github.com/mandiant/GoReSym/objfile.findAllOccurrences (inline)
     1.03s  6.68% 38.26%      7.57s 49.09%  bytes.Equal (inline)
     5.48s 35.54% 73.80%      5.48s 35.54%  memeqbody
         0     0% 73.80%      1.47s  9.53%  github.com/mandiant/GoReSym/objfile.(*Entry).ModuleDataTable
         0     0% 73.80%      1.47s  9.53%  github.com/mandiant/GoReSym/objfile.(*File).ModuleDataTable
         0     0% 73.80%      1.47s  9.53%  github.com/mandiant/GoReSym/objfile.(*peFile).moduledata_scan
         0     0% 73.80%      1.39s  9.01%  github.com/mandiant/GoReSym/debug/pe.(*File).DataAfterSection
     1.06s  6.87% 80.67%      1.06s  6.87%  runtime.memequal
     0.79s  5.12% 85.80%      0.79s  5.12%  runtime.memmove
         0     0% 85.80%      0.79s  5.12%  runtime.systemstack
         0     0% 85.80%      0.72s  4.67%  runtime.gcBgMarkWorker
         0     0% 85.80%      0.72s  4.67%  runtime.gcBgMarkWorker.func2
         0     0% 85.80%      0.72s  4.67%  runtime.gcDrain
         0     0% 85.80%      0.56s  3.63%  github.com/mandiant/GoReSym/debug/pe.(*Section).Data
         0     0% 85.80%      0.53s  3.44%  runtime.growslice
after:
❯ time ./GoReSym 11d7cb5750c44c40e767c7c4fa0f388ed64f636cf6b97ceee7bf9ae77683a3c0 > /dev/null
2023/07/21 13:20:03 profile: cpu profiling enabled, cpu.pprof
GoReSym: profile: cpu profiling disabled, cpu.pprof
________________________________________________________
Executed in    3.02 secs    fish           external
   usr time    2.87 secs  204.00 micros    2.87 secs
   sys time    0.82 secs  176.00 micros    0.82 secs

❯ pprof -cum -top cpu.pprof
File: GoReSym
Type: cpu
Time: Jul 21, 2023 at 1:20pm (CEST)
Duration: 3.02s, Total samples = 3.30s (109.33%)
Showing nodes accounting for 3.10s, 93.94% of 3.30s total
Dropped 68 nodes (cum <= 0.02s)
      flat  flat%   sum%        cum   cum%
         0     0%     0%      2.51s 76.06%  main.main
         0     0%     0%      2.51s 76.06%  runtime.main
         0     0%     0%      2.24s 67.88%  main.main_impl
         0     0%     0%      1.39s 42.12%  github.com/mandiant/GoReSym/debug/pe.(*File).DataAfterSection
         0     0%     0%      1.34s 40.61%  github.com/mandiant/GoReSym/objfile.(*Entry).ModuleDataTable
         0     0%     0%      1.34s 40.61%  github.com/mandiant/GoReSym/objfile.(*File).ModuleDataTable
         0     0%     0%      1.34s 40.61%  github.com/mandiant/GoReSym/objfile.(*peFile).moduledata_scan
     0.97s 29.39% 29.39%      0.97s 29.39%  runtime.memmove
         0     0% 29.39%      0.78s 23.64%  github.com/mandiant/GoReSym/objfile.(*Entry).PCLineTable
         0     0% 29.39%      0.78s 23.64%  github.com/mandiant/GoReSym/objfile.(*File).PCLineTable (inline)
         0     0% 29.39%      0.75s 22.73%  runtime.systemstack
         0     0% 29.39%      0.73s 22.12%  runtime.gcBgMarkWorker
         0     0% 29.39%      0.73s 22.12%  runtime.gcBgMarkWorker.func2
         0     0% 29.39%      0.73s 22.12%  runtime.gcDrain
         0     0% 29.39%      0.64s 19.39%  github.com/mandiant/GoReSym/objfile.(*peFile).pcln
         0     0% 29.39%      0.64s 19.39%  github.com/mandiant/GoReSym/objfile.(*peFile).pcln_scan
     0.10s  3.03% 32.42%      0.60s 18.18%  bytes.Index
         0     0% 32.42%      0.57s 17.27%  runtime.growslice
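For readers without the diff open, here is a rough sketch of the idea, not the actual patch. The function name is borrowed from the profile above, but the single-needle signature and the overlapping-match behavior are assumptions made for illustration; the real findAllOccurrences in GoReSym may differ. Instead of comparing the needle at every offset with bytes.Equal, the loop lets bytes.Index jump to the next match.

```go
package main

import (
	"bytes"
	"fmt"
)

// findAllOccurrences returns the start offset of every occurrence of needle
// in data, including overlapping ones.
// NOTE: hypothetical signature for illustration, not the GoReSym one.
func findAllOccurrences(data []byte, needle []byte) []int {
	var offsets []int
	if len(needle) == 0 {
		// An empty needle would match at every offset; treat it as no match.
		return offsets
	}
	start := 0
	for {
		// bytes.Index scans the remaining slice and returns -1 when
		// there is no further match.
		i := bytes.Index(data[start:], needle)
		if i < 0 {
			break
		}
		offsets = append(offsets, start+i)
		// Advance by one byte past the match start so overlapping
		// occurrences are still reported.
		start += i + 1
	}
	return offsets
}

func main() {
	fmt.Println(findAllOccurrences([]byte("abababa"), []byte("aba"))) // prints [0 2 4]
}
```

Advancing by one byte rather than by len(needle) keeps the result set the same as a naive scan at every offset, which is the assumption made here about the hand-rolled version.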