Open tarasmadan opened 6 months ago
@dvyukov proposed a third option: remove the addr2line dependency and parse the DWARF data directly. His prototype:
```go
package main

import (
	"bufio"
	"debug/dwarf"
	"debug/elf"
	"fmt"
	"io"
	"os"
	"strconv"
	"time"
)

func main() {
	start := time.Now()

	// Read the set of PCs to look up from stdin: one hex value per line,
	// without a 0x prefix.
	pcs := make(map[uint64]struct{})
	for s := bufio.NewScanner(os.Stdin); s.Scan(); {
		n, err := strconv.ParseUint(s.Text(), 16, 64)
		if err != nil {
			panic(err)
		}
		pcs[n] = struct{}{}
	}
	fmt.Printf("read %v pcs in %v\n", len(pcs), time.Since(start))

	// Open the ELF binary given as the first argument and load its DWARF data.
	f, err := elf.Open(os.Args[1])
	if err != nil {
		panic(err)
	}
	data, err := f.DWARF()
	if err != nil {
		panic(err)
	}

	matched, total := 0, 0
	// Iterate over the top-level entries, which are expected to be compile units.
	for r := data.Reader(); ; {
		ent, err := r.Next()
		if err != nil {
			panic(err)
		}
		if ent == nil {
			break
		}
		if ent.Tag != dwarf.TagCompileUnit {
			panic(fmt.Errorf("found unexpected tag %v on top level", ent.Tag))
		}

		// Walk the line table of this compile unit and count the rows
		// whose address is one of the requested PCs.
		lr, err := data.LineReader(ent)
		if err != nil {
			panic(err)
		}
		var entry dwarf.LineEntry
		for {
			if err := lr.Next(&entry); err != nil {
				if err == io.EOF {
					break
				}
				panic(err)
			}
			total++
			if _, ok := pcs[entry.Address]; !ok {
				continue
			}
			matched++
			//fmt.Printf("pc %x %v:%v:%v\n", entry.Address, entry.File.Name, entry.Line, entry.Column)
		}

		// Skip the children of the compile unit so that the next call to
		// r.Next() returns the next compile unit.
		r.SkipChildren()
	}
	fmt.Printf("total %v, matched %v\n", total, matched)
}
```
It turns out not to be that easy. LineReader has info about inlined frames, but only file:line, not the function name. And we need inlined function names in both pkg/report and pkg/cover. Inlined function names have something to do with TagInlinedSubroutine, but I have not figured out exactly how these tags should be processed. The llvm-addr2line code can be used as a reference.
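One possible way to process TagInlinedSubroutine entries, as a minimal sketch rather than a verified solution: walk all DIEs, record the PC ranges of every TagSubprogram and TagInlinedSubroutine entry, and resolve the name through DW_AT_abstract_origin, which is where an inlined instance usually points for its name. The `inlineRange` type and the `funcName`, `collectRanges` and `framesForPC` helpers are hypothetical names introduced for illustration. The sketch has not been checked against a kernel image, and the lookup is a plain linear scan.

```go
package main

import (
	"debug/dwarf"
	"debug/elf"
	"fmt"
	"os"
	"sort"
)

// inlineRange (hypothetical type): PC range [lo, hi) of a function body or inlined call.
type inlineRange struct {
	lo, hi uint64
	name   string
	depth  int // nesting level in the DIE tree; deeper means more inlined
}

// funcName returns DW_AT_name, following DW_AT_abstract_origin links if needed.
func funcName(data *dwarf.Data, ent *dwarf.Entry) string {
	for i := 0; ent != nil && i < 8; i++ { // bound the chain defensively
		if name, ok := ent.Val(dwarf.AttrName).(string); ok {
			return name
		}
		off, ok := ent.Val(dwarf.AttrAbstractOrigin).(dwarf.Offset)
		if !ok {
			break
		}
		r := data.Reader()
		r.Seek(off)
		ent, _ = r.Next() // nil on error, which ends the loop
	}
	return "?"
}

// collectRanges records PC ranges of all subprograms and inlined subroutines.
func collectRanges(data *dwarf.Data) ([]inlineRange, error) {
	var out []inlineRange
	depth := 0
	for r := data.Reader(); ; {
		ent, err := r.Next()
		if err != nil {
			return nil, err
		}
		if ent == nil {
			return out, nil
		}
		if ent.Tag == 0 { // a null entry ends the current list of children
			depth--
			continue
		}
		if ent.Tag == dwarf.TagSubprogram || ent.Tag == dwarf.TagInlinedSubroutine {
			ranges, err := data.Ranges(ent)
			if err != nil {
				return nil, err
			}
			for _, pr := range ranges {
				out = append(out, inlineRange{pr[0], pr[1], funcName(data, ent), depth})
			}
		}
		if ent.Children {
			depth++
		}
	}
}

// framesForPC returns the ranges covering pc, innermost (most inlined) first.
func framesForPC(ranges []inlineRange, pc uint64) []inlineRange {
	var hit []inlineRange
	for _, r := range ranges {
		if r.lo <= pc && pc < r.hi {
			hit = append(hit, r)
		}
	}
	sort.SliceStable(hit, func(i, j int) bool { return hit[i].depth > hit[j].depth })
	return hit
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: prog <elf-with-dwarf>")
		return
	}
	f, err := elf.Open(os.Args[1])
	if err != nil {
		panic(err)
	}
	data, err := f.DWARF()
	if err != nil {
		panic(err)
	}
	ranges, err := collectRanges(data)
	if err != nil {
		panic(err)
	}
	fmt.Printf("collected %v function/inline ranges\n", len(ranges))
}
```

For a given PC, `framesForPC` returns the innermost inlined function first and the enclosing subprogram last. The call site of each inlined frame is recorded on the same entry in the DW_AT_call_file and DW_AT_call_line attributes (dwarf.AttrCallFile and dwarf.AttrCallLine), which the sketch does not read. A real implementation would likely replace the linear scan with a sorted index.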
Mapping file:line to a function name looks doable given the source code itself. Any chance of getting StartLine:StartPos - EndLine:EndPos?
LineEntry has a Column field: https://pkg.go.dev/debug/dwarf@go1.22.2#LineEntry
Is your feature request related to a problem? Please describe.
The first /cover request with lazy symbolization takes 19s. Getting updated numbers (after 5 seconds) takes 17s. RAM consumption is 40G.
Describe the solution you'd like
Full symbolization costs 50 seconds, which is comparable with the syzkaller startup time (with QEMU). By symbolizing all callbacks before the first /cover call, we can reduce its generation time to 3 seconds and memory consumption to 0G.
There are 2 potential solutions:
The second approach looks better but will cost more.
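The idea of symbolizing everything before the first /cover call could be sketched roughly as below. This is an illustration under assumptions, not syzkaller code: `Frame`, `Symbolizer` and `EagerCache` are hypothetical names, and the symbolizer is passed in as a plain function so the sketch stays independent of how symbolization is actually done. The expensive pass runs once in the background at startup, and request handlers only read the finished map.

```go
package main

import (
	"fmt"
	"sync"
)

// Frame (hypothetical type) is one symbolized frame for a PC.
type Frame struct {
	Func, File string
	Line       int
}

// Symbolizer (hypothetical) resolves a batch of PCs in one pass.
type Symbolizer func(pcs []uint64) map[uint64][]Frame

// EagerCache symbolizes all PCs once, up front, and then serves lookups.
type EagerCache struct {
	once   sync.Once
	ready  chan struct{}
	frames map[uint64][]Frame
}

func NewEagerCache() *EagerCache {
	return &EagerCache{ready: make(chan struct{})}
}

// Start kicks off symbolization in the background. Only the first call
// has any effect; later calls are ignored.
func (c *EagerCache) Start(pcs []uint64, sym Symbolizer) {
	c.once.Do(func() {
		go func() {
			c.frames = sym(pcs)
			close(c.ready) // publishes c.frames to all waiting readers
		}()
	})
}

// Lookup blocks until symbolization has finished, then reads the map.
// It blocks forever if Start was never called.
func (c *EagerCache) Lookup(pc uint64) ([]Frame, bool) {
	<-c.ready
	fr, ok := c.frames[pc]
	return fr, ok
}

func main() {
	// Fake symbolizer standing in for the real (expensive) pass.
	fake := func(pcs []uint64) map[uint64][]Frame {
		out := make(map[uint64][]Frame, len(pcs))
		for _, pc := range pcs {
			out[pc] = []Frame{{Func: fmt.Sprintf("func_%x", pc), File: "file.c", Line: 1}}
		}
		return out
	}
	c := NewEagerCache()
	c.Start([]uint64{0x10, 0x20}, fake)
	frames, ok := c.Lookup(0x10)
	fmt.Println(frames, ok)
}
```

The map is written once before the channel is closed and is only read afterwards, so lookups need no locking. Whether this matches the memory figures quoted above depends on what the real symbolizer keeps alive, which the sketch does not model.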