google / syzkaller

syzkaller is an unsupervised coverage-guided kernel fuzzer

pkg/cover: symbolization #4585

Open tarasmadan opened 6 months ago

tarasmadan commented 6 months ago

Is your feature request related to a problem? Please describe.

The first /cover request with lazy symbolization takes 19s. Getting updated numbers (5 seconds later) takes 17s. RAM consumption is 40G.

Describe the solution you'd like

Full symbolization takes 50 seconds, which is comparable to the syzkaller startup time (with QEMU). By symbolizing all callbacks before the first /cover call, we can reduce its generation time to 3 seconds and memory consumption to 0G.

There are 2 potential solutions:

  1. Symbolize everything in background on syzkaller start.
  2. Symbolize all callbacks after/during the kernel build process and use it as a build artefact. GZIPped data will cost ~30M.

The second approach looks better but will cost more.
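To make the second option more concrete, here is a minimal sketch of storing a symbolized PC table as a gzip-compressed build artifact and loading it back. The `Frame` type, the `encode`/`decode` helpers and the choice of `encoding/gob` are assumptions for illustration only; they are not syzkaller's actual data format.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/gob"
	"fmt"
)

// Frame is a hypothetical symbolized location for one PC
// (not the real pkg/cover type).
type Frame struct {
	Func string
	File string
	Line int
}

// encode serializes the PC -> frames table with gob and gzips the result.
func encode(table map[uint64][]Frame) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if err := gob.NewEncoder(zw).Encode(table); err != nil {
		return nil, err
	}
	// Close flushes the remaining compressed data into buf.
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decode is the inverse of encode.
func decode(data []byte) (map[uint64][]Frame, error) {
	zr, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	table := make(map[uint64][]Frame)
	if err := gob.NewDecoder(zr).Decode(&table); err != nil {
		return nil, err
	}
	return table, nil
}

func main() {
	table := map[uint64][]Frame{
		0xffffffff81000010: {{Func: "foo", File: "a.c", Line: 10}},
	}
	blob, err := encode(table)
	if err != nil {
		panic(err)
	}
	got, err := decode(blob)
	if err != nil {
		panic(err)
	}
	fmt.Printf("artifact is %d bytes, restored %d pcs\n", len(blob), len(got))
}
```

With such an artifact the first /cover request would only need to decompress and decode the table instead of running the symbolizer, at the price of an extra build step.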

tarasmadan commented 6 months ago

@dvyukov proposed a third option: remove the addr2line dependency and parse the DWARF data directly. His prototype:

package main

import (
    "bufio"
    "debug/dwarf"
    "debug/elf"
    "fmt"
    "io"
    "os"
    "strconv"
    "time"
)

func main() {
    start := time.Now()
    pcs := make(map[uint64]struct{})
    for s := bufio.NewScanner(os.Stdin); s.Scan(); {
        n, err := strconv.ParseUint(s.Text(), 16, 64)
        if err != nil {
            panic(err)
        }
        pcs[n] = struct{}{}
    }
    fmt.Printf("read %v pcs in %v\n", len(pcs), time.Since(start))

    f, err := elf.Open(os.Args[1])
    if err != nil {
        panic(err)
    }
    data, err := f.DWARF()
    if err != nil {
        panic(err)
    }
    matched, total := 0, 0
    for r := data.Reader(); ; {
        ent, err := r.Next()
        if err != nil {
            panic(err)
        }
        if ent == nil {
            break
        }
        if ent.Tag != dwarf.TagCompileUnit {
            panic(fmt.Errorf("found unexpected tag %v on top level", ent.Tag))
        }
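        lr, err := data.LineReader(ent)
        if err != nil {
            panic(err)
        }
        if lr == nil {
            // LineReader returns nil, nil for a unit without a line table.
            r.SkipChildren()
            continue
        }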
        var entry dwarf.LineEntry
        for {
            if err := lr.Next(&entry); err != nil {
                if err == io.EOF {
                    break
                }
                panic(err)
            }
            total++
            if _, ok := pcs[entry.Address]; !ok {
                continue
            }
            matched++
            //fmt.Printf("pc %x %v:%v:%v\n", entry.Address, entry.File.Name, entry.Line, entry.Column)
        }
        r.SkipChildren()
    }
    fmt.Printf("total %v, matched %v\n", total, matched)
}
dvyukov commented 5 months ago

His prototype:

It turns out to be not that easy. LineReader has info about inlined frames, but only file:line, not the function name, and we need inlined function names in both pkg/report and pkg/cover. Inlined function names have something to do with TagInlinedSubroutine, but I have not figured out how exactly these tags should be processed. The llvm-addr2line code can be used as a reference.
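As a starting point for the TagInlinedSubroutine question, here is a rough sketch of one way the tags might be processed with debug/dwarf. It assumes that a subprogram or inlined-subroutine entry covers a PC when one of its ranges contains it, and that a missing name can be found by following the abstract-origin (or specification) reference. The helper names `funcName`, `contains` and `inlineStack` are hypothetical, and the code has not been checked against a kernel vmlinux or against llvm-addr2line output.

```go
package main

import (
	"debug/dwarf"
	"debug/elf"
	"fmt"
	"os"
	"strconv"
)

// funcName returns the name of ent, following abstract-origin or
// specification references when the entry itself carries no name.
func funcName(data *dwarf.Data, ent *dwarf.Entry) string {
	for depth := 0; ent != nil && depth < 8; depth++ {
		if name, ok := ent.Val(dwarf.AttrName).(string); ok {
			return name
		}
		off, ok := ent.Val(dwarf.AttrAbstractOrigin).(dwarf.Offset)
		if !ok {
			off, ok = ent.Val(dwarf.AttrSpecification).(dwarf.Offset)
		}
		if !ok {
			return ""
		}
		r := data.Reader()
		r.Seek(off)
		var err error
		if ent, err = r.Next(); err != nil {
			return ""
		}
	}
	return ""
}

// contains reports whether pc falls into any half-open [low, high) range.
func contains(ranges [][2]uint64, pc uint64) bool {
	for _, r := range ranges {
		if r[0] <= pc && pc < r[1] {
			return true
		}
	}
	return false
}

// inlineStack returns the names of the functions covering pc,
// outermost (non-inlined) function first.
func inlineStack(data *dwarf.Data, pc uint64) ([]string, error) {
	r := data.Reader()
	if _, err := r.SeekPC(pc); err != nil {
		return nil, err
	}
	var stack []string
	for {
		ent, err := r.Next()
		if err != nil {
			return nil, err
		}
		if ent == nil || ent.Tag == dwarf.TagCompileUnit {
			return stack, nil // end of section or start of the next unit
		}
		if ent.Tag != dwarf.TagSubprogram && ent.Tag != dwarf.TagInlinedSubroutine {
			continue // e.g. lexical blocks: keep descending into children
		}
		ranges, err := data.Ranges(ent)
		if err != nil {
			return nil, err
		}
		if contains(ranges, pc) {
			stack = append(stack, funcName(data, ent))
		} else {
			r.SkipChildren() // pc cannot be inside this subtree
		}
	}
}

func main() {
	if len(os.Args) < 3 {
		fmt.Println("usage: inlines <elf-file> <hex-pc>")
		return
	}
	f, err := elf.Open(os.Args[1])
	if err != nil {
		panic(err)
	}
	defer f.Close()
	data, err := f.DWARF()
	if err != nil {
		panic(err)
	}
	pc, err := strconv.ParseUint(os.Args[2], 16, 64)
	if err != nil {
		panic(err)
	}
	stack, err := inlineStack(data, pc)
	if err != nil {
		panic(err)
	}
	fmt.Println(stack)
}
```

The scan is linear within a compilation unit, so a real implementation would probably build an index of ranges once instead of walking the unit for every PC. Call-site locations of inlined frames (the call file and call line attributes) are left out here.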

tarasmadan commented 5 months ago

Mapping file:line to a function name looks doable given the source code itself. Any chance of getting StartLine:StartPos - EndLine:EndPos?

dvyukov commented 5 months ago

LineEntry has a Column field: https://pkg.go.dev/debug/dwarf@go1.22.2#LineEntry
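If function extents were available from the source code, the file:line:column lookup discussed above could look roughly like the sketch below. The `Pos` and `FuncRange` types and the `enclosing` helper are hypothetical, and how the extents would be extracted from C sources is left open.

```go
package main

import "fmt"

// Pos is a line:column position inside one source file.
type Pos struct{ Line, Col int }

// before reports whether p is strictly earlier in the file than q.
func (p Pos) before(q Pos) bool {
	return p.Line < q.Line || (p.Line == q.Line && p.Col < q.Col)
}

// FuncRange is a hypothetical function extent: Start is inclusive,
// End is exclusive.
type FuncRange struct {
	Name       string
	Start, End Pos
}

// enclosing returns the name of the function whose extent contains p,
// or "" if there is none.
func enclosing(funcs []FuncRange, p Pos) string {
	for _, f := range funcs {
		if !p.before(f.Start) && p.before(f.End) {
			return f.Name
		}
	}
	return ""
}

func main() {
	funcs := []FuncRange{
		{"foo", Pos{10, 1}, Pos{20, 2}},
		{"bar", Pos{22, 1}, Pos{30, 2}},
	}
	// Line and Column would come from a dwarf.LineEntry.
	fmt.Println(enclosing(funcs, Pos{25, 9})) // prints "bar"
}
```

Since C function definitions do not nest, the extents in one file are disjoint and a sorted slice with binary search would also work. Note that such a lookup names the function where the code was written, which is what is needed for inlined frames.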