google / pprof

pprof is a tool for visualization and analysis of profiling data
Apache License 2.0
7.95k stars 606 forks source link

pprof source and disasm views overcount recursive functions #707

Open kvakil opened 2 years ago

kvakil commented 2 years ago

What version of pprof are you using?

go tool pprof with go1.18.3.

Full go env output
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/keyhan/.cache/go-build"
GOENV="/home/keyhan/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/keyhan/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/keyhan/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/snap/go/9848"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/snap/go/9848/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.18.3"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/dev/null"
GOWORK=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build2938140119=/tmp/go-build -gno-record-gcc-switches"

What operating system and processor architecture are you using?

Linux, x86-64 Intel.

What did you do?

I opened the attached profile with go tool pprof -http=: fac.pb.gz, and navigated to the "Sources" page. Here is the output:

Screenshot 2022-06-26 at 13-03-34 e cpu

As you can see, it is claimed that fac took 6193% of the total time. It is also said that it took 150 seconds, which is wrong since the program took only a couple of seconds.

The issue also occurs with the command line version, although strangely the output is better.

(pprof) list fac
Total: 2.43s
ROUTINE ======================== main.fac in /home/keyhan/exgo/e.go
     2.43s      3.34s (flat, cum) 137.45% of Total
         .          .      2:
         .          .      3:import "os"
         .          .      4:import "log"
         .          .      5:import "runtime/pprof"
         .          .      6:
     850ms      850ms      7:func fac(n uint64) uint64 {
      60ms       60ms      8:    if n == 0 {
         .          .      9:        return 1
         .          .     10:    } else {
     1.52s      2.43s     11:        return n * fac(n - 1)
         .          .     12:    }
         .          .     13:}
         .          .     14:
         .          .     15:func main() {
         .          .     16:    f, err := os.Create("example.pb.gz")

What did you expect to see?

cum% should not be more than 100%.

What did you see instead?

cum% is claimed to be 6193%.

Root cause(?)

I believe the root cause is this line of code:

https://github.com/google/pprof/blob/c488b8fa1db3fa467bf30beb5a1d6f4f10bb1b87/internal/report/source.go#L104

This is wrong, since the cumulative time spent in the current function is not fns.Sum() if the function calls itself recursively. In those cases some nodes will be counted multiple times.

Source Code

package main

import "os"
import "log"
import "runtime/pprof"

func fac(n uint64) uint64 {
    if n == 0 {
        return 1
    } else {
        return n * fac(n - 1)
    }
}

func main() {
    f, err := os.Create("example.pb.gz")
    if err != nil {
        log.Fatal("could not create CPU profile: ", err)
    }
    defer f.Close() // error handling omitted for example
    if err := pprof.StartCPUProfile(f); err != nil {
        log.Fatal("could not start CPU profile: ", err)
    }
    defer pprof.StopCPUProfile()
    for i := 1; i <= 1e6; i++ {
        fac(1000)
    }
}
kvakil commented 2 years ago

Forgot to mention that the graph view looks correct.

Screenshot 2022-06-26 at 13-16-44 e cpu

The disasm view is similarly broken:

(pprof) disasm fac
Total: 2.43s
ROUTINE ======================== main.fac
     2.43s      4.86s (flat, cum) 200.00% of Total
     240ms      240ms     4a6c60: CMPQ 0x10(R14), SP                      ;main.fac e.go:7
         .          .     4a6c64: JBE 0x4a6cac                            ;e.go:7
     190ms      190ms     4a6c66: SUBQ $0x10, SP                          ;main.fac e.go:7
     170ms      170ms     4a6c6a: MOVQ BP, 0x8(SP)
     220ms      220ms     4a6c6f: LEAQ 0x8(SP), BP
      60ms       60ms     4a6c74: TESTQ AX, AX                            ;main.fac e.go:8
         .          .     4a6c77: JNE 0x4a6c88                            ;e.go:8
         .          .     4a6c79: MOVL $0x1, AX                           ;e.go:9
         .          .     4a6c7e: MOVQ 0x8(SP), BP
         .          .     4a6c83: ADDQ $0x10, SP
         .          .     4a6c87: RET
      30ms       30ms     4a6c88: MOVQ AX, 0x18(SP)                       ;main.fac e.go:7
     180ms      180ms     4a6c8d: LEAQ -0x1(AX), CX                       ;main.fac e.go:11
     190ms      190ms     4a6c91: MOVQ CX, AX
      50ms      2.48s     4a6c94: CALL main.fac(SB)
         .          .     4a6c99: MOVQ 0x18(SP), CX                       ;e.go:11
     360ms      360ms     4a6c9e: IMULQ CX, AX                            ;main.fac e.go:11
     710ms      710ms     4a6ca2: MOVQ 0x8(SP), BP
      30ms       30ms     4a6ca7: ADDQ $0x10, SP
         .          .     4a6cab: RET                                     ;e.go:11
         .          .     4a6cac: MOVQ AX, 0x8(SP)                        ;e.go:7
         .          .     4a6cb1: CALL runtime.morestack_noctxt.abi0(SB)
         .          .     4a6cb6: MOVQ 0x8(SP), AX
         .          .     4a6cbb: ?