mosn / holmes

self-aware Golang profile dumper

heap samples are not what I expect #116

Closed: dumbFeng closed this issue 1 year ago

dumbFeng commented 1 year ago

What version of Go are you using (go version)?

$  1.17.8

Does this issue reproduce with the latest release?

Haven't tried.

What operating system and processor architecture are you using (go env)?

go env Output
$ MacOS and amd64

What did you do?

I wrote a piece of code:

import (
    "strconv"
    "time"

    "github.com/gin-gonic/gin"
)

var (
    defaultSize = 1073741824 // 1 GiB
    a           []byte
)

// MakeNGbSlice allocates defaultSize bytes (scaled by the optional "size"
// query parameter), keeps them referenced for 10 seconds, then releases them.
func MakeNGbSlice(ctx *gin.Context) {
    sizeStr, ok := ctx.GetQuery("size")
    if ok {
        size, _ := strconv.Atoi(sizeStr)
        if size > 0 {
            defaultSize = size * defaultSize
        }
    }

    a = make([]byte, 0, defaultSize)
    for i := 0; i < defaultSize; i++ {
        a = append(a, byte('a'))
    }
    time.Sleep(time.Second * 10)
    a = nil // drop the reference so the slice can be collected by the next GC
}

Then I curled the API to trigger the code to run.
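
For reference, the request was along these lines (the host, port, and route below are placeholders; only the size query parameter comes from the handler above):

curl "http://localhost:8080/make-slice?size=1"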

What did you expect to see?

  1. The RSS my app used went up to 1GB.
  2. After dumping the heap samples, the heap profile data would be accurate and would help me find out why RSS went up.

What did you see instead?

  1. The RSS my app used went up to 1GB.
  2. The heap profile seemed weird:
    Type: inuse_space
    Time: Aug 3, 2022 at 2:32pm (CST)
    Entering interactive mode (type "help" for commands, "o" for options)
    (pprof) top
    Showing nodes accounting for 12052.56kB, 64.38% of 18721.44kB total
    Showing top 10 nodes out of 153
      flat  flat%   sum%        cum   cum%
    2562.81kB 13.69% 13.69%  3075.02kB 16.43%  runtime.allocm
    2048.81kB 10.94% 24.63%  2048.81kB 10.94%  runtime.malg
    2048.19kB 10.94% 35.57%  2048.19kB 10.94%  github.com/Shopify/sarama.(*TopicMetadata).decode
    1097.69kB  5.86% 41.44%  1097.69kB  5.86%  github.com/Shopify/sarama.(*client).updateMetadata
    1089.33kB  5.82% 47.26%  1089.33kB  5.82%  google.golang.org/grpc/internal/transport.newBufWriter
    // ...
  3. After about 2 minutes, I profiled again and the result was what I expected:
    Type: inuse_space
    Time: Aug 3, 2022 at 2:33pm (CST)
    Entering interactive mode (type "help" for commands, "o" for options)
    (pprof) top
    Showing nodes accounting for 1024.50MB, 98.25% of 1042.78MB total
    Dropped 154 nodes (cum <= 5.21MB)
    Showing top 10 nodes out of 18
      flat  flat%   sum%        cum   cum%
    1024MB 98.20% 98.20%     1024MB 98.20%  .../examples.Make1GbSlice
    0.50MB 0.048% 98.25%     6.63MB  0.64%  runtime.main
         0     0% 98.25%  1024.51MB 98.25%  github.com/gin-gonic/gin.(*Context).Next (inline)
         0     0% 98.25%  1024.51MB 98.25%  github.com/gin-gonic/gin.(*Engine).ServeHTTP
         0     0% 98.25%  1024.51MB 98.25%  github.com/gin-gonic/gin.(*Engine).handleHTTPRequest

    So it seems that the samples pprof dumps at that point in time are stale.

For more discussion, please see golang issue #54233

doujiang24 commented 1 year ago

As https://github.com/golang/go/issues/54223#issuecomment-1204487665 says, heap profiles are only updated when a GC runs. So this is a Go heap profile issue, not a holmes issue. For more details, please see: https://mosn.io/blog/posts/mosn-holmes-design/#%E6%A0%B9%E6%8D%AEgc%E5%91%A8%E6%9C%9F%E9%87%87%E9%9B%86 and https://uncledou.site/2022/go-pprof-heap/
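
As a minimal illustration in plain Go (this is not holmes' own API), you can trigger a GC yourself right before writing the heap profile, so the samples reflect the current live heap rather than the state at the previous GC cycle:

package main

import (
    "os"
    "runtime"
    "runtime/pprof"
)

// dumpHeapProfile forces a GC first so the heap profile samples are
// brought up to date, then writes the profile to path.
func dumpHeapProfile(path string) error {
    f, err := os.Create(path)
    if err != nil {
        return err
    }
    defer f.Close()

    runtime.GC() // heap profile samples are only updated when a GC completes
    return pprof.Lookup("heap").WriteTo(f, 0)
}

func main() {
    _ = dumpHeapProfile("heap.pprof")
}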

Hope this could help you.

dumbFeng commented 1 year ago

@doujiang24 Nice! Thank you.

dumbFeng commented 1 year ago

But I think these are two different situations. In the case I described, I think holmes should dump after the next GC run that follows the heap peak (see the sketch after the list below). In the sequence below, the heap profile dumped at step 3 is stale and can't help us find out what caused the problem:

  1. GC runs
  2. heap peak happens, e.g. making a big slice
  3. holmes dumps
  4. next GC runs
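
A rough sketch of that idea in plain Go (this is not holmes code; the helper name is made up): wait for the next GC cycle to complete before dumping, so the profile reflects the peak:

import (
    "runtime"
    "time"
)

// waitForNextGC blocks until at least one more GC cycle has completed,
// by polling runtime.MemStats.NumGC. A heap profile dumped after this
// returns will reflect the state at that GC.
func waitForNextGC() {
    var m runtime.MemStats
    runtime.ReadMemStats(&m)
    start := m.NumGC
    for {
        runtime.ReadMemStats(&m)
        if m.NumGC > start {
            return
        }
        time.Sleep(100 * time.Millisecond)
    }
}
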
cch123 commented 1 year ago

You should look at both the inuse and alloc space together to help locate the root cause.

A big heap object will cause an RSS spike. If it is not used any more afterwards, it will not appear in the inuse objects pprof list, but you can find it in the alloc objects list.

All objects allocated since the app started are counted in the alloc objects/space.
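
For example, assuming the heap profile has been saved to a file (the name heap.pprof below is just a placeholder), you can switch between the two sample types with go tool pprof:

# live objects as of the last completed GC
go tool pprof -sample_index=inuse_space heap.pprof

# everything allocated since the program started, including objects already freed
go tool pprof -sample_index=alloc_space heap.pprof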