traefik / yaegi

Yaegi is Another Elegant Go Interpreter
https://pkg.go.dev/github.com/traefik/yaegi
Apache License 2.0

significant memory leak when executing a lambda #1618

Closed poolpOrg closed 3 months ago

poolpOrg commented 3 months ago

The following program, sample.go, triggers an unexpected result:

package main

import (
    "fmt"
    "runtime"
    "sync"
)

func humanizeBytes(bytes uint64) string {
    const (
        _         = iota
        kB uint64 = 1 << (10 * iota)
        mB
        gB
        tB
        pB
    )

    switch {
    case bytes < kB:
        return fmt.Sprintf("%dB", bytes)
    case bytes < mB:
        return fmt.Sprintf("%.2fKB", float64(bytes)/float64(kB))
    case bytes < gB:
        return fmt.Sprintf("%.2fMB", float64(bytes)/float64(mB))
    case bytes < tB:
        return fmt.Sprintf("%.2fGB", float64(bytes)/float64(gB))
    case bytes < pB:
        return fmt.Sprintf("%.2fTB", float64(bytes)/float64(tB))
    default:
        return fmt.Sprintf("%dB", bytes)
    }
}

func main() {
    i := 0
    wg := sync.WaitGroup{}
    for {
        var m runtime.MemStats
        runtime.ReadMemStats(&m)
        fmt.Printf("#%d: alloc = %s, routines = %d, gc = %d\n", i, humanizeBytes(m.Alloc), runtime.NumGoroutine(), m.NumGC)

        wg.Add(1)
        go func() {
            wg.Done()
        }()
        wg.Wait()
        i = i + 1
    }
}

Expected result

[...]
#1295805: alloc = 1.79MB, routines = 1, gc = 21
#1295806: alloc = 1.79MB, routines = 1, gc = 21
#1295807: alloc = 1.79MB, routines = 1, gc = 21
[...]

Got

[...]
#1046468: alloc = 9.35GB, routines = 1, gc = 33
#1046469: alloc = 9.35GB, routines = 1, gc = 33
#1046470: alloc = 9.35GB, routines = 1, gc = 33
[...]

This causes Traefik to run out of memory (OOM) when it runs plugins that spawn goroutines frequently (e.g., from a ticker).

Yaegi Version

all versions >= 0.14.3

Additional Notes

This isn't a goroutine leak, as the goroutine count remains stable, and it isn't garbage collector contention either, as the number of GC cycles stays in the same range for a comparable number of iterations.
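Both observations can be checked on a bounded run instead of an endless loop. The following is a minimal sketch, not part of the original report: it forces a collection before and after a fixed number of spawn-and-wait iterations and prints the live heap size each time. The helper names `heapAfterGC` and `spawnAndWait` and the iteration count are assumptions chosen for illustration. When compiled with the Go toolchain the two heap figures should stay close; whether the sketch shows the same growth as sample.go under Yaegi has not been verified here.

```go
package main

import (
    "fmt"
    "runtime"
    "sync"
)

// heapAfterGC forces a garbage collection and returns the live heap size in bytes.
// (Hypothetical helper, not taken from the report.)
func heapAfterGC() uint64 {
    runtime.GC()
    var m runtime.MemStats
    runtime.ReadMemStats(&m)
    return m.HeapAlloc
}

// spawnAndWait starts n short-lived goroutines one at a time and waits for
// each of them, mirroring the loop body of sample.go.
func spawnAndWait(n int) {
    var wg sync.WaitGroup
    for i := 0; i < n; i++ {
        wg.Add(1)
        go func() {
            wg.Done()
        }()
        wg.Wait()
    }
}

func main() {
    const iterations = 100000 // arbitrary value chosen for this sketch

    before := heapAfterGC()
    spawnAndWait(iterations)
    after := heapAfterGC()

    fmt.Printf("heap before = %dB, heap after = %dB, routines = %d\n",
        before, after, runtime.NumGoroutine())
}
```

Measuring after a forced collection separates memory that is still reachable from garbage that simply has not been collected yet, which is the distinction the note above relies on.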

I seem to have found the issue. I am currently studying the code base and the impact of a fix I came up with; hopefully a developer can get in touch with me. On the left is the observed leak, on the right is the same execution with my fix:

[image: the observed leak (left) and the same execution with the fix (right)]

mvertes commented 3 months ago

Thanks for this report. I am curious about your fix, could you share it? Thanks.

mvertes commented 3 months ago

You just call fr := f.clone(!fork) instead of fr := f.clone(fork) at https://github.com/traefik/yaegi/blob/9aa161f2da6ef119d62d939ba113af8cad5c54b2/interp/run.go#L1899, right?
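To make the difference between the two calls concrete, here is a toy sketch. It is not Yaegi's code: `toyFrame` and its `clone` method are hypothetical, written under the assumption that the boolean argument decides whether the frame's data is copied or shared, which is how the thread describes it. It only illustrates copy versus share semantics and does not reproduce the leak.

```go
package main

import "fmt"

// toyFrame is a hypothetical stand-in for an interpreter frame.
// It is NOT Yaegi's frame type.
type toyFrame struct {
    data []int
}

// clone returns a new frame. When fork is true the data slice is copied, so
// the clone owns its own storage; otherwise the clone shares the parent's
// storage. (Assumed behavior for illustration only.)
func (f *toyFrame) clone(fork bool) *toyFrame {
    nf := &toyFrame{}
    if fork {
        nf.data = make([]int, len(f.data))
        copy(nf.data, f.data)
    } else {
        nf.data = f.data
    }
    return nf
}

func main() {
    parent := &toyFrame{data: []int{1, 2, 3}}

    forked := parent.clone(true)  // independent copy of the data
    shared := parent.clone(false) // same backing array as the parent

    parent.data[0] = 42

    fmt.Println(forked.data[0], shared.data[0]) // prints "1 42"
}
```

In this toy model, flipping the argument removes one allocation per clone but also makes the clone observe later writes to the parent, which is why such a change may not be safe even if it makes the memory growth disappear.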

poolpOrg commented 3 months ago

That's the most trivial way to get rid of the leak and to confirm that it is the data copied inside the frame clone that is leaking. I also tried a couple of other approaches, none of which I'm convinced are correct.

Is the clone even needed?

I'm sorry, I removed the details because I realized that someone who understands what the diff does could figure out how to crash Traefik instances running middlewares that contain a specific yet very common code pattern.