tidwall / sjson

Set JSON values very quickly in Go
MIT License
2.4k stars 165 forks source link

possible memory leak using sjson.Set #51

Open saartamir opened 2 years ago

saartamir commented 2 years ago

Hi @tidwall I'm using this library to enrich event without unmarshalling into Go struct, and there's seems to be a memory problem in sjson.Set(). This is how I use it:

func EnrichEvent(event string, ctx *fasthttp.RequestCtx) string {
    startTime := time.Now().UnixNano() / 1000000

    log := zap.L()

    ua := string(ctx.UserAgent())
    ip := ctx.RemoteIP().String()
    entryTs := strconv.Itoa(int(time.Now().UnixNano() / int64(time.Millisecond)))

    var err error

    // add new fields to events
    event, err = sjson.Set(event, "ua", ua)
    if err != nil {
        log.Error("could not add ua to event", zap.String("event", event), zap.Error(err))
    }

    event, err = sjson.Set(event, "client_ip", ip)
    if err != nil {
        log.Error("could not add client_ip to event", zap.String("event", event), zap.Error(err))
    }

    event, err = sjson.Set(event, "entry_ts", entryTs)
    if err != nil {
        log.Error("could not add entry_ts to event", zap.String("event", event), zap.Error(err))
    }

    endTime := time.Now().UnixNano() / 1000000
    promy.RecordMetric("EnrichTimeHistogram", 0, float64(endTime-startTime), nil)

    return event
}

and when I'm trying to profile the allocs, it looks like sjson keeps growing and growing: allocs

am I doing anything wrong?

tidwall commented 2 years ago

I wrote a program that tried to reproduce the issue, but I wasn't able to see the memory issues that you were seeing.

I took your example function and hardcoded the values that came from the fasthttp and `zap' libraries.

Here's the complete program:

package main

import (
    "fmt"
    "log"
    "runtime"
    "strconv"
    "time"

    "github.com/tidwall/sjson"
)

func main() {
    evs := make(chan string, 100)
    go func() {
        var totalBytes int64
        var totalEvents int64
        ticker := time.NewTicker(time.Second)
        for {
            select {
            case ev := <-evs:
                totalEvents++
                totalBytes += int64(len(ev))
            case <-ticker.C:
                var stats runtime.MemStats
                runtime.ReadMemStats(&stats)
                fmt.Printf(
                    "Processes %d Messages (%0.2f GB), Heap Size: %0.2f MB\n",
                    totalEvents,
                    float64(totalBytes)/1024/1024/1024,
                    float64(stats.HeapAlloc)/1024/1024,
                )
            }
        }
    }()

    for i := 0; i < runtime.NumCPU(); i++ {
        go func() {
            for {
                evs <- EnrichEvent("{}")
            }
        }()
    }
    select {}
}

// func EnrichEvent(event string, ctx *fasthttp.RequestCtx) string {
func EnrichEvent(event string) string {
    // startTime := time.Now().UnixNano() / 1000000

    // log := zap.L()

    // ua := string(ctx.UserAgent())
    ua := "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36"
    // ip := ctx.RemoteIP().String()
    ip := "127.10.105.81:98172"
    entryTs := strconv.Itoa(int(time.Now().UnixNano() / int64(time.Millisecond)))

    var err error

    // add new fields to events
    event, err = sjson.Set(event, "ua", ua)
    if err != nil {
        // log.Error("could not add ua to event", zap.String("event", event), zap.Error(err))
        log.Fatal("could not add ua to event", event, err)
    }

    event, err = sjson.Set(event, "client_ip", ip)
    if err != nil {
        // log.Error("could not add client_ip to event", zap.String("event", event), zap.Error(err))
        log.Fatal("could not add client_ip to event", event, err)
    }

    event, err = sjson.Set(event, "entry_ts", entryTs)
    if err != nil {
        // log.Error("could not add entry_ts to event", zap.String("event", event), zap.Error(err))
        log.Fatal("could not add entry_ts to event", event, err)
    }

    // endTime := time.Now().UnixNano() / 1000000
    // promy.RecordMetric("EnrichTimeHistogram", 0, float64(endTime-startTime), nil)

    return event
}

The program processes as many events as possible over a number of goroutines and prints out stats every second.

Here's what I'm seeing:

Processes 1139184 Messages (0.20 GB), Heap Size: 3.57 MB
Processes 2255683 Messages (0.39 GB), Heap Size: 2.86 MB
Processes 3358867 Messages (0.58 GB), Heap Size: 3.63 MB
Processes 4471472 Messages (0.77 GB), Heap Size: 3.48 MB
Processes 5598419 Messages (0.96 GB), Heap Size: 3.21 MB
Processes 6704932 Messages (1.16 GB), Heap Size: 3.65 MB
Processes 7796037 Messages (1.34 GB), Heap Size: 3.30 MB
Processes 8903962 Messages (1.53 GB), Heap Size: 3.61 MB
Processes 9993757 Messages (1.72 GB), Heap Size: 2.71 MB
Processes 11091102 Messages (1.91 GB), Heap Size: 2.02 MB
Processes 12201903 Messages (2.10 GB), Heap Size: 2.97 MB
Processes 13317121 Messages (2.29 GB), Heap Size: 2.09 MB
Processes 14428639 Messages (2.49 GB), Heap Size: 3.58 MB
Processes 15551060 Messages (2.68 GB), Heap Size: 3.53 MB
Processes 16684651 Messages (2.87 GB), Heap Size: 2.57 MB
Processes 17801735 Messages (3.07 GB), Heap Size: 3.52 MB
Processes 18914031 Messages (3.26 GB), Heap Size: 2.24 MB
Processes 20032989 Messages (3.45 GB), Heap Size: 2.07 MB
Processes 21146109 Messages (3.64 GB), Heap Size: 3.67 MB
Processes 22257386 Messages (3.83 GB), Heap Size: 3.54 MB
Processes 23346580 Messages (4.02 GB), Heap Size: 3.36 MB
Processes 24456117 Messages (4.21 GB), Heap Size: 3.04 MB
Processes 25562760 Messages (4.40 GB), Heap Size: 1.73 MB
Processes 26671997 Messages (4.60 GB), Heap Size: 3.61 MB
Processes 27789998 Messages (4.79 GB), Heap Size: 2.33 MB
Processes 28945007 Messages (4.99 GB), Heap Size: 3.31 MB
Processes 30069362 Messages (5.18 GB), Heap Size: 3.49 MB

After about 30 seconds, 30 million sjson.Set calls were processed with amounted to 5 GB of data. Yet the Go heap never goes above 4 MB.