makew0rld / dither

A fast, correct image dithering library in Go.
Mozilla Public License 2.0

High memory usage on bigger images #13

Closed tfuxu closed 10 months ago

tfuxu commented 11 months ago

I've noticed recently that dither uses ~500MB of RAM when dithering an example 3414x1920 image, and the memory doesn't appear to be garbage-collected after dithering finishes.

Is this expected behavior? If so, could it somehow be reduced?

Minimal reproducer:

package main

import (
    "fmt"
    "image"
    "image/color"
    "image/png"
    "os"

    "github.com/makeworld-the-better-one/dither/v2"
)

func Dither() {
    reader, err := os.Open("/home/tfuxc/Pictures/wallhaven-m9rqlk.png")
    if err != nil {
        panic(err)
    }
    defer reader.Close()

    img, _, err := image.Decode(reader)
    if err != nil {
        panic(fmt.Sprintln("decode err:", err))
    }

    palette := []color.Color{
        color.Black,
        color.White,
        color.RGBA{255, 0, 0, 255}, // Red
        color.RGBA{0, 255, 0, 255}, // Green
        color.RGBA{0, 0, 255, 255}, // Blue
    }

    d := dither.NewDitherer(palette)
    d.Matrix = dither.FloydSteinberg

    img = d.Dither(img)

    writer, err := os.Create("dithered.png")
    if err != nil {
        panic(err)
    }

    if err := png.Encode(writer, img); err != nil {
        writer.Close()
        panic(err)
    }

    if err := writer.Close(); err != nil {
        panic(err)
    }
}

func main() {
    for i := 0; i < 4; i++ {
        Dither()
    }

    var input string
    fmt.Println("Done! Press a key and enter to close.")
    fmt.Scan(&input)
}

Note

The minimal reproducer runs the dithering operation four times to show that the allocated memory keeps climbing in subsequent iterations.
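One way to tell whether that memory is truly retained, rather than garbage the collector simply hasn't run on yet, is to force a collection and read the live heap after each iteration. Below is a minimal sketch (an editorial illustration, not part of the original reproducer) that replaces the reproducer's main and reuses its Dither function:

package main

import (
    "fmt"
    "runtime"
)

// printLiveHeap forces a full collection and prints the live heap size, so
// growth across iterations reflects retained objects rather than lazy GC.
func printLiveHeap(label string) {
    runtime.GC()
    var m runtime.MemStats
    runtime.ReadMemStats(&m)
    fmt.Printf("%s: live heap = %d MiB\n", label, m.HeapAlloc/1024/1024)
}

func main() {
    for i := 0; i < 4; i++ {
        Dither() // the reproducer function above
        printLiveHeap(fmt.Sprintf("after iteration %d", i+1))
    }
}

Note that even when the live heap is small, the resident set shown by tools like htop can stay high for a while, because the Go runtime returns freed memory to the OS lazily; calling debug.FreeOSMemory from runtime/debug forces that return when checking with an external tool.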

makew0rld commented 11 months ago

Thanks for reporting this. My understanding of the Go garbage collector is that everything allocated by the Dither function will eventually be freed; maybe it just hadn't kicked in yet for your test? It's not necessarily the job of my code to "run through a garbage collection after dithering", as that should happen automatically. I will look into whether my code is using too much memory, though; 500 MB seems high. Is that for a single iteration of Dither?

To aid in reproducing the issue, could you explain how you were measuring the memory usage?

tfuxu commented 11 months ago

It looks to me like a memory leak, as GC should free most of the memory allocated.

I will try and see if my code is using too much memory however, 500 MB seems high. Is that for a single iteration of Dither?

Yes, this is from only one iteration of Dither. It seems that the GC does some work in subsequent iterations, as with 5 iterations I got at most ~700MB allocated.

This is the GC output of the reproducer with one iteration cycle set (in each line, the A->B->C MB triple is the heap size at GC start, the heap size at GC end, and the live heap afterwards):

$ GODEBUG=gctrace=1 ./main 
gc 1 @0.000s 3%: 0.008+0.17+0.022 ms clock, 0.13+0.040/0.10/0+0.35 ms cpu, 25->25->25 MB, 25 MB goal, 0 MB stacks, 0 MB globals, 16 P
gc 2 @0.116s 0%: 0.014+1.6+0.012 ms clock, 0.22+0.036/5.1/3.2+0.19 ms cpu, 49->74->74 MB, 50 MB goal, 0 MB stacks, 0 MB globals, 16 P
gc 3 @0.119s 1%: 0.014+8.2+0.010 ms clock, 0.22+0.10/24/10+0.17 ms cpu, 126->190->190 MB, 148 MB goal, 0 MB stacks, 0 MB globals, 16 P
gc 4 @1.500s 0%: 0.013+31+0.012 ms clock, 0.21+0.045/123/0+0.20 ms cpu, 324->325->248 MB, 380 MB goal, 0 MB stacks, 0 MB globals, 16 P
gc 5 @3.337s 0%: 0.015+34+0.024 ms clock, 0.24+0.034/134/149+0.38 ms cpu, 424->425->324 MB, 497 MB goal, 0 MB stacks, 0 MB globals, 16 P
tfuxu commented 11 months ago

I've also experimented with pprof, a tool for profiling Go code. From what I can read here, your code probably spends too much time allocating memory, especially in color linearization.

Here's the output of the top -cum command from the pprof tool:

(pprof) top -cum
Showing nodes accounting for 2.60s, 12.49% of 20.81s total
Dropped 153 nodes (cum <= 0.10s)
Showing top 10 nodes out of 81
      flat  flat%   sum%        cum   cum%
         0     0%     0%     18.70s 89.86%  github.com/makeworld-the-better-one/dither/v2/examples/bigger_images.Dither
         0     0%     0%     18.70s 89.86%  github.com/makeworld-the-better-one/dither/v2/examples/bigger_images.TestDither
         0     0%     0%     18.70s 89.86%  testing.tRunner
     1.56s  7.50%  7.50%     13.53s 65.02%  github.com/makeworld-the-better-one/dither/v2.(*Ditherer).Dither
     0.52s  2.50% 10.00%      7.13s 34.26%  github.com/makeworld-the-better-one/dither/v2.(*Ditherer).Dither.func3
     0.04s  0.19% 10.19%      5.22s 25.08%  github.com/makeworld-the-better-one/dither/v2.unpremultAndLinearize
     0.03s  0.14% 10.33%      5.18s 24.89%  github.com/makeworld-the-better-one/dither/v2.toLinearRGB
     0.32s  1.54% 11.87%      5.15s 24.75%  github.com/makeworld-the-better-one/dither/v2.linearize255to65535
     0.13s  0.62% 12.49%      4.83s 23.21%  github.com/makeworld-the-better-one/dither/v2.linearize1 (inline)
         0     0% 12.49%      4.72s 22.68%  image/png.(*Encoder).Encode

This shows the cumulative CPU time spent in the functions called by the parent Dither function (along with some others used in my reproducer).
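For reference, a minimal way to capture such a profile from the reproducer itself (a hypothetical setup; the output above appears to come from a go test run, which would normally pass -cpuprofile to go test instead) is to wrap the call with runtime/pprof:

package main

import (
    "os"
    "runtime/pprof"
)

func main() {
    cpu, err := os.Create("cpu.out")
    if err != nil {
        panic(err)
    }
    defer cpu.Close()

    // Sample CPU usage for everything executed between Start and Stop.
    if err := pprof.StartCPUProfile(cpu); err != nil {
        panic(err)
    }
    Dither() // the reproducer function above
    pprof.StopCPUProfile()

    // Also dump a heap profile to inspect where allocations come from.
    heap, err := os.Create("heap.out")
    if err != nil {
        panic(err)
    }
    defer heap.Close()
    if err := pprof.WriteHeapProfile(heap); err != nil {
        panic(err)
    }
}

The resulting files can then be inspected with go tool pprof cpu.out (or heap.out), using top -cum at the pprof prompt.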

tfuxu commented 11 months ago

To aid in reproducing the issue, could you explain how you were measuring the memory usage?

I simply used htop; the part of the reproducer that waits for user input was there specifically so I could see that the memory isn't freed at the end of the function scope. Now I use the gctrace=1 flag in the GODEBUG environment variable, as shown in the second comment above.

makew0rld commented 10 months ago

Thanks for all these logs. I'll start investigating the memory profiling.

makew0rld commented 10 months ago

After some code changes (see eedcc4d), this is the GC output for your code, with the original 4 loop cycles set.

➤ GODEBUG=gctrace=1 ./debug
gc 1 @0.001s 2%: 0.019+0.24+0.007 ms clock, 0.15+0.15/0.16/0.036+0.060 ms cpu, 25->25->25 MB, 25 MB goal, 0 MB stacks, 0 MB globals, 8 P
gc 2 @0.128s 0%: 0.029+0.10+0.003 ms clock, 0.23+0.059/0.075/0+0.029 ms cpu, 49->50->50 MB, 50 MB goal, 0 MB stacks, 0 MB globals, 8 P
gc 3 @1.249s 0%: 0.017+0.35+0.006 ms clock, 0.14+0.032/0.13/0+0.053 ms cpu, 97->97->65 MB, 100 MB goal, 0 MB stacks, 0 MB globals, 8 P
gc 4 @1.673s 0%: 0.020+0.41+0.006 ms clock, 0.16+0.031/0.16/0+0.053 ms cpu, 127->127->65 MB, 130 MB goal, 0 MB stacks, 0 MB globals, 8 P
gc 5 @3.170s 0%: 0.020+0.32+0.006 ms clock, 0.16+0.032/0.089/0+0.054 ms cpu, 127->130->34 MB, 130 MB goal, 0 MB stacks, 0 MB globals, 8 P
gc 6 @3.172s 0%: 0.010+0.35+0.007 ms clock, 0.083+0.023/0.10/0+0.057 ms cpu, 65->65->65 MB, 68 MB goal, 0 MB stacks, 0 MB globals, 8 P
gc 7 @4.460s 0%: 0.015+0.34+0.006 ms clock, 0.12+0.050/0.11/0+0.055 ms cpu, 126->126->65 MB, 130 MB goal, 0 MB stacks, 0 MB globals, 8 P
gc 8 @4.880s 0%: 0.018+0.24+0.008 ms clock, 0.14+0.041/0.055/0.10+0.070 ms cpu, 127->127->65 MB, 130 MB goal, 0 MB stacks, 0 MB globals, 8 P
gc 9 @6.191s 0%: 0.015+0.33+0.004 ms clock, 0.12+0.035/0.11/0+0.033 ms cpu, 127->130->62 MB, 130 MB goal, 0 MB stacks, 0 MB globals, 8 P
gc 10 @7.493s 0%: 0.023+0.35+0.006 ms clock, 0.19+0.032/0.12/0+0.050 ms cpu, 122->122->65 MB, 125 MB goal, 0 MB stacks, 0 MB globals, 8 P
gc 11 @7.932s 0%: 0.021+0.16+0.006 ms clock, 0.16+0.034/0.13/0+0.051 ms cpu, 126->126->65 MB, 130 MB goal, 0 MB stacks, 0 MB globals, 8 P
gc 12 @9.380s 0%: 0.016+0.29+0.004 ms clock, 0.13+0.033/0.10/0+0.035 ms cpu, 126->129->57 MB, 130 MB goal, 0 MB stacks, 0 MB globals, 8 P
gc 13 @10.693s 0%: 0.020+0.29+0.006 ms clock, 0.16+0.037/0.16/0+0.055 ms cpu, 111->111->65 MB, 114 MB goal, 0 MB stacks, 0 MB globals, 8 P
gc 14 @11.155s 0%: 0.023+0.25+0.007 ms clock, 0.19+0.041/0.16/0+0.056 ms cpu, 126->126->65 MB, 130 MB goal, 0 MB stacks, 0 MB globals, 8 P

This shows roughly a 70% reduction in peak memory use, and to me it indicates there is no memory leak across runs. If you think this is reasonable, I'll close this issue and make a new release; otherwise, let me know where you think further issues are.
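For anyone who wants to compare the two versions themselves, a small benchmark along these lines (a hypothetical sketch; testdata/big.png is a placeholder path, not a file from this thread) reports allocations per Dither call:

package dither_test

import (
    "image"
    "image/color"
    _ "image/png" // register the PNG decoder for image.Decode
    "os"
    "testing"

    "github.com/makeworld-the-better-one/dither/v2"
)

func BenchmarkDither(b *testing.B) {
    f, err := os.Open("testdata/big.png") // placeholder: any large test image
    if err != nil {
        b.Fatal(err)
    }
    defer f.Close()

    img, _, err := image.Decode(f)
    if err != nil {
        b.Fatal(err)
    }

    d := dither.NewDitherer([]color.Color{color.Black, color.White})
    d.Matrix = dither.FloydSteinberg

    b.ReportAllocs()
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        _ = d.Dither(img)
    }
}

Running it with go test -bench Dither -benchmem before and after the change shows the difference in bytes allocated per operation.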

tfuxu commented 10 months ago

Yep, it looks reasonable to me. Thanks for fixing this! I'll be testing these changes with goroutines and some multi-threaded code in my Python bindings, so if I find any more issues, I'll let you know.

makew0rld commented 10 months ago

The latest release, v2.4.0, incorporates this change.