Closed by tfuxu 10 months ago
Thanks for reporting this. My understanding of the Go garbage collector is that everything allocated by the Dither function will eventually be freed; maybe it just hadn't kicked in yet for your test? It's not the job of my code to "run through a garbage collection after dithering", that should happen automatically. I will check whether my code is using too much memory, though; 500 MB seems high. Is that for a single iteration of Dither?
To aid in reproducing the issue, could you explain how you were measuring the memory usage?
It looks to me like a memory leak, as GC should free most of the memory allocated.
> I will try and see if my code is using too much memory however, 500 MB seems high. Is that for a single iteration of Dither?
Yes, this is from only one iteration of Dither. It seems that GC does some work in the subsequent iterations, as with 5 iterations I saw at most ~700 MB allocated.
This is the GC output of the reproducer with one iteration cycle set (in each line, A->B->C MB is the heap size at GC start, at GC end, and the live heap):
$ GODEBUG=gctrace=1 ./main
gc 1 @0.000s 3%: 0.008+0.17+0.022 ms clock, 0.13+0.040/0.10/0+0.35 ms cpu, 25->25->25 MB, 25 MB goal, 0 MB stacks, 0 MB globals, 16 P
gc 2 @0.116s 0%: 0.014+1.6+0.012 ms clock, 0.22+0.036/5.1/3.2+0.19 ms cpu, 49->74->74 MB, 50 MB goal, 0 MB stacks, 0 MB globals, 16 P
gc 3 @0.119s 1%: 0.014+8.2+0.010 ms clock, 0.22+0.10/24/10+0.17 ms cpu, 126->190->190 MB, 148 MB goal, 0 MB stacks, 0 MB globals, 16 P
gc 4 @1.500s 0%: 0.013+31+0.012 ms clock, 0.21+0.045/123/0+0.20 ms cpu, 324->325->248 MB, 380 MB goal, 0 MB stacks, 0 MB globals, 16 P
gc 5 @3.337s 0%: 0.015+34+0.024 ms clock, 0.24+0.034/134/149+0.38 ms cpu, 424->425->324 MB, 497 MB goal, 0 MB stacks, 0 MB globals, 16 P
I've also experimented with pprof, which is a tool for profiling Go code. From what I read here, your code probably spends too much time allocating memory, especially in color linearization. Here's the output from the top -cum command of the pprof suite:
(pprof) top -cum
Showing nodes accounting for 2.60s, 12.49% of 20.81s total
Dropped 153 nodes (cum <= 0.10s)
Showing top 10 nodes out of 81
flat flat% sum% cum cum%
0 0% 0% 18.70s 89.86% github.com/makeworld-the-better-one/dither/v2/examples/bigger_images.Dither
0 0% 0% 18.70s 89.86% github.com/makeworld-the-better-one/dither/v2/examples/bigger_images.TestDither
0 0% 0% 18.70s 89.86% testing.tRunner
1.56s 7.50% 7.50% 13.53s 65.02% github.com/makeworld-the-better-one/dither/v2.(*Ditherer).Dither
0.52s 2.50% 10.00% 7.13s 34.26% github.com/makeworld-the-better-one/dither/v2.(*Ditherer).Dither.func3
0.04s 0.19% 10.19% 5.22s 25.08% github.com/makeworld-the-better-one/dither/v2.unpremultAndLinearize
0.03s 0.14% 10.33% 5.18s 24.89% github.com/makeworld-the-better-one/dither/v2.toLinearRGB
0.32s 1.54% 11.87% 5.15s 24.75% github.com/makeworld-the-better-one/dither/v2.linearize255to65535
0.13s 0.62% 12.49% 4.83s 23.21% github.com/makeworld-the-better-one/dither/v2.linearize1 (inline)
0 0% 12.49% 4.72s 22.68% image/png.(*Encoder).Encode
This shows the cumulative CPU time taken by the child functions executed by the Dither parent function (and some others used in my reproducer).
> To aid in reproducing the issue, could you explain how you were measuring the memory usage?
I've simply used htop; the part in the reproducer where it waits for user input was there specifically so I could catch that the memory is not freed at the end of the function scope. Now I use the gctrace=1 flag in the GODEBUG env, as shown in the second comment above.
Thanks for all these logs. I'll start investigating with memory profiling.
After some code changes (see eedcc4d), this is the GC output for your code, with the original 4 loop cycles set.
➤ GODEBUG=gctrace=1 ./debug
gc 1 @0.001s 2%: 0.019+0.24+0.007 ms clock, 0.15+0.15/0.16/0.036+0.060 ms cpu, 25->25->25 MB, 25 MB goal, 0 MB stacks, 0 MB globals, 8 P
gc 2 @0.128s 0%: 0.029+0.10+0.003 ms clock, 0.23+0.059/0.075/0+0.029 ms cpu, 49->50->50 MB, 50 MB goal, 0 MB stacks, 0 MB globals, 8 P
gc 3 @1.249s 0%: 0.017+0.35+0.006 ms clock, 0.14+0.032/0.13/0+0.053 ms cpu, 97->97->65 MB, 100 MB goal, 0 MB stacks, 0 MB globals, 8 P
gc 4 @1.673s 0%: 0.020+0.41+0.006 ms clock, 0.16+0.031/0.16/0+0.053 ms cpu, 127->127->65 MB, 130 MB goal, 0 MB stacks, 0 MB globals, 8 P
gc 5 @3.170s 0%: 0.020+0.32+0.006 ms clock, 0.16+0.032/0.089/0+0.054 ms cpu, 127->130->34 MB, 130 MB goal, 0 MB stacks, 0 MB globals, 8 P
gc 6 @3.172s 0%: 0.010+0.35+0.007 ms clock, 0.083+0.023/0.10/0+0.057 ms cpu, 65->65->65 MB, 68 MB goal, 0 MB stacks, 0 MB globals, 8 P
gc 7 @4.460s 0%: 0.015+0.34+0.006 ms clock, 0.12+0.050/0.11/0+0.055 ms cpu, 126->126->65 MB, 130 MB goal, 0 MB stacks, 0 MB globals, 8 P
gc 8 @4.880s 0%: 0.018+0.24+0.008 ms clock, 0.14+0.041/0.055/0.10+0.070 ms cpu, 127->127->65 MB, 130 MB goal, 0 MB stacks, 0 MB globals, 8 P
gc 9 @6.191s 0%: 0.015+0.33+0.004 ms clock, 0.12+0.035/0.11/0+0.033 ms cpu, 127->130->62 MB, 130 MB goal, 0 MB stacks, 0 MB globals, 8 P
gc 10 @7.493s 0%: 0.023+0.35+0.006 ms clock, 0.19+0.032/0.12/0+0.050 ms cpu, 122->122->65 MB, 125 MB goal, 0 MB stacks, 0 MB globals, 8 P
gc 11 @7.932s 0%: 0.021+0.16+0.006 ms clock, 0.16+0.034/0.13/0+0.051 ms cpu, 126->126->65 MB, 130 MB goal, 0 MB stacks, 0 MB globals, 8 P
gc 12 @9.380s 0%: 0.016+0.29+0.004 ms clock, 0.13+0.033/0.10/0+0.035 ms cpu, 126->129->57 MB, 130 MB goal, 0 MB stacks, 0 MB globals, 8 P
gc 13 @10.693s 0%: 0.020+0.29+0.006 ms clock, 0.16+0.037/0.16/0+0.055 ms cpu, 111->111->65 MB, 114 MB goal, 0 MB stacks, 0 MB globals, 8 P
gc 14 @11.155s 0%: 0.023+0.25+0.007 ms clock, 0.19+0.041/0.16/0+0.056 ms cpu, 126->126->65 MB, 130 MB goal, 0 MB stacks, 0 MB globals, 8 P
This shows a ~70% memory use reduction, and to me indicates no memory leaks across runs. If you think this is reasonable, I'll close this issue and make a new release. Otherwise let me know where you think further issues are.
Yep, it looks reasonable to me. Thanks for fixing this! I'll be testing these changes with goroutines and some multi-threaded code in my Python bindings, so if I find more issues, I'll let you know.
The latest release, v2.4.0, incorporates this change.
I've noticed recently that dither uses ~500 MB of RAM when dithering an example 3414x1920 image, and it doesn't appear to run through a garbage collection after dithering. Is this expected behavior? If yes, could it be somehow reduced?
Minimal reproducer:
Note: the minimal reproducer performs the dithering operation four times to show that the allocated memory is significantly higher in subsequent iterations.