hummeltech opened 5 months ago
Here's a quick and dirty visualization of the respective memory footprints. All tests are done consecutively, but plotted on the same (relative) time axis.
Conclusions so far:

- `jemalloc` seems the only one that returns substantial memory to the system
- `tcmalloc` seems to not ever return anything; maybe additional configuration is required?
- `glib` does have a dip here and there - are those just "lucky" allocations that reduced the top-of-heap?

Timeline:

- 0s: start `renderd` - it loads the mapnik style file within ~10-15 seconds
- 30s: issue render request for all tiles at zoom level 6
- 45s-100s: database is churning like crazy, `renderd` threads are mostly idle
- 100-220s: `renderd` threads are using up CPU and memory
- 220s-250s: `render_list` exits, all rendering is done
- 300s: benchmark stops
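For anyone wanting to reproduce this comparison: the allocators can be swapped in without rebuilding via `LD_PRELOAD`, and both `jemalloc` and `tcmalloc` expose knobs that affect how eagerly freed pages are returned to the OS. The library paths below are illustrative (they vary by distribution), and the decay values are just a starting point, not a recommendation:

```shell
# jemalloc: background thread plus shorter decay times makes it return
# dirty pages to the OS sooner.
LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2 \
  MALLOC_CONF="background_thread:true,dirty_decay_ms:5000,muzzy_decay_ms:5000" \
  renderd -f

# tcmalloc: by default it holds on to freed pages; a higher release rate
# makes it hand memory back to the OS more aggressively.
LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libtcmalloc.so.4 \
  TCMALLOC_RELEASE_RATE=10 \
  renderd -f
```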
@rolandbosa IIRC, some of `mod_tile`'s dependencies are likely the cause of the high memory consumption displayed in this chart. I will have to look at it more closely again, but thank you very much for compiling this!
@rolandbosa, thanks for the work, I made a few changes to your branch here (as well as adding `mimalloc` into the mix), I've tried out the script and am seeing this:
Alright @rolandbosa, I have merged your `MALLOC_LIB` (#454) and `strtok_r` (#455) commits into `mod_tile` and rebased the above-mentioned branch.
Thanks for the help!
@hummeltech Oh, that merge was a surprise... I consider my branch still a work in progress, and it had quite some junk in it. But now I see you just took the "good" parts and merged those. Thank you! I think the `jemalloc` choice is a step in the right direction!
I don't know if I've mentioned this already, but I managed ("hacked" - definitely don't merge that ;-) ) to get `renderd` to quit and dump a leak profile. But I'm having problems analyzing the profile in depth and won't have time for a while to work on that. It looks, though, like there are no apparent leaks during the `renderd::mapnik` calls; it's just a lot of caching that sticks around.
So I started thinking of implementing some configurable recycling of the render threads, which might help cut down on the overall memory usage; of course, at the cost of needing to reload the map style, fonts, etc. Maybe with some tweakable parameters to spread the cost of spinning up a render thread over N tiles, or M seconds, or both. I hope to get back to this when time allows...
Haha, yes that's right, just the "best" parts of your `use_jemalloc` branch, thanks again!
I did try out your `exit please` command and I do think it is a good idea, but I was getting a segmentation fault here. Also, I think it might go better under a `renderd` option if and when you get back to it. Thanks for your contributions!
Hm, I can see this possibility for the fault: I tried to find a way to distinguish whether a statistics thread was created or not by initializing the value of `stats_thread` with a "null value". Apparently this doesn't exist portably; instead one should use the value of `pthread_self()` and do a `pthread_equal()` comparison. However, according to the man page (and also as qbert220 points out), the value is undefined upon return from a failed `pthread_create()`, so a proper implementation should also re-initialize the variable upon creation failure (in addition to the initialization before checking whether a statistics thread is requested at all). Why the creation would fail is a different question, though...
In either case, the statistics thread is also sitting in a hard `sleep()`, and that should be rewritten to use a timed condition sleep: sleep for the requested time OR until an exit has been requested. But since this was just testing code to figure out any post-mortem leaks, I chose the "can't be bothered with this detail now" way.
These days, `renderd` does not terminate cleanly (the signal handlers didn't work for me; they seem to be a mechanism for the render threads to indicate a catastrophe, e.g. style/font loading failed). Wouldn't the proper way be to respond to a signal? I'd imagine that's what e.g. `systemd` would send to the process during a shutdown/reboot, and nowadays it just gets a `kill` signal after some time of not responding...? How important would that be as a feature?
> Hm, I can see this possibility for the fault: I tried to find a way to distinguish whether a statistics thread was created or not by initializing the value of `stats_thread` with a "null value". Apparently this doesn't exist portably; instead one should use the value of `pthread_self()` and do a `pthread_equal()` comparison. However, according to the man page (and also as qbert220 points out), the value is undefined upon return from a failed `pthread_create()`, so a proper implementation should also re-initialize the variable upon creation failure (in addition to the initialization before checking whether a statistics thread is requested at all). Why the creation would fail is a different question, though...
Thanks for the explanation, that sounds like it might be a good addition. The `stats` thread is optional, so that is at least one reason it would not be created; any other errors should lead to an exit.
> In either case, the statistics thread is also sitting in a hard `sleep()`, and that should be rewritten to use a timed condition sleep: sleep for the requested time OR until an exit has been requested. But since this was just testing code to figure out any post-mortem leaks, I chose the "can't be bothered with this detail now" way.
Yes, there are a number of potential improvements that can be made.
> These days, `renderd` does not terminate cleanly (the signal handlers didn't work for me; they seem to be a mechanism for the render threads to indicate a catastrophe, e.g. style/font loading failed). Wouldn't the proper way be to respond to a signal? I'd imagine that's what e.g. `systemd` would send to the process during a shutdown/reboot, and nowadays it just gets a `kill` signal after some time of not responding...? How important would that be as a feature?
You are correct that this did not exist for a very long time (and still does not in the latest release `v0.7.2`); it was, however, added recently into the default branch here.
> You are correct that this did not exist for a very long time (and still does not in the latest release `v0.7.2`); it was, however, added recently into the default branch here.
😳 ... and it's also present in my code base! I don't know what I was looking at when I wrote the comment above; consider it retracted. Thanks for pointing it out.
@hummeltech I rewrote the shutdown code a little and hooked it into the signal handler. I have only done minimal testing on this, but feel free to take a look at "the good parts".
I do get a message from `jemalloc` about leaks now, but I don't recognize any of the symbols:
```
...
** INFO: 13:19:04.695: Render thread 7/8 exited.
** INFO: 13:19:04.695: Destroying request queue...
** INFO: 13:19:04.695: Request queue destroyed.
<jemalloc>: Leak approximation summary: ~2488287 bytes, ~5 objects, >= 3 contexts
<jemalloc>: Run jeprof on dump output for leak detail
```

```
❯ jeprof --text src/renderd jeprof.11418.0.f.heap
Using local file src/renderd.
Using local file jeprof.11418.0.f.heap.
Total: 2.6 MB
     1.6  61.4%  61.4%      2.1  80.7% std::vector::_M_realloc_insert
     0.5  19.3%  80.7%      0.5  19.3% get_self_exe_path
     0.5  19.3% 100.0%      0.5  19.3% __trans_list_add
     0.0   0.0% 100.0%      0.5  19.3% Option::Option
     0.0   0.0% 100.0%      2.1  80.7% SubProcessTimed::~SubProcessTimed
     0.0   0.0% 100.0%      2.6 100.0% _dl_init
     0.0   0.0% 100.0%      2.6 100.0% _dl_start_user
     0.0   0.0% 100.0%      2.1  80.7% build_options
     0.0   0.0% 100.0%      2.6 100.0% call_init
     0.0   0.0% 100.0%      0.5  19.3% flnl_result_alloc
     0.0   0.0% 100.0%      2.1  80.7% fmt::v7::detail::error_handler::on_error
```
The above output is from spinning up `renderd` and immediately asking it to exit (no tile rendering). The output below is after force-rendering all tiles at zoom levels 4 and 6, and it shows the same leaks: I think the rendering code itself is leak-free if allowed to terminate.
```
** INFO: 14:28:32.856: Render thread 7/8 exited.
** INFO: 14:28:32.856: Destroying request queue...
** INFO: 14:28:32.856: Request queue destroyed.
<jemalloc>: Leak approximation summary: ~2488287 bytes, ~5 objects, >= 3 contexts
<jemalloc>: Run jeprof on dump output for leak detail
```

```
❯ jeprof --text src/renderd jeprof.17275.0.f.heap
Using local file src/renderd.
Using local file jeprof.17275.0.f.heap.
Total: 2.6 MB
     1.6  61.4%  61.4%      2.1  80.7% std::vector::_M_realloc_insert
     0.5  19.3%  80.7%      0.5  19.3% get_self_exe_path
     0.5  19.3% 100.0%      0.5  19.3% __trans_list_add
     0.0   0.0% 100.0%      0.5  19.3% Option::Option
     0.0   0.0% 100.0%      2.1  80.7% SubProcessTimed::~SubProcessTimed
     0.0   0.0% 100.0%      2.6 100.0% _dl_init
     0.0   0.0% 100.0%      2.6 100.0% _dl_start_user
     0.0   0.0% 100.0%      2.1  80.7% build_options
     0.0   0.0% 100.0%      2.6 100.0% call_init
     0.0   0.0% 100.0%      0.5  19.3% flnl_result_alloc
     0.0   0.0% 100.0%      2.1  80.7% fmt::v7::detail::error_handler::on_error
```
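For anyone wanting to reproduce this kind of leak report: it is the output of jemalloc's heap profiling, which requires a jemalloc built with `--enable-prof`. The invocation below is an assumption about how the profiles above were produced, not a transcript; the library path is illustrative:

```shell
# prof_leak prints the leak summary at exit; prof_final dumps the final
# heap profile; lg_prof_sample:0 samples every allocation.
MALLOC_CONF="prof:true,prof_leak:true,prof_final:true,lg_prof_sample:0" \
  LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2 \
  src/renderd -f

# The dump (jeprof.<pid>.0.f.heap) is then inspected with jeprof:
jeprof --text src/renderd jeprof.<pid>.0.f.heap
```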
Debian bookworm: `renderd` used all the available memory (64 GB + 32 GB swap). I switched to `jemalloc` and memory usage is usually below 5 GB.
Continuation of discussion started here: https://github.com/openstreetmap/mod_tile/issues/181#issuecomment-2164503429