openstreetmap / mod_tile

Renders map tiles with mapnik and serves them using apache
http://wiki.openstreetmap.org/wiki/Mod_tile
GNU General Public License v2.0
290 stars 191 forks

Examine memory usage for potential leaks #445

Open hummeltech opened 5 months ago

hummeltech commented 5 months ago

Continuation of the discussion started here: https://github.com/openstreetmap/mod_tile/issues/181#issuecomment-2164503429

hummeltech commented 5 months ago

Here's a quick and dirty visualization of the respective memory footprints. All tests are done consecutively, but plotted on the same (relative) time axis.

Conclusions so far:

  • jemalloc seems to be the only one that returns substantial memory to the system
  • tcmalloc seems to never return anything; maybe additional configuration is required?
  • glibc does have a dip here and there - are those just "lucky" allocations that reduced the top of the heap?

Timeline:

  • 0s: start renderd - it loads the mapnik style file within ~10-15 seconds
  • 30s: issue render request for all tiles at zoom level 6
  • 45s-100s: database is churning like crazy, renderd threads are mostly idle
  • 100s-220s: renderd threads are using up CPU and memory
  • 220s-250s: render_list exits, all rendering is done
  • 300s: benchmark stops

[Image: memory footprint over time for each allocator]

@rolandbosa IIRC, some of mod_tile's dependencies are likely the cause of the high memory consumption displayed in this chart. I will have to look at it more closely again, but thank you very much for compiling this!

hummeltech commented 5 months ago

@rolandbosa, thanks for the work. I made a few changes to your branch here (as well as adding mimalloc into the mix). I've tried out the script and am seeing this: [image]

hummeltech commented 5 months ago

Alright @rolandbosa, I have merged your MALLOC_LIB (#454) and strtok_r (#455) commits into mod_tile and rebased the above-mentioned branch.

Thanks for the help!

rolandbosa commented 5 months ago

@hummeltech Oh, that merge was a surprise... I still consider my branch a work in progress, and it had quite a bit of junk in it. But now I see you just took the "good" parts and merged those. Thank you! I think the jemalloc choice is a step in the right direction!

I don't know if I've mentioned this already, but I managed ("hacked" - definitely don't merge that ;-) ) to get renderd to quit and dump a leak profile. I'm having problems analyzing the profile in depth, though, and won't have time to work on that for a while. It looks like there are no apparent leaks during the renderd::mapnik calls; it's just a lot of caching that sticks around.

So I started thinking of implementing some configurable recycling of the render threads, which might help to cut down on the overall memory usage; of course, at the cost of needing to reload the map style, fonts, etc. Maybe with some tweakable parameters to spread the cost of spinning up a render thread over N tiles, or M seconds, or both. I hope to get back to this when time allows...

hummeltech commented 5 months ago

Haha, yes that's right, just the "best" parts of your use_jemalloc branch, thanks again!

I did try out your "exit please" command and I do think it is a good idea, but I was getting a segmentation fault here. Also, I think it might fit better as a renderd option if and when you get back to it. Thanks for your contributions!

rolandbosa commented 5 months ago

Hm, I can see this possibility for the fault: I tried to find a way to distinguish whether a statistics thread was created or not by initializing the value of stats_thread with a "null value". Apparently this doesn't exist portably; instead one should use the value of pthread_self() and do a pthread_equal() comparison. However, according to the man page (and also as qbert220 points out), the value is undefined upon return, so a proper implementation should also re-initialize the variable upon creation failure (in addition to the initialization before checking whether a statistics thread is requested at all). Why the creation would fail is a different question, though...

In either case, the statistics thread is also sitting in a hard sleep() and that should be rewritten to use a timed condition sleep: sleep for the requested time OR until an exit has been requested. But since this was just testing code to figure out any post-mortem leaks, I chose the "can't be bothered with this detail now" way.

These days, renderd does not terminate cleanly (the signal handlers didn't work for me; they seem to be a mechanism for the render threads to indicate a catastrophe, e.g. style/font loading failed). Wouldn't the proper way be to respond to a signal? I'd imagine that's what e.g. systemd would send to the process during a shutdown/reboot, and nowadays it just gets a kill signal after some time of not responding...? How important would that be as a feature?

hummeltech commented 5 months ago

> Hm, I can see this possibility for the fault: I tried to find a way to distinguish whether a statistics thread was created or not by initializing the value of stats_thread with a "null value". Apparently this doesn't exist portably; instead one should use the value of pthread_self() and do a pthread_equal() comparison. However, according to the man page (and also as qbert220 points out), the value is undefined upon return, so a proper implementation should also re-initialize the variable upon creation failure (in addition to the initialization before checking whether a statistics thread is requested at all). Why the creation would fail is a different question, though...

Thanks for the explanation, that sounds like it might be a good addition. The stats thread is optional, so that is at least one reason it would not be created, any other errors should lead to an exit.

> In either case, the statistics thread is also sitting in a hard sleep() and that should be rewritten to use a timed condition sleep: sleep for the requested time OR until an exit has been requested. But since this was just testing code to figure out any post-mortem leaks, I chose the "can't be bothered with this detail now" way.

Yes, there are a number of potential improvements that can be made.

> These days, renderd does not terminate cleanly (the signal handlers didn't work for me; they seem to be a mechanism for the render threads to indicate a catastrophe, e.g. style/font loading failed). Wouldn't the proper way be to respond to a signal? I'd imagine that's what e.g. systemd would send to the process during a shutdown/reboot, and nowadays it just gets a kill signal after some time of not responding...? How important would that be as a feature?

You are correct that this did not exist for a very long time (and still does not in the latest release, v0.7.2); it was, however, recently added to the default branch here.

rolandbosa commented 5 months ago

> You are correct that this did not exist for a very long time (and still does not in the latest release, v0.7.2); it was, however, recently added to the default branch here.

😳 ... and it's also present in my code base! I don't know what I was looking at when I wrote the comment above; consider it retracted. Thanks for pointing it out.

rolandbosa commented 5 months ago

@hummeltech I rewrote the shutdown code a little and hooked it into the signal handler. I have only done minimal testing on this, but feel free to take a look at "the good parts".

I do get a message from jemalloc about leaks now, but I don't recognize any of the symbols:

...
** INFO: 13:19:04.695: Render thread 7/8 exited.
** INFO: 13:19:04.695: Destroying request queue...
** INFO: 13:19:04.695: Request queue destroyed.
<jemalloc>: Leak approximation summary: ~2488287 bytes, ~5 objects, >= 3 contexts
<jemalloc>: Run jeprof on dump output for leak detail

❯ jeprof --text src/renderd jeprof.11418.0.f.heap 
Using local file src/renderd.
Using local file jeprof.11418.0.f.heap.
Total: 2.6 MB
     1.6  61.4%  61.4%      2.1  80.7% std::vector::_M_realloc_insert
     0.5  19.3%  80.7%      0.5  19.3% get_self_exe_path
     0.5  19.3% 100.0%      0.5  19.3% __trans_list_add
     0.0   0.0% 100.0%      0.5  19.3% Option::Option
     0.0   0.0% 100.0%      2.1  80.7% SubProcessTimed::~SubProcessTimed
     0.0   0.0% 100.0%      2.6 100.0% _dl_init
     0.0   0.0% 100.0%      2.6 100.0% _dl_start_user
     0.0   0.0% 100.0%      2.1  80.7% build_options
     0.0   0.0% 100.0%      2.6 100.0% call_init
     0.0   0.0% 100.0%      0.5  19.3% flnl_result_alloc
     0.0   0.0% 100.0%      2.1  80.7% fmt::v7::detail::error_handler::on_error

The above output is from spinning up renderd and immediately asking it to exit (no tile rendering). The output below is from force-rendering all tiles at zoom levels 4 and 6 and shows the same leaks: I think the rendering code itself is leak-free if allowed to terminate.

** INFO: 14:28:32.856: Render thread 7/8 exited.
** INFO: 14:28:32.856: Destroying request queue...
** INFO: 14:28:32.856: Request queue destroyed.
<jemalloc>: Leak approximation summary: ~2488287 bytes, ~5 objects, >= 3 contexts
<jemalloc>: Run jeprof on dump output for leak detail

❯ jeprof --text src/renderd jeprof.17275.0.f.heap 
Using local file src/renderd.
Using local file jeprof.17275.0.f.heap.
Total: 2.6 MB
     1.6  61.4%  61.4%      2.1  80.7% std::vector::_M_realloc_insert
     0.5  19.3%  80.7%      0.5  19.3% get_self_exe_path
     0.5  19.3% 100.0%      0.5  19.3% __trans_list_add
     0.0   0.0% 100.0%      0.5  19.3% Option::Option
     0.0   0.0% 100.0%      2.1  80.7% SubProcessTimed::~SubProcessTimed
     0.0   0.0% 100.0%      2.6 100.0% _dl_init
     0.0   0.0% 100.0%      2.6 100.0% _dl_start_user
     0.0   0.0% 100.0%      2.1  80.7% build_options
     0.0   0.0% 100.0%      2.6 100.0% call_init
     0.0   0.0% 100.0%      0.5  19.3% flnl_result_alloc
     0.0   0.0% 100.0%      2.1  80.7% fmt::v7::detail::error_handler::on_error

frafra commented 1 day ago

Debian bookworm: renderd used all the available memory (64 GB RAM + 32 GB swap). I switched to jemalloc and memory usage now usually stays below 5 GB.