tdlight-team / tdlight-telegram-bot-api

The TDLight Telegram Bot API is an actively enhanced fork of the original Bot API, featuring experimental user support, proxies, unlimited file size, and more.
https://t.me/TDLight
Boost Software License 1.0
128 stars, 27 forks

Force mmap for allocations #47

Closed: cavallium closed this 1 year ago

cavallium commented 3 years ago

When it runs for a long time, especially when multiple instances are restarted often, TDLib appears to have a problem with allocations: memory it has freed is not given back to the OS. By default, glibc malloc serves small allocations from the heap, which it grows with sbrk()/brk(), and medium to large allocations (at or above M_MMAP_THRESHOLD, 128 KiB by default) with mmap().
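The split between the two allocation paths can be observed from inside a process. The following is a minimal sketch, assuming glibc 2.33 or later (for mallinfo2()); the helper served_by_mmap is hypothetical and not part of TDLib or TDLight. It compares the number of mmap()-backed chunks before and after a single request.

```c
#define _GNU_SOURCE
#include <malloc.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper (assumes glibc >= 2.33): reports whether one request of
 * n bytes got its own mmap()-backed chunk instead of coming from the brk heap.
 * mallinfo2().hblks is the number of chunks currently allocated with mmap(). */
bool served_by_mmap(size_t n)
{
    size_t before = mallinfo2().hblks;
    void *volatile p = malloc(n); /* volatile keeps the compiler from dropping the call */
    if (p == NULL)
        return false;
    size_t after = mallinfo2().hblks;
    free(p);
    return after > before;
}
```

In a fresh process, served_by_mmap(64) should return false and served_by_mmap((size_t)64 << 20) true. The 64 MiB size is chosen above the 32 MiB ceiling that glibc's dynamic threshold can reach on 64-bit systems (per mallopt(3)), so earlier frees cannot change the answer.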

Managing many small allocations with sbrk is problematic: no memory can be released except for a range at the top of the heap, and small allocations fragment everything underneath that freeable zone, so the allocated region grows bigger and bigger with no possibility of freeing anything. Because of this and many other problems, sbrk and brk have been deprecated on many Unix-like systems, such as BSD and its forks, and POSIX has removed them entirely: the two calls were already marked LEGACY in the 1997 standard (SUSv2) and were dropped in POSIX.1-2001.

When a bot starts, it allocates a huge number of small objects (tens of millions of updates and related data) until its queue has been processed and emptied. With brk, this means that a memory spike occurs at startup and TDLib keeps using X GB of memory until it is closed. Closing sessions doesn't fix this, because some objects remain at the top of the heap, which keeps the entire allocated region from being freed.
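The pinning effect described above can be reproduced in a small experiment. This is a sketch under stated assumptions: glibc on Linux, called from the main thread (glibc's other arenas are not grown with brk); the function name burst_then_free and all sizes are hypothetical. It allocates about 4 MiB of 4 KiB objects, keeps one object alive above them, frees the burst, and reads the program break with sbrk(0); then it frees the survivor and reads the break again.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

#define NBLOCKS 1024
#define BLOCK_SIZE 4096 /* far below the default 128 KiB mmap threshold */

static void *blocks[NBLOCKS]; /* global storage so the allocations are not optimized away */

/* Hypothetical experiment simulating a startup burst on the brk heap.
 * *pinned:   break growth after freeing the burst while one object survives above it
 * *released: break growth after that last object has been freed as well */
void burst_then_free(intptr_t *pinned, intptr_t *released)
{
    intptr_t start = (intptr_t)sbrk(0);

    for (int i = 0; i < NBLOCKS; i++)
        blocks[i] = malloc(BLOCK_SIZE);      /* ~4 MiB, carved from the heap top */
    void *volatile survivor = malloc(2048);  /* lands above the burst */

    for (int i = 0; i < NBLOCKS; i++)
        free(blocks[i]);                     /* freed, but trapped below the survivor */
    *pinned = (intptr_t)sbrk(0) - start;

    free(survivor);                          /* free space now reaches the top, so glibc can trim it */
    *released = (intptr_t)sbrk(0) - start;
}
```

On a typical glibc system, *pinned should stay at roughly 4 MiB even though almost everything has been freed, while *released should be much smaller, because only memory that reaches the top of the heap can be handed back with brk.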

Fortunately, on systems that use glibc (Linux), this can be fixed by adding the following line:

mallopt(M_MMAP_THRESHOLD, 0);

A threshold of 0 forces every allocation to be made with mmap, fixing the problem permanently: with mmap, when a bot closes or a queue shrinks, the memory is returned to the OS.
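A minimal sketch of applying the proposed setting, assuming glibc 2.33 or later (mallopt() is glibc-specific and mallinfo2() needs 2.33); the function names and the 64 KiB request size are hypothetical. A few caveats from mallopt(3) apply: setting M_MMAP_THRESHOLD explicitly disables the dynamic threshold adjustment, M_MMAP_MAX (65,536 by default) still caps how many allocations can be mmap-backed at the same time, and mmap allocations are page-aligned, so tiny objects can waste memory. Memory already sitting in the heap (free chunks or unused space at its top) is still handed out first, which is why the second function counts requests instead of assuming the very first one is mmap-backed.

```c
#define _GNU_SOURCE
#include <malloc.h>
#include <stdlib.h>

#define REQUEST_SIZE (64 * 1024) /* below the default 128 KiB threshold */
#define MAX_TRIES 64

/* The change proposed in this issue: with a threshold of 0, glibc uses mmap()
 * whenever it needs fresh memory from the kernel. mallopt() returns 1 on success. */
int force_mmap_allocations(void)
{
    return mallopt(M_MMAP_THRESHOLD, 0);
}

/* Hypothetical check: issues REQUEST_SIZE-byte requests until mallinfo2()
 * reports a new mmap()-backed chunk. Returns how many requests that took,
 * or 0 if none of MAX_TRIES requests was mmap-backed. */
int requests_until_mmap(void)
{
    void *volatile ptrs[MAX_TRIES]; /* volatile keeps the allocations from being elided */
    int used = 0;
    int found = 0;
    size_t before = mallinfo2().hblks;

    while (used < MAX_TRIES && found == 0) {
        ptrs[used] = malloc(REQUEST_SIZE);
        if (ptrs[used] == NULL)
            break;
        used++;
        if (mallinfo2().hblks > before)
            found = used; /* this request got its own mmap() chunk */
    }
    for (int i = 0; i < used; i++)
        free(ptrs[i]); /* mmap-backed chunks are unmapped right away */
    return found;
}
```

The call is meant to run as early as possible, for example at the top of main() before TDLib starts. force_mmap_allocations() should then return 1, and requests_until_mmap() should report a small positive count. mallopt(3) also documents the MALLOC_MMAP_THRESHOLD_ environment variable, which sets the same parameter without recompiling.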

parsapoorsh commented 1 year ago

telegram-bot-api consumes a lot of RAM and does not release it after use, which is very annoying. To work around this, I restart the program with a cron job, which is a very dirty hack. This pull request was opened 2 years ago, but nobody, here or in a fork, has solved the problem yet. This would be very useful; please add it soon.
@andrew-ld @davtur19 @ErnyTech @nikisalli

andrew-ld commented 1 year ago

@parsapoorsh this is not a problem; it's how the TDLib and Telegram Bot API caches work. If you have to reboot every day, it means you underestimated the RAM your bot needs.

parsapoorsh commented 1 year ago

@andrew-ld So is there any way to limit or reduce the size of the cache? For example, when they start, all 10 bots work regularly and perfectly, but after a few hours RAM usage has grown unreasonably.

andrew-ld commented 1 year ago

The cache grows without bound: because of the way TDLib is designed, more and more RAM is used with each new message received, and there is no way to limit the cache, nor to empty and deallocate it.

andrew-ld commented 1 year ago

https://t.me/tdlibchat/40196

The developer who maintains the Bot API for Telegram advises that certain bots need servers with hundreds of GB of RAM.

a3kov commented 1 year ago

Is it possible to write the cache to files and use the OS file cache? The OS would then evict cached data to free up memory automatically. The library should only be using RAM for network buffers and data currently in use.
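The idea in the question above can be sketched with a file mapped MAP_SHARED. This is purely hypothetical: nothing in the thread says TDLib or TDLight store their cache this way, and the file_cache type and its functions are invented for illustration (POSIX/Linux assumed). Because the pages are backed by a file rather than anonymous heap memory, the kernel can write dirty pages back and evict them under memory pressure.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical file-backed cache region: its pages belong to the OS page cache,
 * so they can be evicted instead of staying pinned as anonymous heap memory. */
typedef struct {
    int fd;
    unsigned char *base;
    size_t size;
} file_cache;

/* Opens (or creates) the backing file, sizes it, and maps it shared. Returns 0 on success. */
int file_cache_open(file_cache *c, const char *path, size_t size)
{
    c->fd = open(path, O_RDWR | O_CREAT, 0600);
    if (c->fd < 0)
        return -1;
    if (ftruncate(c->fd, (off_t)size) != 0) {
        close(c->fd);
        return -1;
    }
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, c->fd, 0);
    if (p == MAP_FAILED) {
        close(c->fd);
        return -1;
    }
    c->base = p;
    c->size = size;
    return 0;
}

/* Copies len bytes into the region at off; returns -1 if that would overflow it. */
int file_cache_put(file_cache *c, size_t off, const void *data, size_t len)
{
    if (off > c->size || len > c->size - off)
        return -1;
    memcpy(c->base + off, data, len);
    return 0;
}

/* Returns a pointer into the region, or NULL if off is out of range. */
const void *file_cache_get(const file_cache *c, size_t off)
{
    return off < c->size ? c->base + off : NULL;
}

void file_cache_close(file_cache *c)
{
    munmap(c->base, c->size);
    close(c->fd);
}
```

Reads go through ordinary pointers, so data that is in use stays resident while cold pages can be dropped by the kernel and faulted back in from the file on the next access.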

cavallium commented 1 year ago

Please stop talking about this here; questions about TDLib's design should go to TDLib, not us.