Rongronggg9 opened 3 years ago
I got more data in production.
I have two instances of https://github.com/Rongronggg9/RSS-to-Telegram-Bot on the same VPS, one with ~4000 feeds and another with ~3000 feeds. The bot checks the feeds for updates frequently. I noticed that the relation between the number of feeds and the amount of memory leakage is logarithmic. Moreover, parsing the same feed multiple times (no matter whether it stays the same or is updated) leaks less than parsing different feeds once, and once the same feed has been parsed a fairly large number of times, the memory leakage hardly increases any more. That is to say, the relation between the number of parses and the amount of memory leakage is also logarithmic.
I guess the leaked objects can somehow be reused? If that's true, it would be a helpful clue to figuring out the cause of the memory leakage.
Related: https://github.com/kurtmckee/feedparser/pull/302#issuecomment-1133549551
Hi, coming here from your comment on #302.
I ran a few tests where I called feedparser.parse() in a loop and measured memory usage (details below). I tried two feeds, one 2M and one 50K, both loaded from disk; I did this both on macOS and on Ubuntu.
The results are as you describe, the max RSS increases in what looks like a logarithmic curve; that is, after enough iterations (10-100), the max RSS remains almost horizontal/stable.
However, I am not convinced this is a memory leak in feedparser.
Rather, I think it's a side-effect of how Python memory allocation works. Specifically, Python never releases allocated memory back to the operating system (1, 2, 3), but keeps it around and reuses it. (Because of this, running gc.collect() will never decrease RSS.)
I assume the initial sharper memory increase is due to fragmentation (even if there's enough memory available, it's not in a contiguous chunk, so the allocator has to allocate additional memory); as more and more memory is allocated and then released (in the pool), it becomes easier to find a contiguous chunk.
It makes sense for #302 to make max RSS stabilize faster, since it reduces the number of allocations – and more importantly, the number of big (whole feed) allocations (which reduces the impact of fragmentation).
It might be possible to confirm this 100% by measuring the used memory as seen by the Python allocator, instead of max RSS.
Script:

```python
import sys, resource

import feedparser

print("    loop   maxrss")
for i in range(10 ** 3 + 1):
    with open(sys.argv[1], 'rb') as file:
        feedparser.parse(file)
    maxrss = (
        resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        / 2 ** (20 if sys.platform == 'darwin' else 10)
    )
    if (i <= 10) or (i <= 100 and i % 10 == 0) or (i <= 1000 and i % 100 == 0):
        print(f"{i:>8} {maxrss:>8.3f}")
```
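To act on the "measure memory as seen by the Python allocator, instead of max RSS" suggestion above, `tracemalloc` can be used; a minimal sketch (using plain allocations instead of feedparser so it stays self-contained) showing that allocator-level usage drops back after objects are freed while max RSS stays at its high-water mark:

```python
import resource
import sys
import tracemalloc

tracemalloc.start()

for i in range(3):
    # Allocate and free many small objects, like a parser would.
    data = [b"x" * 1000 for _ in range(10_000)]
    del data
    current, peak = tracemalloc.get_traced_memory()
    maxrss = (
        resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        / 2 ** (20 if sys.platform == 'darwin' else 10)
    )
    # `current` falls back to near zero after each `del`,
    # while max RSS never decreases.
    print(f"iter {i}: traced={current} B, peak={peak} B, maxrss={maxrss:.1f} MiB")
```

If traced memory is flat while max RSS keeps growing, the growth is in the allocator/OS layer rather than in live Python objects.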
Hi, @lemon24. Thanks for sharing.

I can confirm your statement that "I am not convinced this is a memory leak in feedparser". `BeautifulSoup(something, 'html.parser')` (`html.parser` is written in pure Python) "leaks" in the same pattern as `feedparser.parse(something)`, while `BeautifulSoup(something, 'lxml')` (`lxml` is written in C) "leaks" nothing. (Would `feedparser` adopting `lxml` as a parser backend help reduce the memory usage? Probably, lol.)
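For anyone reproducing this comparison, the measurement side needs nothing beyond the stdlib; a minimal sketch (the helper name `report_maxrss` is mine) that can be wrapped around either parser call:

```python
import resource
import sys

def report_maxrss(label: str) -> float:
    """Print and return the process max RSS in MiB.

    ru_maxrss is reported in bytes on macOS but in KiB on Linux,
    hence the platform-dependent divisor.
    """
    maxrss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    maxrss /= 2 ** (20 if sys.platform == 'darwin' else 10)
    print(f"{label}: {maxrss:.3f} MiB")
    return maxrss

# Example usage around a parser call (bs4/lxml assumed installed):
#   report_maxrss("before")
#   BeautifulSoup(markup, "html.parser")   # or "lxml"
#   report_maxrss("after")
report_maxrss("baseline")
```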
However, after confirming the previous statement, I did a deep dive. I believe that your statement "Python never releases allocated memory back to the operating system, but keeps it around and reuses it" is incorrect.
Python does release unused memory, but only on the prerequisite that it can. It is fragmentation that breaks this prerequisite, and it is a glibc `malloc` issue rather than a Python-specific one.
By default, `malloc` uses `sbrk` instead of `mmap` for allocations smaller than 128 KiB. A fragment at a high address, originally allocated via `sbrk`, prevents the heap from shrinking, so free memory at lower addresses cannot be released. Memory allocated via `mmap`, by contrast, is managed by the OS and has no such disadvantage. What's worse, the threshold is dynamic nowadays and can be increased at runtime (up to `4*1024*1024*sizeof(long)` on 64-bit systems!). The default `malloc` policy is actually a space-time tradeoff, since the `mmap` syscall is costly. That's the real reason for the "leakage", and it explains why CPython on Windows is not affected. It also explains why the feeds loaded into memory as strings can be released: most of them are larger than 128 KiB!
In conclusion, your PR (#302) does help reduce the "leakage", but only to a limited extent. My final solution is shown below.
Prohibiting the use of `sbrk` by setting `M_MMAP_THRESHOLD` to `0` eliminates the "leakage". This is just an experiment; do not set `M_MMAP_THRESHOLD` to such a low value in production, or you will face performance issues. As a production setting, `16384` (16 KiB) is a nice value for those concerned about the issue. Even the default initial value, `131072` (128 KiB), helps a lot, since explicitly setting `M_MMAP_THRESHOLD` disables its dynamic increment.
Via `ctypes`:

```diff
+import ctypes
+libc = ctypes.cdll.LoadLibrary("libc.so.6")
+M_MMAP_THRESHOLD = -3
+libc.mallopt(M_MMAP_THRESHOLD, 0)  # effectively prohibit `sbrk`
 import gc
 import os
 ...
```
```
2022-05-27-01:35:17:INFO - Started! Memory usage: 54.39 MiB
2022-05-27-01:35:17:INFO - Feeds loaded into memory! Memory usage: 80.66 MiB
2022-05-27-01:35:17:INFO - would_leak_1 started! Memory usage: 80.66 MiB
2022-05-27-01:35:44:INFO - would_leak_1 finished! Memory usage: 84.94 MiB
2022-05-27-01:35:44:INFO - would_leak_1 garbage collected! Memory usage: 84.94 MiB
2022-05-27-01:35:44:INFO - would_leak_2 started! Memory usage: 84.94 MiB
2022-05-27-01:36:13:INFO - would_leak_2 finished! Memory usage: 85.52 MiB
2022-05-27-01:36:13:INFO - would_leak_2 garbage collected! Memory usage: 85.52 MiB
2022-05-27-01:36:13:INFO - Done! Memory usage: 85.52 MiB
2022-05-27-01:36:13:INFO - Feeds in memory cleared! Memory usage: 59.30 MiB
```
Note: In this way, even the initialization of Python is affected, so setting the value to `0` makes Python consume more memory just to initialize. Do not set `MALLOC_MMAP_THRESHOLD_` to less than `8192` in production; this ensures that memory consumption will not be larger than in a vanilla execution and that performance is mostly unaffected.
```
$ MALLOC_MMAP_THRESHOLD_=0 python script.py
2022-05-27-01:52:03:INFO - Started! Memory usage: 72.52 MiB
2022-05-27-01:52:03:INFO - Feeds loaded into memory! Memory usage: 98.79 MiB
2022-05-27-01:52:03:INFO - would_leak_1 started! Memory usage: 98.79 MiB
2022-05-27-01:52:39:INFO - would_leak_1 finished! Memory usage: 102.91 MiB
2022-05-27-01:52:39:INFO - would_leak_1 garbage collected! Memory usage: 102.91 MiB
2022-05-27-01:52:39:INFO - would_leak_2 started! Memory usage: 102.91 MiB
2022-05-27-01:53:08:INFO - would_leak_2 finished! Memory usage: 103.58 MiB
2022-05-27-01:53:08:INFO - would_leak_2 garbage collected! Memory usage: 103.56 MiB
2022-05-27-01:53:08:INFO - Done! Memory usage: 103.56 MiB
2022-05-27-01:53:08:INFO - Feeds in memory cleared! Memory usage: 77.35 MiB
```
Ref:
- https://stackoverflow.com/questions/68225871/python3-give-unused-interpreter-memory-back-to-the-os
- https://stackoverflow.com/questions/15350477/memory-leak-when-using-strings-128kb-in-python
- https://stackoverflow.com/questions/35660899/reduce-memory-fragmentation-with-malloc-mmap-threshold-and-malloc-mmap-max
- https://man7.org/linux/man-pages/man3/mallopt.3.html
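As an aside not covered above: glibc also exposes `malloc_trim(3)`, which explicitly asks the allocator to return free heap pages to the OS. It can be called from Python via `ctypes`; a hedged sketch (glibc/Linux only, and not a cure for fragments pinned at high addresses):

```python
import ctypes
import ctypes.util

# Resolve the C library; on glibc systems this is "libc.so.6".
libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.malloc_trim.restype = ctypes.c_int

# malloc_trim(0) releases free memory from the heap back to the OS.
# It returns 1 if any memory was actually released, 0 otherwise.
released = libc.malloc_trim(0)
print("malloc_trim released memory:", bool(released))
```

This is complementary to tuning `M_MMAP_THRESHOLD`: trimming frees what it can right now, while the threshold controls how fragmentation-prone future allocations are.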
A better workaround for multithreaded programs is to replace glibc's `ptmalloc` with `jemalloc`:

https://github.com/Rongronggg9/RSS-to-Telegram-Bot/commit/ae69f738cab53f21f4587272cfce5f22915182a6
https://github.com/Rongronggg9/RSS-to-Telegram-Bot/commit/eb07fa91f7ba9f49584f9effea1488bd0142d7b4

`jemalloc` shows impressive performance while maintaining a high memory recycling rate in multithreaded programs.
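To verify that a preloaded allocator actually took effect, one can inspect the process's own memory maps; a Linux-only sketch (the jemalloc path in the comment is an example and varies by distro):

```python
# Run as, e.g.:
#   LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2 python check_jemalloc.py
# (the library path above is distro-dependent)
with open("/proc/self/maps") as f:
    loaded = "jemalloc" in f.read()
print("jemalloc loaded:", loaded)
```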
I've changed the title of the issue and would like to keep it open as a guide for developers facing the same issue. It would be better if this could be documented in the docs.
My conclusion is that to "solve" the issue on the `feedparser` side, adopting `lxml` might be the best and easiest solution. For downstream developers, the two workarounds I've described are easy to adopt.
Code to reproduce
feeds.tar.gz
My tests
feedparser 6.0.8
Debian GNU/Linux 11 (bullseye) on WSL (CPython 3.9.2) - Leaked!
(neofetch: Kernel 5.10.43.3-microsoft-standard-WSL2, Intel i7-10510U, 1917 MiB RAM)

Debian GNU/Linux 11 (bullseye) on Azure b1s (CPython 3.9.2) - Leaked!
(neofetch: Hyper-V VM, Kernel 5.10.0-8-cloud-amd64, Intel Xeon E5-2673 v4, 913 MiB RAM)

AOSC OS aarch64 (CPython 3.8.6) - Leaked!
(neofetch: Pine64 RockPro64 v2.0, Kernel 5.12.13-aosc-rk64, 3868 MiB RAM)

Armbian bullseye (21.08.2) aarch64 (CPython 3.9.2) - Leaked!
(neofetch: Pine H64 model B, Kernel 5.10.60-sunxi64, 1989 MiB RAM)

Windows 11 22000.194 (CPython 3.9.2) - Leaked only a little, which can be ignored.
(neofetch: Kernel 10.0.22000, Intel i7-10510U, 24329 MiB RAM)

Windows 11 22000.194 (PyPy 7.3.5, Python 3.7.10) - Leaked!
Note
If I run would_leak_1 and would_leak_2 separately, their leaking behavior looks the same. However, running them sequentially in a single process does make the second one leak less under some conditions, as seen above.