Initial analysis says this is `functools.lru_cache` holding onto references. Since `__del__` won't reliably be called to keep memory usage low (it certainly isn't in my testing), would a method such as `Document.release()` that frees memory by clearing the cache be useful for you? I did look at a `weakref` version of caching, but it's horribly inefficient, making performance 2-3x slower for everyone. The alternative would be to cache explicitly in each class method without using `lru_cache`.
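For context, a minimal sketch of what a release method along these lines could look like. The method and attribute names here are hypothetical, but `cache_clear()` is the real `functools.lru_cache` API:

```python
import functools

class Document:
    # Hypothetical sketch, not the actual numbers_parser implementation.
    @functools.lru_cache(maxsize=None)
    def _parse_sheet(self, index):
        return f"sheet-{index}"  # stand-in for expensive parsing work

    def release(self):
        # The cache lives on the function object and its keys include
        # `self`, so clearing it is the only way to drop the references.
        # Note this clears entries for *every* Document, because the cache
        # is shared across instances.
        Document._parse_sheet.cache_clear()
```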
@masaccio: I think such a method would be useful. However, I worry you'll get this same bug report many times and have to explain it over and over again!

That said, my interest is more in ecosystem improvement than in any particular solution. So a more interesting question to me: do you think you have enough evidence to submit a bug against `functools.lru_cache`?
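For what it's worth, the behaviour is easy to demonstrate in isolation: `lru_cache` on a method stores `self` in the cache key, so the cache alone keeps instances alive. A minimal, self-contained reproducer:

```python
import functools
import gc
import weakref

class Big:
    @functools.lru_cache(maxsize=None)
    def compute(self):
        return sum(range(10_000))

obj = Big()
obj.compute()             # the cache key now holds a strong reference to obj
ref = weakref.ref(obj)
del obj
gc.collect()
print(ref() is not None)  # True: only the lru_cache keeps the instance alive
```

Whether that counts as a bug is debatable, though: the `functools` documentation itself notes that the `self` argument is included in the cache when a method is cached.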
I decided to just rewrite `functools.lru_cache` to give me control over the cache references (a sketch of the idea follows the measurements below). This code:
```python
from numbers_parser import Document
from psutil import Process

process = Process()
rss_base = process.memory_info().rss
for i in range(32):
    doc = Document("test.numbers")
    print("memory:", int((process.memory_info().rss - rss_base) / (1024 * 1024)), "MB")
```
stabilises around 200MB of usage on my Mac. Calling `gc.collect()` in the loop doesn't change that much, so there may still be some leaks, but certainly far fewer than before.
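Roughly, the rewrite amounts to memoising the way `lru_cache` does while keeping a handle on every cache so the library can drop the references on demand. A minimal sketch of that idea (hypothetical names, not the actual numbers_parser code):

```python
import functools

_all_caches = []  # every cache the decorator creates, so they can be cleared later

def clearable_cache(func):
    """Memoise like functools.lru_cache(maxsize=None), but register the
    backing dict so clear_caches() can drop all cached references."""
    store = {}
    _all_caches.append(store)

    @functools.wraps(func)
    def wrapper(*args):
        if args not in store:
            store[args] = func(*args)
        return store[args]

    return wrapper

def clear_caches():
    # Frees everything the caches were keeping alive, including any `self`
    # arguments captured in the keys of method caches.
    for store in _all_caches:
        store.clear()
```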
That said, tracking allocations with `pympler`:

```python
from pympler.tracker import SummaryTracker
from numbers_parser import Document

def func():
    doc = Document("test.numbers")  # create and immediately drop a Document

tracker = SummaryTracker()
for i in range(5):
    func()
tracker.print_diff()
```
indicates that there aren't many references left hanging around:
```
                   types |   # objects |   total size
======================== | =========== | ============
                    list |        4685 |    406.86 KB
                     str |        4693 |    332.30 KB
                     int |         954 |     26.11 KB
                    dict |          15 |     21.27 KB
                    code |           5 |      9.64 KB
                    type |           5 |      6.80 KB
             abc.ABCMeta |           0 |        944 B
                   tuple |          13 |        712 B
   weakref.ReferenceType |           5 |        360 B
       function (decode) |           2 |        288 B
       function (encode) |           2 |        288 B
                  method |           3 |        192 B
   function (store_info) |           1 |        144 B
  function (getregentry) |           1 |        144 B
   encodings.cp437.Codec |           2 |         96 B
```
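A complementary sanity check is to ask the garbage collector directly whether any `Document` instances survive. A sketch, assuming `test.numbers` exists locally:

```python
import gc

from numbers_parser import Document

def live_documents():
    # After a full collection, any Document still reachable points at a
    # lingering reference, such as a module-level cache entry.
    gc.collect()
    return sum(1 for obj in gc.get_objects() if isinstance(obj, Document))

doc = Document("test.numbers")
del doc
print(live_documents())  # 0 once no cache holds onto the instance
```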
@masaccio: I have verified that on 4.4.4 memory use increases monotonically, yet on 4.4.5 it stays bounded. Thanks so much!
**Describe the bug**

Memory consumption is excessive.

**To Reproduce**

Python file:

**Expected behavior**

Less memory consumption.

**Attachments**

test.numbers.txt (attached; had to rename it with a `.txt` extension to get it past GitHub.)

**Additional information**