python-attrs / cattrs

Composable custom class converters for attrs, dataclasses and friends.
https://catt.rs
MIT License
826 stars 113 forks source link

memory leak of GenConverter with detailed_validation = True #290

Closed Bryant-Yang closed 10 months ago

Bryant-Yang commented 2 years ago

while True:

creating GenConverter instance with all default params.

# registering some simple structure hook, 
# loading some data, 

the memory usage of process keep increasing.

If set detailed_validation = False, there is no memory leak. I'm not sure if it cost by line cache.

Tinche commented 2 years ago

Interesting, the hooks should be generated only once in any case. Would be curious if you can get a small repro?

Tinche commented 2 years ago

I tried getting a repro:

from attrs import define

from cattrs import GenConverter

c = GenConverter()

@define
class Test:
    a: int
    b: int
    c: int

while True:
    c.structure({"a": 1, "b": 2, "c": 3}, Test)

but it doesn't leak.

Bryant-Yang commented 2 years ago

This issue is found in one of my test server which has little memory, in a scheduled task loop which cause memory usage too high, had to reboot after running 2 days.

I have made my converter usage as singleton. It's just not a clear constraint that this converter object could not be gc collected in some situations.

I can give a simple test as below:

import gc
from datetime import datetime
from typing import List

import cattr
import time
from attr import attrs

def create_converter(detailed_validation):
    converter = cattr.GenConverter(detailed_validation=detailed_validation)
    converter.register_structure_hook(datetime,
                                      lambda val, _: datetime.strptime(val, '%Y-%m-%d %H:%M:%S')
                                      if val is not None else None)
    return converter

@attrs(auto_attribs=True)
class TestData:
    @attrs(auto_attribs=True)
    class InfoList:
        name: str

    name: str
    dt: datetime
    info_list: List[InfoList]

def test(detailed_validation):
    converter = create_converter(detailed_validation)
    data = {'name': "testAAA", "dt": datetime.now().strftime('%Y-%m-%d %H:%M:%S'), "info_list": [{'name': "testAAA"}] * 100}
    for _ in range(1000):
        converter.structure(data, TestData)

if __name__ == '__main__':
    import os
    import psutil
    process = psutil.Process(os.getpid())
    while True:
        test(detailed_validation=True)  # memory usage increasing
        # test(detailed_validation=False)   # memory usage not increasing
        gc.collect()
        time.sleep(.1)
        print(f'mem info: {process.memory_info().rss}')

Interesting, the hooks should be generated only once in any case. Would be curious if you can get a small repro?

Tinche commented 2 years ago

Ah, you seem to be creating a converter in a loop. I strongly advise not doing that, not only with cattrs, but also with libraries like attrs and dataclasses. All of these libraries have heavy initialization steps (including code generation) to make them faster afterwards; if you're constantly creating new converters you're paying the price and getting no benefit.

This is probably a leak somewhere in linecache, yeah. The hooks are generated once per converter. I can take a look when I have spare cycles, but due to what I said in the first paragraph I think this is lower prio.

Bryant-Yang commented 2 years ago

yes,don't create instance repeatedly if not necessary,agree that.

Tinche commented 10 months ago

This should actually already be fixed by #464