python / cpython

The Python programming language
https://www.python.org

garbage collector blocks and takes worst-case linear time wrt number of objects #49044

Closed e17ae51f-bf3c-4ea4-be2c-67503164f70f closed 15 years ago

e17ae51f-bf3c-4ea4-be2c-67503164f70f commented 15 years ago
BPO 4794
Nosy @loewis, @benjaminp
Files
  • gctimings.zip: garbage collection timings

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    ```python
    assignee = None
    closed_at =
    created_at =
    labels = ['interpreter-core', 'performance']
    title = 'garbage collector blocks and takes worst-case linear time wrt number of objects'
    updated_at =
    user = 'https://bugs.python.org/darrenr'
    ```

    bugs.python.org fields:

    ```python
    activity =
    actor = 'darrenr'
    assignee = 'none'
    closed = True
    closed_date =
    closer = 'benjamin.peterson'
    components = ['Interpreter Core']
    creation =
    creator = 'darrenr'
    dependencies = []
    files = ['12510']
    hgrepos = []
    issue_num = 4794
    keywords = []
    message_count = 9.0
    messages = ['78646', '78649', '78662', '78825', '78926', '78935', '79186', '79189', '79277']
    nosy_count = 4.0
    nosy_names = ['loewis', 'benjamin.peterson', 'LambertDW', 'darrenr']
    pr_nums = []
    priority = 'normal'
    resolution = 'duplicate'
    stage = None
    status = 'closed'
    superseder = None
    type = 'resource usage'
    url = 'https://bugs.python.org/issue4794'
    versions = ['Python 2.6', 'Python 2.4', 'Python 3.0']
    ```

    e17ae51f-bf3c-4ea4-be2c-67503164f70f commented 15 years ago

    Python's garbage collector holds the GIL for the duration of a collection and provides no way to interrupt it or to run it concurrently with other Python threads within a single Python VM. This can be a problem for realtime applications. In the worst case, a collection takes time linear in the number of Python objects that could potentially be involved in a garbage cycle. I've attached timings taken on a Core 2 Quad 2.4 GHz (WinXP Pro, 3GB RAM), with ever-increasing numbers of objects. In the largest runs, just before the process runs out of memory, a collection takes upwards of 3 seconds.

    If the gc periodically released the GIL, it could be run in a separate thread; as it stands, it blocks the Python VM for periods of time that are too long for realtime interactive applications. Alternatively, a gc that is incremental by default would eliminate the need for a second thread.
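The attached timings are not reproduced here. As a rough, hedged sketch of how such a measurement could be made on a current CPython 3, the snippet below times one full `gc.collect()` while a growing number of container objects is alive. The function name `time_full_collection` and the object counts are assumptions chosen for illustration, not the reporter's actual test.

```python
import gc
import time


def time_full_collection(n_objects):
    """Return the seconds taken by one full collection with n_objects live lists."""
    gc.collect()  # start from a clean state
    # Lists are container objects, so the cyclic collector tracks each one.
    live = [[i] for i in range(n_objects)]
    start = time.perf_counter()
    gc.collect()  # full collection: has to examine every tracked object
    elapsed = time.perf_counter() - start
    del live
    return elapsed


if __name__ == "__main__":
    # Illustrative sizes only; absolute timings depend on machine and version.
    for n in (100_000, 200_000, 400_000, 800_000):
        print(f"{n:>8} objects: {time_full_collection(n) * 1000:.1f} ms")
```

None of the objects here is garbage, so the time measured is purely the cost of examining live objects, which is the cost the report describes as blocking the VM.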

    benjaminp commented 15 years ago

    The garbage collector will never be able to run in a second thread because it manipulates Python objects, which the GIL is supposed to protect!

    As for non-linear complexity, see bpo-4688 and bpo-4074 for some attempts to smooth this problem over.

    e17ae51f-bf3c-4ea4-be2c-67503164f70f commented 15 years ago

    A 'stop-the-world' garbage collector that periodically released the GIL could be run in a second thread, allowing the main thread to break in and do some processing. However, the nature of a stop-the-world collector means that it probably would not easily be able to deal with changes made by other threads in the middle of a collection.

    My concern is that the Python process blocks and is unresponsive due to garbage collection for periods of time that are not acceptable for realtime interactive applications. Are there any plans to add an incremental collector to Python?

    61337411-43fc-4a9c-b8d5-4060aede66d0 commented 15 years ago

    Hard real-time applications written in Python should not rely on the cyclic garbage collector. They should call gc.disable() at startup, and completely rely on reference counting for releasing memory. Doing so might require rewrites to the application, explicitly breaking cycles so that reference counting can release them.
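A minimal sketch of this advice, assuming CPython's reference counting: with the cyclic collector disabled, an intact cycle stays alive after its last outside reference is dropped, while a cycle that is broken explicitly is released immediately. The `Node` class and the `demo` function are hypothetical names used only for this illustration.

```python
import gc
import weakref


class Node:
    """Hypothetical object that can take part in a reference cycle."""

    def __init__(self, name):
        self.name = name
        self.peer = None


def make_cycle():
    a, b = Node("a"), Node("b")
    a.peer, b.peer = b, a  # a -> b -> a
    return a, b


def demo():
    gc.disable()  # rely on reference counting only
    try:
        # Case 1: the cycle is left intact, so refcounts never reach zero.
        a, b = make_cycle()
        probe_leaked = weakref.ref(a)
        del a, b
        leaked = probe_leaked() is not None

        # Case 2: the cycle is broken explicitly before the names are dropped.
        a, b = make_cycle()
        probe_freed = weakref.ref(a)
        a.peer = None  # break the cycle
        del a, b
        freed = probe_freed() is None
        return leaked, freed
    finally:
        gc.enable()
        gc.collect()  # clean up the cycle left behind by case 1
```

The weak references are only probes for observing whether each object is still alive; a real application would break its cycles at whatever point the objects are known to be finished with.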

    Even with cyclic gc disabled, applications worried about meeting hard deadlines need to look even more into memory allocation and deallocation; e.g. releasing a single object may cause a chained release of many objects, which can affect worst case execution times. There are more issues to consider (which are all out of scope of the bug tracker).
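The chained release mentioned above can be observed with a small sketch, again assuming CPython's reference counting. Dropping the only reference to a parent list releases every child it solely owns within the same `del` statement. `Child` and `release_parent` are hypothetical names for this illustration.

```python
import time


class Child:
    """Hypothetical object that counts its own deallocations."""

    released = 0

    def __del__(self):
        Child.released += 1


def release_parent(n_children):
    """Drop one parent reference; return (children released, seconds taken)."""
    Child.released = 0
    parent = [Child() for _ in range(n_children)]  # parent solely owns the children
    start = time.perf_counter()
    del parent  # one release cascades into n_children releases
    elapsed = time.perf_counter() - start
    return Child.released, elapsed
```

The pause caused by the single `del` grows with the number of owned objects, so even without the cyclic collector a deadline-sensitive program has to consider where large structures are released.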

    e17ae51f-bf3c-4ea4-be2c-67503164f70f commented 15 years ago

    OK cool, that's the development strategy we've already adopted. Is this limitation of Python's garbage collector in relation to real-time applications documented anywhere?

    61337411-43fc-4a9c-b8d5-4060aede66d0 commented 15 years ago

    > OK cool, that's the development strategy we've already adopted. Is this limitation of Python's garbage collector in relation to real-time applications documented anywhere?

    Why do you ask? (This is off-topic for the bug tracker.) It's not in the Python documentation. However, any good book on real-time systems will tell you that garbage collection is a problem.

    e17ae51f-bf3c-4ea4-be2c-67503164f70f commented 15 years ago

    I ask because in my opinion a three-second pause on a modern machine is significant for any program with any sort of interactivity--significant enough to warrant a warning in the documentation. Python is a great language and I think it deserves an incremental implementation of garbage collection.

    61337411-43fc-4a9c-b8d5-4060aede66d0 commented 15 years ago

    > Python is a great language and I think it deserves an incremental implementation of garbage collection.

    Python's cyclic garbage collector is incremental. If you can provide a specific patch to replace it with something "better", please submit it as a separate issue to the tracker. If you can hire somebody to implement such a thing, go right ahead. Otherwise, I think there is little that can be done. Python gets all of its contributions from volunteers.

    e17ae51f-bf3c-4ea4-be2c-67503164f70f commented 15 years ago

    Regardless of the type of algorithm used by the garbage collector that currently exists in Python, its worst-case performance is undesirable. I have some interest in implementing a garbage collector for Python with improved performance characteristics but don't really have the time for it. Hopefully someone does. In any case thanks for hearing me out.