python / cpython

The Python programming language
https://www.python.org
Other
61.97k stars 29.8k forks source link

Allow objects to decide if they can be collected by GC #53387

Closed 84401114-8e59-4056-83cb-632106c0b648 closed 6 years ago

84401114-8e59-4056-83cb-632106c0b648 commented 14 years ago
BPO 9141
Nosy @tim-one, @loewis, @rhettinger, @amauryfa, @pitrou, @kristjanvalur, @asvetlov, @jimjjewett, @serhiy-storchaka, @phmc
Files
  • gc_collectable.patch
  • ob_is_gc.patch
  • ob_is_gc.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = None closed_at = created_at = labels = ['interpreter-core', 'type-feature'] title = 'Allow objects to decide if they can be collected by GC' updated_at = user = 'https://github.com/kristjanvalur' ``` bugs.python.org fields: ```python activity = actor = 'kristjan.jonsson' assignee = 'none' closed = True closed_date = closer = 'kristjan.jonsson' components = ['Interpreter Core'] creation = creator = 'kristjan.jonsson' dependencies = [] files = ['17838', '25131', '25142'] hgrepos = [] issue_num = 9141 keywords = ['patch', 'needs review'] message_count = 44.0 messages = ['109099', '109100', '109112', '109113', '109226', '109229', '109240', '109241', '109247', '109248', '109265', '109802', '109803', '109804', '109807', '109810', '110973', '110975', '157464', '157492', '157571', '157621', '157657', '157658', '157681', '157682', '157685', '157688', '157689', '157714', '157719', '157722', '157723', '158510', '158537', '158538', '158540', '158542', '158543', '158560', '158561', '318537', '318582', '318641'] nosy_count = 13.0 nosy_names = ['tim.peters', 'loewis', 'rhettinger', 'amaury.forgeotdarc', 'pitrou', 'kristjan.jonsson', 'dstanek', 'stutzbach', 'asvetlov', 'Jim.Jewett', 'serhiy.storchaka', 'pconnell', 'isoschiz'] pr_nums = [] priority = 'normal' resolution = 'out of date' stage = 'resolved' status = 'closed' superseder = None type = 'enhancement' url = 'https://bugs.python.org/issue9141' versions = ['Python 3.3'] ```

    84401114-8e59-4056-83cb-632106c0b648 commented 14 years ago

    The GC module currently relies only on the presence of a __del__ method to decide if an object can be safely collected at all. Also, there is a special case for generator objects.

    This patch allows any object to call an api during its traversal, PyObject_GCCollectable(), to indicate whether it is fit to be collected at this time, overriding any presence of a \_del__ method or not.

    This mechanism is being put in place in stackless python 2.7 because tasklets cannot always be collected depending on their runtime state, and I thought this might be a useful addition for regular python, especially since there already is such a dynamic special case for generator objects.

    pitrou commented 14 years ago

    First, what's the use case? Just backporting an internal Stackless API isn't a reasonable request. Second, this tells that there *is* a finalization function, but not what the function is. How is the GC supposed to find it? Third, the implementation looks quite suboptimal. Better define a new slot method, such as tp_needs_finalizing, or tp_get_finalizer.

    84401114-8e59-4056-83cb-632106c0b648 commented 14 years ago
    1. The use case is any C extension that may need to run non-trivial code when being deleted, but where this can not be statically known. An example in the code today is generators, and the PyGen_NeedsFinalizing() function, doing exactly this. This is an attempt to make this functionality available to any C extension.

    2. No, this gives a C extension the oppertunity to say: It's okay to delete me from GC, or it's not okay to delete me from GC. There is no need to specify any particular finalizing function, it will be invoked when the refcount goes to 0, just as it happens with __del__(), or as it happens with PyGen objects.

    3. This code is only invoked for garbage deemed collectable. As such it is not on any critical path, most gc collections don't actually find any garbage. The cost of a few extra indirect function calls is likely to drown in the memory allocator overhead when the objects are released.

    Also, this is designed to be a minimum impact patch. We really shouldn't be creating new slot methods when it can be avoided.

    pitrou commented 14 years ago
    1. The use case is any C extension that may need to run non-trivial code when being deleted, but where this can not be statically known.

    tp_del is IMO a bad place to do it. I'd recommend tp_dealloc instead, precisely so that you don't end up with uncollectable objects tied to internal OS structures or other non-trivial resources.

    (also, tp_del seems to have problems with subclassing. I don't remember the specifics)

    1. This code is only invoked for garbage deemed collectable. As such it is not on any critical path, most gc collections don't actually find any garbage. The cost of a few extra indirect function calls is likely to drown in the memory allocator overhead when the objects are released.

    Ok.

    84401114-8e59-4056-83cb-632106c0b648 commented 14 years ago

    I don't understand the tp_del / tp_dealloc argument here. This is really not the concern of gc. Gc only goes and finds unreachable objects, and if they are deemend collectable, makes them go away by clearing their references. GC only cares that during the final release of any objects in a reference cycle, no non-trivial code may run (for various technical reasons). So, objects deemed "sensitive" in this way, are left alone (and put in gc.garbage instead). GC has no other knowledge of how objects behave, if or when any finalizers are called. The only thing gc does is calling Py_DECREF (through the tp_clear slot). Everything else is up to the objects.

    The intent of this patch is for objects themselves to be able to provide better information to GC as to whether they are too "sensitive" to be cleared by GC, rather than GC relying solely on the presence of a __del__ method, or using the builtin special case knowledbe for PyGen objects.

    Whether finalizers are called from tp_del or tp_dealloc is beyond the scope of the gc module or, indeed, this patch.

    pitrou commented 14 years ago

    GC only cares that during the final release of any objects in a reference cycle, no non-trivial code may run (for various technical reasons). So, objects deemed "sensitive" in this way, are left alone (and put in gc.garbage instead).

    Which is really the problem, and why you should prefer tp_dealloc over tp_del. (unless you want gc.garbage to fill up too easily)

    84401114-8e59-4056-83cb-632106c0b648 commented 14 years ago

    I'm confused, Antoine. _I am not preferring anything. I am not modifying the way finalizers are called. I'm merely providing a way for objects to signal that they _have (or don't have) non-trivial finalizers.

    A finalizer _may_ be tp_del, but it may be any code that is called as part of tp_dealloc(). This patch allows an object to say: Calling tp_dealloc() is currently unsafe from a gc.collect() run.

    pitrou commented 14 years ago

    A finalizer _may_ be tp_del, but it may be any code that is called as part of tp_dealloc(). This patch allows an object to say: Calling tp_dealloc() is currently unsafe from a gc.collect() run.

    Well, I don't know what you mean. tp_dealloc should always be "safe" and is never checked for by the GC.

    By the way, tp_del isn't even documented, which reinforces my point that it's not really for use by third-party extension types.

    I think we are reaching the point where, if you want this to happen, you should ask for opinions on python-dev, because I really don't understand which problem you are trying to solve.

    84401114-8e59-4056-83cb-632106c0b648 commented 14 years ago

    tp_del() is generally invoked by tp_dealloc(). See for example typeobject.c:849. gc.collect() never invokes tp_del() directy, that is left for the objects themselves (as part of tp_dealloc())

    Now, gc already knows that if tp_del is present, it cannot cause break the cycle, (for various technical reasons, one of them being that tp_del would be invoked with various dependency objects already nerfed by tp_clear()).

    But this is not always enough: 1) it may be too pessimistic. Sometimes the tp_del method doesn't do anything significant, or isn't even called, depending on runtime state, so it is safe for gc to clear the object (see for example genobject.c:30). 2) Sometimes finalization behaviour is not defined by a tp_del method at all, but rather some custom code in tp_dealloc.

    All this patch does, is to generalize the mechanism already provided for genobject.c (by PyGen_NeedsFinalizing()), to any object: Any object can signal to gc that: a) it is ok to collect a cycle with me in it, or b) no, it is unsafe to do so. With this patch in place, PyGen_NeedsFinalizing() no longer needs to be a special case in gcmodule.c.

    Yes, I'll send a message to python-dev.

    pitrou commented 14 years ago

    All this patch does, is to generalize the mechanism already provided for genobject.c (by PyGen_NeedsFinalizing()), to any object: Any object can signal to gc that: a) it is ok to collect a cycle with me in it, or b) no, it is unsafe to do so. With this patch in place, PyGen_NeedsFinalizing() no longer needs to be a special case in gcmodule.c.

    Adding an API function and an additional traversal pass to the GC just for the sake of removing a special case doesn't seem reasonable to me. That's why I'm asking what specific problem you are trying to solve. (rather than simply trying to make things "nicer")

    cfc9f3e0-e33f-4ecd-9ddd-4123842d6c1e commented 14 years ago

    Kristján,

    Can you compare and contrast your approach with calling PyObject_GC_UnTrack and PyObject_GC_Track to mark the object as uncollectable/collectable?

    84401114-8e59-4056-83cb-632106c0b648 commented 14 years ago

    Hi Antoine. I am not adding another traversal pass. I am modifying the function that gets called for every object about to be collected, to call the object's traversal function. This is hardly an extra "pass" and is done only for objects that are about to be collected (i.e. only in the rare case).

    The particular case is this: I have objects (tasklets) that depending on their runtime state, need to run a finalizer or not. If not, they can be collected. If they do, then they end up on gc.garbage.

    This is exactly the same case as with generators. But instead of adding yet another special case into stackless python, I realized that it may be up to objects' runtime state whether they need to run a finalizer or not. So, I wrote a more generic way and decided to contribute it to C Python, so that a future developer, having a C extension that only sometimes needs to run a finalizer, could have a way to cleanly deal with garbage collection. This is twice now that this problem has been seen in python. The first time, a special case was coded into gcmodule. This second time, I'm offering a more generic solution.

    Now, I have a number of useful improvements and additions to C python sitting in our repository, the result of over 7 years work with Python at CCP games. I have another one sitting and waiting that adds useful functionality to gcmodule. Useful in my opinion anyway, and worth at least being kept for posterity as a patch in the tracker. But since these patches of mine seem to be met with rather limited enthusiasm, I wonder if I should be bothering.

    84401114-8e59-4056-83cb-632106c0b648 commented 14 years ago

    Hi Daniel. Your message was classified as spam, I have no idea why, but this is why I only noticed it now.

    Your suggestion is interesting, I hadn't thought of that. Yes, it is possible to use the track/untrack functions (I think), but that would mean that you would have to monitor your object for every state change and reliably detect the transition from one state to another. Being able to query the current state and let gc know: "No, I cannot be collected as I am now" is a much more robust solution from the programmers perspective.

    A further difference is this: If an object isn't tracked, it won't be collected if it is part of a cycle, but it will not be put in gc.garbage either. In effect, it will just remain in memory, unreachable, with no chance of it ever being released.

    In contrast, a generator (or in my case, a tasklet) that needs to have a finalizer run when it goes away, will end up in gc.garbage, which a dilligent application will visit once in a while in order to clean it out. I'm not sure how you would clear out a generator like that, but in Stackless python, you could do something like this once in a while: for g in gc.garbage: if isinstance(g, stackless.tasklet): g.kill() del gc.garbage[:]

    pitrou commented 14 years ago

    The particular case is this: I have objects (tasklets) that depending on their runtime state, need to run a finalizer or not. If not, they can be collected. If they do, then they end up on gc.garbage.

    But why don't you try using tp_dealloc as I suggested instead? It will allow you to run a finalizer in all cases, without anything ending up in gc.garbage. It's what we do in the new io lib.

    Also, I'm still worried that the mechanism you're proposing is a bit quirky. I hope other developers can give opinions or even suggestions.

    But since these patches of mine seem to be met with rather limited enthusiasm, I wonder if I should be bothering.

    Well, the GC is a rather delicate affair and moreover it doesn't have a dedicated maintainer. That doesn't mean your patches are not interesting, but I think everyone tries to be very careful with that area of the code.

    rhettinger commented 14 years ago

    I'm merely providing a way for objects to signal that they _have_ (or don't have) non-trivial finalizers.

    +1 This is useful and non-hackish way to communicate with GC and solves a problem that doesn't otherwise have a clean solution.

    cfc9f3e0-e33f-4ecd-9ddd-4123842d6c1e commented 14 years ago

    2010/7/9 Kristján Valur Jónsson \report@bugs.python.org\

    Your message was classified as spam, I have no idea why, but this is why I only noticed it now.

    Yes, I just noticed that tonight as well. I filed a bug on the meta-tracker in the hopes that someone can dig into the underlying cause.

    Your suggestion is interesting, I hadn't thought of that. Yes, it is possible to use the track/untrack functions (I think), but that would mean that you would have to monitor your object for every state change and reliably detect the transition from one state to another. Being able to query the current state and let gc know: "No, I cannot be collected as I am now" is a much more robust solution from the programmers perspective.

    A further difference is this: If an object isn't tracked, it won't be collected if it is part of a cycle, but it will not be put in gc.garbage either. In effect, it will just remain in memory, unreachable, with no chance of it ever being released.

    Yes, I see. I have used the track/untrack approach, but it was in a very different situation. I have a long-running C function which keeps alive a large number of objects. At the start of the function, I untrack them all as a performance optimization, so the garbage collector does not have to spend time traversing them. Before the function returns, I track them again.

    I see now why that wouldn't work for your use-case. Thank you. I like the idea of your patch to give opportunities to tell the GC that they are uncollectable on the fly.

    I'm concerned about the performance impact of making tp_traverse do double-duty, though. Calling tp_traverse for every object in a cycle will have the effect of making an extra pass on every reference from every object participating in the cycle. For example, consider a large list that's part of a cycle. If we call the list's tp_traverse to establish if it's collectible, list's tp_traverse will call visit() on every item in the list. Even though you've made visit() a do-nothing function, that's still a function call per reference. It seems a shame to do all of that work unnecessarily.

    84401114-8e59-4056-83cb-632106c0b648 commented 14 years ago

    I agree that this is not the _optimal_ solution, and that when you are collecting stuff, say a list of many items, it will cause X indirect calls to all of the things in the list.

    My counterargument is, however, that all those items will, if collected, be handed off to the final Py_DECREF with all the baggage that that entails, ultimately resulting in PyObject_Free or PyMem_Free. An extra indirect call, (only for unreachable objects, btw, which is only a small objects of all objects visited during a GC pass) should not play a significatn part in the process.

    An alternative to this extra tp_traverse() pass, would be to flag objects that report that they are or are not collectable, with a special value in gc_refs, probably a bitmask. But this would be a much more intrusive change in an algorithm that is far from simple, and so need very rigorous review and testing.

    cfc9f3e0-e33f-4ecd-9ddd-4123842d6c1e commented 14 years ago

    Yes, I see your point. Even if more expensive than strictly necessary, the cost should be swamped by other existing costs.

    84401114-8e59-4056-83cb-632106c0b648 commented 12 years ago

    I just noticed that PyTypeObjects have a gc slot already: inquiry tp_is_gc; / For PyObject_IS_GC \/

    Now, this is used in only one place in 2.7, for type objects: return type->tp_flags & Py_TPFLAGS_HEAPTYPE;

    This is thus used to dynamically query if an object was allocated with a GC header or not, since static types cannot have this.

    Now, it would be perfectly possible to change the semantics of this function to return a bit flag, with bit 0 being the "has head" and bit 1 meaning "is collectable"

    This might simplify some stuff. Any thoughts?

    pitrou commented 12 years ago

    This might simplify some stuff. Any thoughts?

    I think you should float it on python-dev.

    84401114-8e59-4056-83cb-632106c0b648 commented 12 years ago

    Here is a completely new patch. This approach uses the already existing tp_is_gc enquiry slot to signal garbage collection. The patch modifies the generator object to use this new mechanism. The patch keeps the old PyGen_NeedsFinalizing() API, but this can now go away, unless people think it might be used in extension modules

    (why do we always expose all those internal apis from the dll? I wonder.)

    74c4563b-ab1c-43d8-9219-30c4eca796bc commented 12 years ago

    I think the second patch is a fairly straightforward enhancement to the existing situation. It still feels a bit like an awkward half-measure, though.

    (1) If a class does have a finalizer, instances should still be able to say "Go ahead and collect me, I don't happen to be in the problematic state." (In other words, there should be a return value that means not to bother checking for the existence of a tpdel / \_del__ method.)

    (2) It should be explicit whether or not this is: (2a) just an inquiry ("If you were in a cycle, could I collect you?), (2b) a statement plus inquiry ("You are in a cycle. Can I collect you?"), or (2c) a request ("You are in a cycle, and I would like to collect you. Clean up if you can, then tell me whether you are still uncollectable.")

    (3) It may be worth adding an additional slot for safe idempotent finalizers. (Earlier suggestions called this __close, but I suppose __finalize might be more general.) Once a garbage cycle is detected, go ahead and call that slot's method before giving up and sticking the whole thing in garbage. If python classes could also fill this slot, it would look a lot like (2c).

    84401114-8e59-4056-83cb-632106c0b648 commented 12 years ago

    New patch with explicit mutually exclusive flags and better comments.

    84401114-8e59-4056-83cb-632106c0b648 commented 12 years ago

    Thanks for your comments Jim. They made me realize that the patch was incomplete. What is needed are two mutually exclusive flags that override the presence of the tp_del slot. This addresses your 1) point and is the case for Generators.

    I don't think 2 is important. Does the context of the call matter? It is merely a question of whether a finalizer will or will not do something trivial.

    Finalizers implemented in python can never run from garbage collection. the potential to do harm is simply too great. Which is why these extensions are purely the domain of specialized C extensions. Hence I don't think adding another slot, or allowing self-declared idempotent python finalizers to run during cleanup is a good idea.

    74c4563b-ab1c-43d8-9219-30c4eca796bc commented 12 years ago

    I don't think 2 is important. Does the context of the call matter? It is merely a question of whether a finalizer will or will not do something trivial.

    It would affect how I would write such functions. If I knew that it wouldn't be called until the object was already garbage, then I be inclined to move parts of tp_del there, and break the cycle.

    Finalizers implemented in python can never run from garbage collection. the potential to do harm is simply too great.

    __del__ methods do run, even if an object was collected by the cycle detector. And they can't do any harm that couldn't also be done by a C finalizer.

    The only change from today's situation with __del__ is that once an object is known to be cyclic garbage, it would get a chance to break the cycles itself (or, admittedly, to rescue itself) before the cycle-breaker began making arbitrary decisions or gave up and stuffed it in the uncollectable list.

    Even in your case, instead of setting a timer to clean out garbage, you could have the garbage itself notify your cleaner that it needed attention.

    cfc9f3e0-e33f-4ecd-9ddd-4123842d6c1e commented 12 years ago

    On Fri, Apr 6, 2012 at 12:51 PM, Jim Jewett \report@bugs.python.org\ wrote:

    __del__ methods do run, even if an object was collected by the cycle detector. And they can't do any harm that couldn't also be done by a C finalizer.

    No, if an object with a __del__ method is part of a cycle, it is not collected. The objects get appended to gc.garbage instead.

    See: http://docs.python.org/py3k/library/gc.html#gc.garbage

    84401114-8e59-4056-83cb-632106c0b648 commented 12 years ago

    Right Daniel, but currently an exception is made for Generator objects. The generator can tell that the finalizer won't do anything and then allow collection regardless. It knows the finalizer and that in its state it will at most do some Py_DECREF operations.

    This patch aims to generalize this exception mechanism, and also its opposite, so that we don´t need to rely on the presence of the finalizer slot to decide. We can then remove a silly function from the public (but undocumented) API.

    Of course, you can only allow collectino of an object with a finalizer if you know for sure that it will not do anything but Py_DECREF. This is possible for generators because they are not an inheritable class. But we could never do this in general for a python class. Even running .py code from the GC collector cycle would crash everything instantly.

    cfc9f3e0-e33f-4ecd-9ddd-4123842d6c1e commented 12 years ago

    Are __del and tpdel getting conflated in this conversation? I don't see a \_del method on generator objects:

    filfre:$ python3.1 
    Python 3.1.2 (r312:79147, Dec  9 2011, 20:47:34) 
    [GCC 4.4.3] on linux2
    >>> (i for i in range(5)).__del__
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'generator' object has no attribute '__del__'

    but I may be missing something.

    84401114-8e59-4056-83cb-632106c0b648 commented 12 years ago

    typeobject.c: TPSLOT("__del__", tp_del, slot_tp_del, NULL, ""),

    I'm not super-familiar with how typeobjects and slots work, but I imagine that if a type is defined with a __del__ member, then the tp_del slot is automatically filled out. The converse need not be true.

    At any rate, the non-zero-ness of tpdel is what gcmodule.c uses to find out if an object has a finaliser. There is a code rudiment present in gcmodule.c, that hints at an earlier time when '\_del__' was looked up but that string is not actually used. That can be removed to, eiter as part of this patch or separately since it should be quite uncontroversial.

    74c4563b-ab1c-43d8-9219-30c4eca796bc commented 12 years ago

    On Fri, Apr 6, 2012 at 4:03 PM, Daniel Stutzbach \stutzbach@google.com\ added the comment:

    > __del__ methods do run, even if an object was collected by the cycle > detector.  And they can't do any harm that couldn't also be done by a C > finalizer.

    No, if an object with a __del__ method is part of a cycle, it is not collected.  The objects get appended to gc.garbage instead.

    See:  http://docs.python.org/py3k/library/gc.html#gc.garbage

    They can still be collected if there is only one object with a __del__ method in the cycle.

    (Whether the code actually does that, it appears not to at the moment, and I won't swear by by own memory of 2.3 era code.)

    61337411-43fc-4a9c-b8d5-4060aede66d0 commented 12 years ago

    I'm still unclear about the rationale for this change. krisvale says in the patch and in msg109099 that this is to determine whether an object can be collected "at this time". Is the intended usage that the result value may change over the lifetime of the object?

    If so, I'm -1 on the patch. If an object cannot be collected "at this time", it means that it is added to gc.garbage, which in turn means that it will never be collected (unless somebody explicitly clears gc.garbage).

    Supporting the case of objects that can be collected despite living in a cycle is fine to me, but those objects must not change their mind.

    Supporting the case of objects that are not collectable now, but may be collectable later, may have its use case (which one?), but this is not addressed by the patch (AFAICT). To support it, processing of the entire cycle must be postponed (to the next collection? to the next generation?).

    I'm -0 on recycling the is_gc slot. Having a GC header and having a non-trivial tp_del are two unrelated issues. If this is done, I think it would be best to rename the slot to tp_gc_flags or something. There is also the slight risk of some type in the wild returning non-1 currently, which then would get misinterpreted.

    84401114-8e59-4056-83cb-632106c0b648 commented 12 years ago

    Jim: The edge case of collecting an object that is alone in a cycle is something that isn't handled. I'm also not sure that it is worth doing or even safe or possible. but that is beside the point and not the topic of this patch.

    Martin: This patch is formalizing a known fact about cpython objects, albeit an uncommon one: Some objects with finalizers can be safely collected if they are in a certain state. The canonical example is the PyGeneratorObject. gccollect.c has special code for this type and there is a special evil API exposed to deal with the case.

    _This patch is not changing any behaviour._ Generator objects that are in a special state of existance cannot be collected. Rather, they are (and always have been) put in gc.garbage along with the rest of their cycle chain. In other cases, the finalizer causes no side effects and simply clearing them is ok. I couldn't tell you what those generator execution states are, but the fact is that they exist.

    A similar problem exists in Stackless python with tasklets. We solved it in this generic way there rather than add more exceptional code to gcmodule.c and this patch is the result of that work. In Stackless, the inverse is true: An object without an explicit finalizer can still be unsafe to collect, because the tp_dealloc can do evil things, but doesn't always, depending on object state.

    So, objects who change their mind about whether they can be collected or not are a fact of life in python. Yes, even cPython. This patch aims to formalize that fact and give it an interface, rather than to have special code for generator objects in gcmodule.c and an inscrutable API exposed (PyGen_NeedsFinalizing())

    About reusing the slot: Slots are a precious commodity in python. This particular slot is undocumented and used only for one known thing: To distinguish PyTypeObjects from PyHeapTypeObjects. In fact, creating a slot and not using special case code (like they did for PyGeneratorObjects) was forward thinking, and I'm trying to build on that. Renaming the slot is a fine idea.

    A brief search on google code (and google at large) showed no one using this slot. It is one of those undocumented strange slots that one just initializes to 0.

    84401114-8e59-4056-83cb-632106c0b648 commented 12 years ago

    Btw. tangentially related to this discussion, bpo-10576 aims to make the situation with uncollectable objects a little more bearable. An application can listen for garbage collection, visit gc.garbage and deal with its problematic types in its own way.

    84401114-8e59-4056-83cb-632106c0b648 commented 12 years ago

    I've stumbled upon further cases of this problem, and a possible serious bug in python 3. the _PyIOBasefinalize() will invoke a "close" method on the object if the object isn't closed. This is the same as the object having a \_del__ method. Yet, these objects are not treated as having finalizers by the garbage collector, and indeed, _PyIOBase_finalize() is called by the tp_clear() slot of several derived types. Surely, if these objects define non-trivial 'close' members, they must not be called during garbage collection.

    pitrou commented 12 years ago

    Surely, if these objects define non-trivial 'close' members, they must not be called during garbage collection.

    Define "non-trivial". There are various tests for it in test_io.

    Not ending up in gc.garbage is *by design*. Making file objects uncollectable as soon as they appear in a reference cycle would be a serious regression. That's why the cleanup is done in tpdealloc instead of having a \_del__.

    84401114-8e59-4056-83cb-632106c0b648 commented 12 years ago

    I _think_ the only python related things you can do from tp_clear() is Py_DECREF(), this is what I mean by trivial. This is the reason, for example, that special care was done with generators. An IO object could of course do non-python operations such as closing files and freeing buffers.

    If file.close() can be an arbitrary python method, then it can no more be called from gc, than an object's __del__ method. This would not be a regression, this would be a fact of life.

    The point of _this_ defect (bpo-9141) is to allow objects to be smarter about this, being able to tell gc if their finalizers are trivial or not.

    I will start a discussion on python-dev to see if anyone knows exactly why these limitations are in place, and what they are. They are not documented in the source code.

    pitrou commented 12 years ago

    I _think_ the only python related things you can do from tp_clear() is Py_DECREF(), this is what I mean by trivial.

    Well, Py_DECREF is not trivial at all, since it can invoke arbitrary Python code (through e.g. weakref callbacks, or by releasing the GIL). Therefore, I would say any code is allowed from tp_clear :-)

    If file.close() can be an arbitrary python method, then it can no more be called from gc, than an object's __del__ method. This would not be a regression, this would be a fact of life.

    I don't believe it. I don't see what's magical about being called by the gc. Again, a Py_DECREF in tp_dealloc can invoke arbitrary Python code.

    84401114-8e59-4056-83cb-632106c0b648 commented 12 years ago

    I don't believe it. I don't see what's magical about being called by the gc. Again, a Py_DECREF in tp_dealloc can invoke arbitrary Python code.

    Look again. gcmodule specifically takes any objects reachable from ob_clear and sees if any of them have side effects when Py_DECREF'd. If any object has a finalizer, the entire cycle is put in gc.garbage.

    gcmodule is trickier than you might think. I've spent quite a time with it.

    Anyway, I've put the issue to python-dev, let's see if they have some autorative insight on the matter.

    pitrou commented 12 years ago

    Look again. gcmodule specifically takes any objects reachable from ob_clear and sees if any of them have side effects when Py_DECREF'd.

    has_finalizer() in gcmodule.c doesn't check for weakref callbacks, so a weakref callback can still be invoked from tp_dealloc.

    cfc9f3e0-e33f-4ecd-9ddd-4123842d6c1e commented 12 years ago

    On Tue, Apr 17, 2012 at 4:08 AM, Antoine Pitrou \report@bugs.python.org\ wrote:

    has_finalizer() in gcmodule.c doesn't check for weakref callbacks, so a weakref callback can still be invoked from tp_dealloc.

    Unless I'm mistaken, weakrefs will be handled and removed by handle_weakrefs() before delete_garbage() is called.

    pitrou commented 12 years ago

    On Tue, Apr 17, 2012 at 4:08 AM, Antoine Pitrou \report@bugs.python.org\ wrote: > has_finalizer() in gcmodule.c doesn't check for weakref callbacks, so a > weakref callback can still be invoked from tp_dealloc.

    Unless I'm mistaken, weakrefs will be handled and removed by handle_weakrefs() before delete_garbage() is called.

    Perhaps, but what does that change?

    serhiy-storchaka commented 6 years ago

    Is this issue still actual after PEP-442?

    pitrou commented 6 years ago

    It's hard to say as I never fully understood the underlying intent.

    Perhaps Kristján or some other developer who better understood than me can chime in.

    84401114-8e59-4056-83cb-632106c0b648 commented 6 years ago

    Hi there! By the time PEP-442 was introduced, I wasn't very active in python core stuff anymore, and still am not.

    The intent of this patch, which is explained (IMHO) quite clearly in the first few comments was to

    Now, with PEP-442, I have no idea how Generators can postpone being garbage collection since I'm honestly not familiar with how things work now.

    I have no particular skin in this game anymore, I'm no longer actively working on Stackless or Python integrations and I stopped trying to push stuff thought the bugtracker to preserve my sanity.

    So, lets just close this until the day in the future when needs arise once more :)