python / cpython

The Python programming language
https://www.python.org
Other
61.97k stars 29.8k forks source link

tracebacks eat up memory by holding references to locals and globals when they are not wanted #44031

Closed c7898a41-d409-4bdb-93cd-025165dbc3a3 closed 10 years ago

c7898a41-d409-4bdb-93cd-025165dbc3a3 commented 17 years ago
BPO 1565525
Nosy @tim-one, @loewis, @akuchling, @terryjreedy, @pitrou, @vstinner, @bitdancer, @phmc
Files
  • eat_memory.py
  • clear-tb-frames.txt
  • clear-tb-frames-2.txt
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = None closed_at = created_at = labels = ['easy', 'library', 'performance'] title = 'tracebacks eat up memory by holding references to locals and globals when they are not wanted' updated_at = user = 'https://bugs.python.org/ghazel' ``` bugs.python.org fields: ```python activity = actor = 'vstinner' assignee = 'none' closed = True closed_date = closer = 'akuchling' components = ['Library (Lib)'] creation = creator = 'ghazel' dependencies = [] files = ['2160', '31668', '31692'] hgrepos = [] issue_num = 1565525 keywords = ['easy'] message_count = 43.0 messages = ['29994', '29995', '29996', '29997', '29998', '29999', '30000', '30001', '30002', '74329', '74401', '79815', '79817', '79818', '79819', '79820', '108646', '108653', '108658', '108665', '108668', '108672', '108677', '108686', '108742', '108748', '108751', '113450', '194513', '194520', '194943', '197267', '197278', '197340', '197347', '197371', '197396', '197397', '197429', '197776', '197777', '197838', '301115'] nosy_count = 10.0 nosy_names = ['tim.peters', 'loewis', 'akuchling', 'terry.reedy', 'pitrou', 'vstinner', 'ghazel', 'r.david.murray', 'python-dev', 'pconnell'] pr_nums = [] priority = 'normal' resolution = 'fixed' stage = 'resolved' status = 'closed' superseder = None type = 'resource usage' url = 'https://bugs.python.org/issue1565525' versions = ['Python 3.4'] ```

    c7898a41-d409-4bdb-93cd-025165dbc3a3 commented 17 years ago

    Attached is a file which demonstrates an oddity about traceback objects and the gc.

    The observed behaviour is that when the tuple from sys.exc_info() is stored on an object which is inside the local scope, the object, thus exc_info tuple, are not collected even when both leave scope.

    If you run the test with "s.e = sys.exc_info()" commented out, the observed memory footprint of the process quickly approaches and sits at 5,677,056 bytes. Totally reasonable.

    If you uncomment that line, the memory footprint climbs to 283,316,224 bytes quite rapidly. That's a two order of magnitude difference!

    If you uncomment the "gc.collect()" line, the process still hits 148,910,080 bytes.

    This was observed in production code, where exc_info tuples are saved for re-raising later to get the stack- appending behaviour tracebacks and 'raise' perform. The example includes a large array to simulate application state. I assume this is bad behaviour occurring because the traceback object holds frames, and those frames hold a reference to the local objects, thus the exc_info tuple itself, thus causing a circular reference which involves the entire stack. Either the gc needs to be improved to prevent this from growing so wildly, or the traceback objects need to (optionally?) hold frames which do not have references or have weak references instead.

    tim-one commented 17 years ago

    Logged In: YES user_id=31435

    Your memory bloat is mostly due to the

    d = range(100000)

    line. Python has no problem collecting the cyclic trash, but you're creating 100000 * 100 = 10 million integer objects hanging off trash cycles before invoking gc.collect(), and those integers require at least 10 million

    Note that memory usage in your program remains low and steady if you force gc.collect() after every call to bar(). Then you only create 100K integers, instead of 10M, before the trash gets cleaned up.

    There is no simple-minded way to "repair" this, BTW. For example, /of course/ a frame has to reference all its locals, and moving to weak references for those instead would be insanely inefficient (among other, and deeper, problems).

    Note that the library reference manual warns against storing the result of exc_info() in a local variable (which you're /effectively/ doing, since the formal parameter s is a local variable within foo()), and suggests other approaches. Sorry, but I really couldn't tell from your description why you want to store this stuff in an instance attribute, so can't guess whether another more-or-less obvious approach would help.

    For example, no cyclic trash is created if you add this method to your class O:

        def get_traceback(self):
            self.e = sys.exc_info()

    and inside foo() invoke:

        s.get_traceback()

    instead of doing:

        s.e = sys.exc_info()

    Is that unreasonable? Perhaps simpler is to define a function like:

    def get_exc_info():
        return sys.exc_info()

    and inside foo() do:

        s.e = get_exc_info()

    No cyclic trash gets created that way either. These are the kinds of things the manual has suggested doing for the last 10 years ;-)

    c7898a41-d409-4bdb-93cd-025165dbc3a3 commented 17 years ago

    Logged In: YES user_id=731668

    I have read the exc_info suggestions before, but they have never made any difference. Neither change you suggest modifies the memory footprint behaviour in any way.

    Weakrefs might be slow, I offered them as an alternative to just removing the references entirely. I understand this might cause problems with existing code, but the current situation causes a problem which is more difficult to work around. Code that needs locals and globals can explicity store a reference to eat - it is impossible to dig in to the traceback object and remove those references. The use-case of storing the exc_info is fairly simple, for example: Two threads. One queues a task for the other to complete. That task fails an raises an exception. The exc_info is caught, passed back to the first thread, the exc_info is raised from there. The goal is to get the whole execution stack, which it does quite nicely, except that it has this terrible memory side effect.

    61337411-43fc-4a9c-b8d5-4060aede66d0 commented 17 years ago

    Logged In: YES user_id=21627

    I'm still having problems figuring out what the bug is that you are reporting. Ok, in this case, it consumes a lot of memory. Why is that a bug?

    c7898a41-d409-4bdb-93cd-025165dbc3a3 commented 17 years ago

    Logged In: YES user_id=731668

    The bug is the circular reference which is non-obvious and unavoidable, and cleaned up at some uncontrolable (unless you run a full collection) time in the future. There are many better situations or solutions to this bug, depending on which you think it is. I think those should be investigated.

    61337411-43fc-4a9c-b8d5-4060aede66d0 commented 17 years ago

    Logged In: YES user_id=21627

    I disagree that the circular reference is non-obvious. I'm not sure what your application is, but I would expect that callers of sys.exc_info should be fully aware what a traceback is, and how it refers to the current frames. I do agree that it is unavoidable; I fail to see that it is a bug because of that (something unavoidable cannot be a bug).

    If you are saying that it is unavoidable in your application: I have difficulties believing this. For example, you could do

            s.e = sys.exc_info()[:2]

    This would drop the traceback, and thus not create a cyclic reference. Since, in the program you present, the traceback is never used, this looks like a "legal" simplification (of course, you don't use s.e at all in this program, so I can only guess that you don't need the traceback in your real application).

    As for the time of cleanup not being controllable: you can certainly control frequency of gc with gc.set_threshold; no need to invoke gc explicitly.

    tim_one: Why do you think your proposed modification of introducing get_traceback would help? The frame foo still refers to s (which is an O), and s.e will still refer to the traceback that includes foo.

    tim-one commented 17 years ago

    Logged In: YES user_id=31435

    [Martin]

    tim_one: Why do you think your proposed modification of introducing get_traceback would help? The frame foo still refers to s (which is an O), and s.e will still refer to the traceback that includes foo.

    Sorry about that! It was an illusion, of course. I wanted to suggest a quick fix, and "tested it" too hastily in a program that didn't actually bloat with /or/ without it.

    For the OP, I had need last year of capturing a traceback and (possibly) displaying it later in ZODB. It never would have occurred to me to try saving away exc_info(), though. Instead I used the traceback module to capture the traceback output (a string), which was (possibly) displayed later, with annotations, by a different thread. No cycles, no problems.

    BTW, I must repeat that there is no simple-minded way to 'repair' this. That isn't based on general principle, but on knowledge of how Python is implemented.

    akuchling commented 17 years ago

    Logged In: YES user_id=11375

    A quick grep of the stdlib turns up various uses of sys.exc_info that do put it in a local variable., e.g. doctest._exception_traceback, unittest._exc_info_to_string, SimpleXMLRPCServer._marshaled_dispatch. Do these all need to be fixed?

    c7898a41-d409-4bdb-93cd-025165dbc3a3 commented 17 years ago

    If you are saying that it is unavoidable in your application: I have difficulties believing this. For example, you could do

        s.e = sys.exc_info()[:2]

    This would drop the traceback, and thus not create a cyclic reference. Since, in the program you present, the traceback is never used, this looks like a "legal" simplification (of course, you don't use s.e at all in this program, so I can only guess that you don't need the traceback in your real application).

    "This was observed in production code, where exc_info tuples are saved for re-raising later to get the stack- appending behaviour tracebacks and 'raise' perform."

    I would like the traceback object so that I can re-raise the error. I can stringify it as tim_one suggests, but that can't be used with 'raise' and 'try','except' later.

    It is not important for my application to have all the references that the traceback object contains, which is what is causing the massive memory requirement. If I could replace the exc_info()[2] with a traceback look-alike that only held file, line, etc information for printing a standard traceback, that would solve this problem.

    c7898a41-d409-4bdb-93cd-025165dbc3a3 commented 15 years ago

    Or, being able to remove the references to the locals and globals from the traceback would be sufficient.

    Something like this:

    try: raise Exception() except: t, v, tb = sys.exc_info() tbi = tb while tbi: tbi.tb_frame.f_locals = None tbi.tb_frame.f_globals = None tbi = tbi.tb_next # now "tb" is cleaned of references

    vstinner commented 15 years ago

    Similar issue: bpo-4034 proposes to be able to set tb.tb_frame=None. It's easy to implement this, I can write a patch for this.

    vstinner commented 15 years ago

    I wrote a patch to support \<traceback object>.tb_frame=None. It works but the traceback becomes useless because you have unable to display the traceback. The frame is used by tb_printinternal() to get the filename (co_filename) and the code name (co_name).

    I also tried: while tbi: frame = tbi.tb_frame tbi = tbi.tb_next frame.f_locals.clear() frame.f_globals.clear()

    ... and it doesn't work because the tbi variable is also removed!

    A traceback object have to contain the full frame, but the frame contains "big" objects eating your memory. A solution to your initial problem (store exceptions) is to not store the traceback object or at least to store it as a (list of) string. Solution already proposed by loewis (msg29999).

    c7898a41-d409-4bdb-93cd-025165dbc3a3 commented 15 years ago

    But a list of strings is not re-raisable. The co_filename, co_name, and such used to print a traceback are not dependent on the locals or globals.

    vstinner commented 15 years ago

    I tried to remove the frame from the traceback type (to keep only the filename and code name), but many functions rely on the frame object. Some examples:

    Lib/unittest.py: class TestResult(object): def _is_relevant_tblevel(self, tb): return '\_unittest' in tb.tb_frame.f_globals

    Lib/traceback.py: print_tb() uses tb.tb_frame.f_globals for linecache.getline()

    Doc/tools/jinga/debugger.py: translateexception() checks if \_jinja_template__ variable is present in b.tb_frame.f_globals

    Lib/idlelib/StackViewer.py: StackTreeItem.get_stack() stores each tb.tb_frame in a list FrameTreeItem.GetText() reads frame.fglobals["\_name__"] and gets the filename and code name using frame.f_code

    Lib/logging/init.py: currentframe() reads sys.exc_traceback.tb_frame.f_back

    Lib/types.py: Use tb.tb_frame to create the FrameType

    (...)

    co_name/co_filename can be stored directly in the traceback. But what about tb.tb_frame.f_back and tb.tb_frame.f_globals? I'm not motivated enough to change traceback API.

    vstinner commented 15 years ago

    Greg Hazel> But a list of strings is not re-raisable

    Do you need the original traceback? Why not only raising the exception? Eg. ----

    import sys
    try:
        raise Exception("hm!")
    except:
        t, v, tb = sys.exc_info()
        raise v

    c7898a41-d409-4bdb-93cd-025165dbc3a3 commented 15 years ago

    STINNER Victor> Do you need the original traceback? Why not only raising the exception?

    If the exception was captured in one stack, and is being re-raised in another. It would be more useful to see the two stacks appended instead of just the place where it was re-raised (or the place where it was raised initially, which is what a string would get you - not to mention the inability to catch it).

    terryjreedy commented 14 years ago

    Given comments like "I'm still having problems figuring out what the bug is that you are reporting. (Martin)" and "I must repeat that there is no simple-minded way to 'repair' this. (Tim)", and subsequent changes to Python, should this still be open? If so, for which version(s)?

    c7898a41-d409-4bdb-93cd-025165dbc3a3 commented 14 years ago

    This is still an issue.

    The bug I'm reporting had been explained well, I thought, but I'll repeat it in summary: There is no way to pass around traceback objects without holding references to an excessive number of objects.

    Traceback raising typically does not use these references at all, so having some way to discard them would be very valuable. This allows storing and passing tracebacks between threads (or coroutines or async tasks) without dying quickly due to memory bloat. The simple-minded way to fix this is to allow the user to break the reference themselves.

    Fixing this bug would invalidate the need for hacks like the one Twisted has come up with in their twisted.python.Failure object which stringifies the traceback object, making it impossible to re-raise the exception. Failure has a lot of re-implementations of Exceptions and traceback objects as a result.

    61337411-43fc-4a9c-b8d5-4060aede66d0 commented 14 years ago

    I still don't understand the issue. You say that you want a traceback, but then you say you don't want the objects in the traceback. So what *precisely* is it that you want, and what is it that you don't want?

    In any case, whatever the solution, it is likely a new feature, which aren't acceptable anymore for 2.x release. So please don't target this report for any 2.x version.

    c7898a41-d409-4bdb-93cd-025165dbc3a3 commented 14 years ago

    The objects I do want in the traceback are the objects necessary to print a traceback, but not the locals and globals of each frame.

    For example:

    def bar():
      x = "stuff"
      raise Exception("example!")
    bar()
    prints: 
    Traceback (most recent call last):
      Line 4, in <module>
        bar()
      Line 3, in bar
        raise Exception("example!")
    Exception: example!

    There is no reason in that example to have a reference to "x" in the traceback, since it's not used in the output. This becomes important when I try to save a reference to the traceback object and raise it later:

    import sys
    def bar():
      x = "stuff"
      raise Exception("example!")
    try:
      bar()
    except:
      exc_info = sys.exc_info()
    def foo(e):
      raise e[0], e[1], e[2]
    # sometime possibly much later...
    foo(exc_info)
    Traceback (most recent call last):
      Line 12, in <module>
        foo(exc_info)
      Line 6, in <module>
        bar()
      Line 4, in bar
        raise Exception("example!")
    Exception: example!

    During that "sometime possibly much later..." comment, a reference to "x" is held, when it will not be used in printing the traceback later. So, I would not like to keep a reference to "x", and currently there is no way to do that without also dropping a reference to the data needed to print the traceback.

    terryjreedy commented 14 years ago

    To call this a bug for tracker purposes, there would have to be a specific discrepancy between doc promise and observed behavior. Every feature request fixes a 'design bug' ;-).

    c7898a41-d409-4bdb-93cd-025165dbc3a3 commented 14 years ago

    It seems this is partially addressed in a big red "Warning" section of the docs on sys.exc_info: http://docs.python.org/release/3.1/library/sys.html#sys.exc_info and is captured in the "Open Issue: Garbage Collection" section for PEP-3134: http://www.python.org/dev/peps/pep-3134/

    Bug or feature request, this a well understood and real issue.

    bitdancer commented 14 years ago

    Those two references have nothing to do with your request. They are talking about cycles that get created by grabbing the traceback variable, and the fact that it may take a while before garbage collection reclaims them *after you have let go of the traceback reference*. You are talking about wanting to release the locals in the stack frame and still hold on to the traceback. Completely different issue.

    Note that while the standard traceback print doesn't reference the locals in the frame, extended tracebacks such as cgitb do. There is also a proposal to add such locals printing to the standard traceback.

    I'm pretty sure this issue isn't going to go anywhere without a patch proposal, but before you work on that you might want to raise the idea on python-ideas and see if a wider audience thinks it is a good idea.

    c7898a41-d409-4bdb-93cd-025165dbc3a3 commented 14 years ago

    It depends on how you look at it. Those two issues describe the surprising behavior of the same references I'm talking about here, when the lifetime of the traceback reference is only inside the same frame. This ticket describes the surprising behavior of those references when the lifetime of the traceback is any number of frames. My example eat_memory.py is much closer to the issue described in those links - the lifetime of the traceback object is insignificantly one frame higher, not the lifetime of the application.

    Either way, a feature to discard those references would resolve both.

    bitdancer commented 14 years ago

    It is true that your proposed feature would provide a way for a programmer to manually resolve the cycle issue; however, the open issue in the pep is how to do this automatically.

    But if you hold on to the traceback reference, you should *expect* all those values to persist, so that shouldn't be "surprising".

    I repeat my recommendation that you take this to python-ideas for feedback, and then work on a patch if the feedback is positive.

    (By the way, I checked with a twisted developer, and what he wanted was a convenient way to manually create traceback objects.)

    c7898a41-d409-4bdb-93cd-025165dbc3a3 commented 14 years ago

    you should *expect* all those values to persist, so that shouldn't be "surprising".

    It's not something I expected, and the warnings around traceback objects are a good indication that other developers have not expected it either. One poster on python-ideas said "Working with traceback objects can easily introduce hidden circular references, so it usually better not access them at all". Since these 'hidden' references are not used in many cases, it is surprising that they would be required.

    I repeat my recommendation that you take this to python-ideas for feedback, and then work on a patch if the feedback is positive.

    I have, and it has been so far.

    (By the way, I checked with a twisted developer, and what he wanted was a convenient way to manually create traceback objects.)

    When does Twisted want to manually create traceback objects? Failure has specific functions to stringify the traceback to remove the references mentioned here. Creating a fake traceback would be one way to achieve that, but if the references did not exist I'm not sure what the goal would be.

    bitdancer commented 14 years ago

    Excellent.

    As for twisted, I'm just repeating what I understood of what he said when I asked. It could well be that this feature would help them, I don't know.

    terryjreedy commented 14 years ago

    This seems to be about reducing internal resource usage in a way that would be mostly invisible to the normal user. A core surface feature request would have to be put off to 3.3, but I do not see that as such.

    pitrou commented 11 years ago

    frame.clear() was committed in bpo-17934.

    vstinner commented 11 years ago

    frame.clear() was committed in bpo-17934.

    How should it be used to workaround this issue ("tracebacks eat up memory by holding references to locals and globals when they are not wanted")?

    We need maybe an helper to clear all frames referenced by a traceback?

    pitrou commented 11 years ago

    We need maybe an helper to clear all frames referenced by a traceback?

    Yes. Something in the traceback module would be fine.

    akuchling commented 10 years ago

    Here's a patch implementing traceback.clear_tb_frames(). (Feel free to bikeshed about the name.)

    One more substantial question: the top frame of the traceback is possibly still running. Currently the code skips it by doing an initial 'tb = tb.tb_next'. Would it be better to catch and ignore the RuntimeError from frame.clear()?

    pitrou commented 10 years ago

    One more substantial question: the top frame of the traceback is possibly still running. Currently the code skips it by doing an initial 'tb = tb.tb_next'. Would it be better to catch and ignore the RuntimeError from frame.clear()?

    Yes, I think it would be better. Other than that, the doc lacks a "versionadded" tag. Thanks!

    akuchling commented 10 years ago

    Revised version of the patch: catches RuntimeError instead of skipping the first frame; adds versionadded tag; adds entry to NEWS and whatsnew files.

    vstinner commented 10 years ago

    I would prefer a clear_frames() method on the traceback object rather than a function.

    pitrou commented 10 years ago

    Revised version of the patch: catches RuntimeError instead of skipping the first frame; adds versionadded tag; adds entry to NEWS and whatsnew files.

    Looks good to me, thank you.

    vstinner commented 10 years ago

    I tried to implement the feature as a new traceback.clear_frames() method. I tried to follow the chain of frame objects (using frame.f_back), but it does not work as expected. The method needs to follow the chain of traceback objects (tb.tb_next). So it makes sense to define a function instead of a method (a method usually only affect the object, not a chain of objects).

    clear-tb-frames-2.txt:

    "F.clear(): clear most references held by the frame");

    So I suggest a more permissive documentation:

    "Clear most reference held by frames."

    pitrou commented 10 years ago
    • The documentation is wrong: frame.clear() does not guarantee to clear *all locals, but only *most locals:

    "F.clear(): clear most references held by the frame");

    Actually, the documentation is right: "references" != "locals" ;-)

    akuchling commented 10 years ago

    I'm happy to change the function name, though I'll note that the traceback module does have print_tb(), format_tb() and extract_tb().

    I'm OK with both of Victor's suggestions but personally slightly prefer traceback.clear_frames(tb).

    Rationale: People who are keeping tracebacks around and want to save memory are probably using other functions from the traceback module, and the module has fairly short names (print_tb, format_exception) so I doubt they'll often do 'from traceback import clear_traceback_frames'.

    akuchling commented 10 years ago

    Ping! I'd like to change the function name to clear_frames() and then commit this. Antoine or anyone, want to disagree with using clear_frames() as the name?

    pitrou commented 10 years ago

    Ping! I'd like to change the function name to clear_frames() and then commit this. Antoine or anyone, want to disagree with using clear_frames() as the name?

    clear_frames() sounds fine to me :-)

    1762cc99-3127-4a62-9baf-30c3d0f51ef7 commented 10 years ago

    New changeset 100606ef02cf by Andrew Kuchling in branch 'default': bpo-1565525: Add traceback.clear_frames() helper function to clear locals ref'd by a traceback http://hg.python.org/cpython/rev/100606ef02cf

    vstinner commented 6 years ago

    traceback.clear_frames() doesn't really clear all frames: see bpo-31321.