python / cpython

The Python programming language
https://www.python.org
Other
63.48k stars 30.4k forks source link

Unupdated source file in traceback #52334

Open 05554c1b-08b9-4b47-8c42-47b86208a78d opened 14 years ago

05554c1b-08b9-4b47-8c42-47b86208a78d commented 14 years ago
BPO 8087
Nosy @loewis, @ncoghlan, @rbtcollins, @ezio-melotti, @merwok, @ericsnowcurrently, @jimjjewett, @reynoldsnlp
Files
  • issue8087.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = None closed_at = None created_at = labels = ['interpreter-core', 'type-feature'] title = 'Unupdated source file in traceback' updated_at = user = 'https://bugs.python.org/dmascialino' ``` bugs.python.org fields: ```python activity = actor = 'Robert Reynolds' assignee = 'none' closed = False closed_date = None closer = None components = ['Interpreter Core'] creation = creator = 'dmascialino' dependencies = [] files = ['23321'] hgrepos = [] issue_num = 8087 keywords = ['patch'] message_count = 18.0 messages = ['100600', '100608', '100628', '144982', '145108', '145141', '145167', '145168', '145173', '145291', '147142', '153345', '237297', '238174', '238175', '300990', '370206', '370207'] nosy_count = 13.0 nosy_names = ['loewis', 'ncoghlan', 'rbcollins', 'jjconti', 'ezio.melotti', 'eric.araujo', 'ubershmekel', 'chortos', 'dmascialino', 'eric.snow', 'Jim.Jewett', 'Robert Reynolds', 'Aigars Mahinovs'] pr_nums = [] priority = 'normal' resolution = None stage = None status = 'open' superseder = None type = 'enhancement' url = 'https://bugs.python.org/issue8087' versions = [] ```

    05554c1b-08b9-4b47-8c42-47b86208a78d commented 14 years ago

    Example:

    ---------------- mod.py ----------------

    def f():
        a,b,c = 1,2
        print b

    If i do:

    >> import mod >> mod.f()

    I get:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "mod.py", line 2, in f
        a,b,c = 1,2
    ValueError: need more than 2 values to unpack

    If i fix the source:

    ---------------- mod.py ----------------

    def f():
        a,b,c = 1,2,3
        print b

    And do:

    >> mod.f()

    I get:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "mod.py", line 2, in f
        a,b,c = 1,2,3
    ValueError: need more than 2 values to unpack

    The problem is that the source shown is updated, but the executed code is old, because it wasn't reloaded.

    Feature request:

    If the source code shown was modified after import time and it wasn't reloaded, a warning message should be shown.

    Example:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "mod.py", line 2, in f      WARNING: Modified after import!
        a,b,c = 1,2,3
    ValueError: need more than 2 values to unpack

    or something like that. Maybe "use reload()" might appear.

    61337411-43fc-4a9c-b8d5-4060aede66d0 commented 14 years ago

    Displaying a warning whenever the code has changed on disk is clearly unacceptable - it is both expensive to check (and also, how frequent would you check?), and it would be annoying if you have long-running Python applications so this would just clutter the log files.

    -1 on this request; users should just learn this aspect of Python. Also, it is not possible to get this perfect for many reasons, e.g. when you change a class, it will be impossible to update the code of existing instances even with reload.

    05554c1b-08b9-4b47-8c42-47b86208a78d commented 14 years ago

    Martin, I am not talking about a loop checking source file changes.

    I think the check could be done only when the traceback is printed.

    The function PyErr_Display() calls PyTracBack_Print() and so on until _Py_DisplaySourceLine() is called. The latter reads the source file and prints the line. I don't know which is the best way to know if the source file was updated. But i think that it's possible to check it there.

    I think this could apply only when using the interactive console.

    05554c1b-08b9-4b47-8c42-47b86208a78d commented 13 years ago

    I worked a few hours today and I have this patch. I tried to make a test but could not.

    I know this is not a really good patch, but it's my first one and I wanted to show my idea.

    merwok commented 13 years ago

    I do not understand what the proposed feature is. I also think it should be discussed on the python-ideas mailing list first: http://mail.python.org/listinfo/python-ideas

    ezio-melotti commented 13 years ago

    I'm not sure this is useful to have. If you changed your code you know that you have to reload, so why would you want a warning that tells you that you changed the code?

    For some reason I always had the opposite problem (i.e. after a reload the traceback was still showing the original code, and not the new one), while IIUC you are saying that it shows the new code even if the module is not reloaded. I tried your code and indeed it does what you say, so either I am mistaken and I've been misreading the tracebacks, or this changed from 2.6 to 2.7, or in some cases even the behavior (I think) I observed might happen. I'll have to verify this next time it happens.

    05554c1b-08b9-4b47-8c42-47b86208a78d commented 13 years ago

    On Fri, Oct 7, 2011 at 5:25 PM, Ezio Melotti \report@bugs.python.org\ wrote:

    Ezio Melotti \ezio.melotti@gmail.com\ added the comment:

    I'm not sure this is useful to have.  If you changed your code you know that you have to reload, so why would you want a warning that tells you that you changed the code?

    The source line showed in the traceback could not be the same line executed.

    Take a look to this example:

    k.py: def f(): a,b,c = 1,2

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "k.py", line 2, in f
        a,b,c = 1,2
    ValueError: need more than 2 values to unpack

    k.py: def f(): # blah a,b = 1,2

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "k.py", line 2, in f
        # blah
    ValueError: need more than 2 values to unpack

    For some reason I always had the opposite problem (i.e. after a reload the traceback was still showing the original code, and not the new one), while IIUC you are saying that it shows the new code even if the module is not reloaded. I tried your code and indeed it does what you say, so either I am mistaken and I've been misreading the tracebacks, or this changed from 2.6 to 2.7, or in some cases even the behavior (I think) I observed might happen. I'll have to verify this next time it happens.

    That is strange, I think Python does not save the original code in any place.

    ezio-melotti commented 13 years ago

    The source line showed in the traceback could not be the same line executed.

    My point is that when I see an error and modify the source to fix it, then I know that I'll have to reload. If for some reason I forget to reload, I'll get the wrong line in the traceback and then reload, but I don't think that ever happened to me. So to me, your warning will only be useful in the case where I modified the source, forgot to reload and got the same error again with a wrong line displayed. Also note that reloading is not so common; usually you just restart your application and that will give you the right traceback.

    Also I'm not sure the warning you proposed is the best way to handle this.

    05554c1b-08b9-4b47-8c42-47b86208a78d commented 13 years ago

    So to me, your warning will only be useful in the case where I modified the source, forgot to reload and got the same error again with a wrong line displayed. Also note that reloading is not so common; usually you just restart your application and that will give you the right traceback.

    I know the case when this happens is really unsusual, but the interperter could be able to alert you than that line of the traceback is wrong.

    Also I'm not sure the warning you proposed is the best way to handle this.

    I agree that, another approach is to save a copy of the source file associated to the compiled code (like a .pys).

    ncoghlan commented 13 years ago

    (just reviewing the idea here, not the current patch)

    The problem of "stale code" (i.e. what was executed doesn't match what is displayed in the traceback) is a tricky and subtle one. There are a few different cases:

    1. Source displayed does not match source on disk
      • these cases do happen, but they're almost always due to straight up bugs in the linecache or traceback modules.
    2. Source has been changed, but module has not been reloaded
      • this is the case for "edited source file but forgot to reload module". I've certainly forgotten to do this myself, and I'm far from the only one. This is the case Diego's RFE targets, and I think it has some merit.
    3. Source has been changed, module has been reloaded, but object reference is from previous version of module
      • the patch doesn't detect this. There are various ways we *could* detect it, but they all involve some fairly significant changes to the way compilation and module import work.

    Aspect 3 is a much deeper (and bigger) problem relating to native introspection support in function and class objects. But that doesn't make Diego's idea to improve Aspect 2 invalid - there have certainly been times when playing at the interactive prompt that such a warning would have reminded me that I needed to reload the module I was working on.

    7372a7a0-52d2-4369-87c7-a5d9002bf05d commented 13 years ago

    I just want to note that the code might be edited not only while it is running it in the interactive interpreter but also while it is running as a standalone script. In this case the script naturally would not know to reload its own code nor allow the user to do it manually, and even if it would, some objects created from the old code would probably remain alive.

    Here is a simple shell script that causes Python to show a stack traceback with '# This is just a comment' as the offending line by launching it in the background and then overwriting the Python code. What happened in reality is that I launched a front-end tool for video transcoding I had written in Python and after a while tweaked the code while it was still running. When the actual encode finished, my (buggy) clean-up code raised an exception and I got a traceback saying it had been raised in a line that only had a comment in it.

    cat \<\<EOF >proof.py import time time.sleep(5) raise Exception EOF python proof.py & sleep 1 cat \<\<EOF >proof.py

    # This is just a comment EOF sleep 5

    74c4563b-ab1c-43d8-9219-30c4eca796bc commented 12 years ago

    Martin v. Löwis (loewis) wrote:

    Displaying a warning whenever the code has changed on disk is clearly unacceptable

    As clarified, the request is only for when a traceback is being created (or perhaps even only for when one is being printed).

    I agree that we don't want to watch every file every time any code is run, but by the time a traceback is being displayed, any tight loops are ending.

    Nick Coghlan (ncoghlan) wrote:

    There are a few different cases: ...

    1. Source has been changed, but module has not been reloaded ...
    2. Source has been changed, module has been reloaded, but object ...

    Given that a traceback is being displayed, I think it is reasonable to rerun the find-module portion of import, and verify that there is not stale byte-code.

    Frankly, I think it would be worth storing a file timestamp on modules, and verifying that whatever-would-be-imported-if-imported-now matches that timestamp. This would also catch case (3).

    I also think that -- on traceback display -- it might be worth verifying that the code's __globals is the __globals associated with the module of that name in sys.modules. This would warn about some intentional manipulations, but would catch case (3) even more accurately.

    rbtcollins commented 9 years ago

    Storing a marker in module objects which can be used to validate the linecache is a good idea. timestamp isn't appropriate because of modules loaded from zips or dynamic generation. I'd suggest we make it something opaque - we get source code by asking the loader for it anyway - if we make all source code loading go through the loader, then file based loaders can use a timestamp, zip based ones can shortcircuit or use a reference to the containing zip's file metadata or whatever.

    rbtcollins commented 9 years ago

    Oh, it may be clear to everyone already but its perhaps worth noting: there are two ways the cache can skew.

    (older source): We may have a newer file compiled and in use and the older source in the cache.

    e.g. someone calls linecache.getlines(foo.py), then imports foo, and in between the contents of foo.py changed.

    (newer source) Or we may have an older file compiled and in use, and newer source in the cache.

    e.g. someone imports foo, the contents of foo.py change, and then someone calls linecache.getlines(foo.py).

    One way to address the older source case would be to make importing (also compile()) flush the imported file out of the cache. That doesn't need any special logic or handling - just a new parameter to clearcache() to specify a file (or files?) to evict.

    The newer source case is what needs some logic, and for that, as I said earlier, I think we need something opaque. Perhaps since we have importlib now, if we limit ourselves to considering actual python modules (linecache can technically cache anything with lines in it) we can rely on the import machinery get_source() to handle this: if the source has changed get_source should return None.

    That way we don't need to change the data structure of modules at all - only packages for which there is no 302 loader will be able to suffer cache skew.

    As far as informing callers that this situation has occurred, I think we'll need to think a bit about that, as linecache already signals 'source not available' by returning None, which doesn't leave much room to include an error message. We could start raising exceptions up the stack - which implies modifying PEP-302 (e.g. via a new PEP) and working up the stack.

    rbtcollins commented 9 years ago

    Oh, meant to add - we could just call logging.warning or something.

    d330a3fb-6893-4285-8ce2-52329a23bafc commented 7 years ago

    As a point of update - this is a problem that can make debugging more difficult for people running web services or other server processes that are expected to stay alive for a long time. It is often undesirable to force a server restart or reload every time a new code base is deployed, so you can get into situations where code running does not match the code on disk.

    If you get a crash and a traceback in such situation, you will be very confused.

    As discussed already, this could be solved by taking a hash of the source code at module import time and then comparing that to the code on disk during traceback printing time - if there is a difference, then print a big fat warning that the traceback may be unreliable.

    I just lost two days tracking down weird issues that in the end were caused by server failing to reload code when it should have done so. Having such warning on the first traceback would have alerted me to the problem much earlier.

    5316576c-ea0d-419b-8b47-561129b1d6dc commented 4 years ago

    A pure python demonstration of the problem looks like this (__file__ stores the path to the executed module):

    with open(__file__) as f:
        src = f.read()
    with open(__file__, 'w') as f:
        f.write('\n\n\n\n\n# Whoops! Wrong line!\n')
        f.write(src)
    raise NotImplementedError('The prepended lines have confused you.')
    5316576c-ea0d-419b-8b47-561129b1d6dc commented 4 years ago

    I second what was said by Aigars Mahinovs about long-running processes giving confusing tracebacks that make debugging very difficult. I have a Natural Language Processing pipeline extracting features from a large corpus of texts, and that process can take days to complete. If the underlying modules have since been edited, then when an Exception occurs – including a KeyboardInterrupt – then the traceback shows the wrong lines. The functions listed at the end of the line are correct, which is the only reason I was able to easily detect the source of my confusion; the line number cited was no longer inside of the listed function!

    I propose one more simple thing to track that would be helpful in my situation: how many lines were in the file at call time vs now. It would be (potentially) helpful to have a warning point out that the cited module is now 17 lines longer than it was when it was imported. That way I can make more intelligent guesses about what line was actually the culprit. Obviously there could have been additions and deletions, which muddies the water, but this would at least be a starting point.