python / cpython

The Python programming language
https://www.python.org
Other
63.45k stars 30.39k forks source link

time_strftime() Buffer Over-read #69105

Closed 8d430a93-5e49-44a5-93eb-a0a4492eab8e closed 9 years ago

8d430a93-5e49-44a5-93eb-a0a4492eab8e commented 9 years ago
BPO 24917
Nosy @malemburg, @birkenfeld, @pfmoore, @abalkin, @vstinner, @larryhastings, @tjguk, @zware, @eryksun, @zooba
Superseder
  • bpo-25029: MemoryError in test_strptime
  • Files
  • time_strftime_Buffer_Over-read.patch: patch
  • time_strftime_Buffer_Over-read.py: repro
  • time_strftime_Buffer_Over-read_v2.patch: patch
  • time_strftime_Buffer_Over-read_v3.patch
  • 24917_4.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = 'https://github.com/zooba' closed_at = created_at = labels = ['type-security', 'extension-modules', 'deferred-blocker', 'OS-windows'] title = 'time_strftime() Buffer Over-read' updated_at = user = 'https://bugs.python.org/JohnLeitch' ``` bugs.python.org fields: ```python activity = actor = 'steve.dower' assignee = 'steve.dower' closed = True closed_date = closer = 'steve.dower' components = ['Extension Modules', 'Windows'] creation = creator = 'JohnLeitch' dependencies = [] files = ['40228', '40229', '40367', '40368', '40383'] hgrepos = [] issue_num = 24917 keywords = ['patch'] message_count = 67.0 messages = ['248998', '249800', '249801', '249805', '249806', '249857', '249860', '249861', '249862', '249863', '249864', '249867', '249868', '249870', '249872', '249874', '249876', '249877', '249878', '249881', '249883', '249884', '249899', '249904', '249905', '249908', '249919', '249920', '249950', '249956', '249958', '249960', '249961', '249962', '249963', '249964', '249965', '249966', '249967', '249969', '249970', '249971', '249972', '249973', '249974', '249975', '249976', '249977', '249994', '250014', '250020', '250022', '250023', '250024', '250030', '250031', '250035', '250039', '250040', '250041', '250042', '250062', '250064', '250071', '250202', '250203', '250213'] nosy_count = 14.0 nosy_names = ['lemburg', 'georg.brandl', 'paul.moore', 'belopolsky', 'vstinner', 'larry', 'tim.golden', 'BreamoreBoy', 'python-dev', 'zach.ware', 'eryksun', 'steve.dower', 'JohnLeitch', 'brycedarling'] pr_nums = [] priority = 'deferred blocker' resolution = 'fixed' stage = 'resolved' status = 'closed' superseder = '25029' type = 'security' url = 'https://bugs.python.org/issue24917' versions = ['Python 3.4', 'Python 3.5'] ```

    8d430a93-5e49-44a5-93eb-a0a4492eab8e commented 9 years ago

    First, let me begin by saying I believe this patch will fix the buffer over-read, which is a good step forward.

    However, after giving the matter more thought, and at the risk of wearing out my welcome, I am of the belief that relying on the CRT to handle malformed format strings is the wrong approach. As per the C spec, strftime's behavior when handling invalid format strings is undefined:

    "If a conversion specifier is not one of the above, the behavior is undefined"

    Quite often, "undefined" translates to "exploitable". And at the very least, by not performing thorough enough validation, Python is misusing strftime(), which may lead to crashes or undermine memory safety. Of course, this is all speculation--I haven't the time or resource to learn other platforms to see what's possible. But, even if I could, the task would be Sisyphean because there's simply no way to know what the future holds when dealing with implementation that could change at any point.

    I realize we must be pragmatic with matters such as this, and a dramatic change could be breaking for some Python apps. Even so, I feel it's worth vocalizing these concerns. As a principal, I think that "safe", well-formed Python should never be able to perform operations that lead to undefined behavior in the underlying runtime.

    Alright, rant done. If at any point in time locking down Python's strftime with more aggressive validation is considered viable, I am more than willing to take a shot at submitting a patch.

    abalkin commented 9 years ago

    Historically, time and os modules have been considered low level, thin wrappers over system libc functions. Users of these modules are the proverbial "consenting adults" who should understand their power and the associated risks.

    The place to provide a better behaved strftime function is the datetime module, but I would leave time.stftime pretty much the way it is and would only consider making *fewer* checks on Windows if _Py_BEGIN_SUPPRESS_IPH / _Py_END_SUPPRESS_IPH macros can indeed help to avoid crashes on illegal format strings.

    zooba commented 9 years ago

    FWIW, those macros prevent a certain class of "undefined behavior," which in this case is to terminate the process rather than return an error code. They don't do anything to prevent crashes due to exploitation or malicious code.

    The place for more robust formatting is datetime.__format__(), which from a quick test seems to be more strict about what is in the format string.

    larryhastings commented 9 years ago

    Having slept on it, I agree with Steve. We should make the minimal change necessary in order to not crash.

    However, it still needs a regression test. The test can use JohnLeitch's proposed test as a good starting point, but it must accept either success or ValueError as passing the test.

    I still think there's merit in detecting a trailing unquoted % at the end of the format string, but that would only be appropriate for 3.6 at this point.

    eryksun commented 9 years ago

    With MSVC, if errno is cleared before calling strftime, then when buflen == 0 you know whether it's an invalid format string (EINVAL) or maxsize is too small (ERANGE). There's no need to guess. Oddly, only EINVAL is documented, even though setting ERANGE has been in the implementation for years.

    VC 10 (strftime.c):

            /* error - return an empty string */
            *(strstart)='\0';
    
            /* now return our error/insufficient buffer indication */
            if ( !failed && left <= 0 )
            {
                /* do not report this as an error to allow the caller to resize */
                errno=ERANGE;
            }
            else
            {
                _VALIDATE_RETURN( FALSE, EINVAL, 0);
            }

    VC 14 (time/wcsftime.cpp):

        // Error:  return an empty string:
        *string = L'\0';
    
        // Now return our error/insufficient buffer indication:
        if (!failed && remaining <= 0)
        {
            // Do not report this as an error to allow the caller to resize:
            errno = ERANGE;
        }
        else
        {
            _VALIDATE_RETURN(false, EINVAL, 0);
        }
    zooba commented 9 years ago

    How's this for the test (I added an explanatory comment, since it may look like it isn't testing anything to someone who isn't familiar with the issue):

        def test_strftime_format_check(self):
            # Test that strftime does not crash on invalid format strings
            # that may trigger a buffer overread. When not triggered,
            # strftime may succeed or raise ValueError depending on
            # the platform.
            for x in [ '', 'A', '%A', '%AA' ]:
                for y in range(0x0, 0x10):
                    for z in [ '%', 'A%', 'AA%', '%A%', 'A%A%', '%#' ]:
                        try:
                            time.strftime(x * y + z)
                        except ValueError:
                            pass
    zooba commented 9 years ago

    New PR: https://bitbucket.org/larry/cpython350/pull-requests/19/issue-24917-time_strftime-buffer-over-read

    larryhastings commented 9 years ago

    Steve, did you confirm that the test triggers the array bounds bug when the patch *isn't* applied? I want to confirm both that a) the test exercises the bug, and b) the fix fixes the bug. I assume you ran the test suite with the patch applied, so that covers b).

    zooba commented 9 years ago

    I wasn't able to repro the crash at all, even with the debugging flags that make this sort of issue more prominent. It relies on a very precise layout of multiple objects in memory, or possibly a specific sequence of allocations/deallocations, as well as a format string ending in an unescaped '%' or (on Windows) '%#'.

    It's still obviously an issue though - we should have the check for '\0' there by any reasonably analysis of the code, or else should not be adding 2 to the pointer to start the next step of the search.

    larryhastings commented 9 years ago

    Okay, I think *I* reproduced it.

    1) I pulled your cpython350 fork down locally.

    2) I updated to your checkin that fixed the bug. (c31dad22c80d)

    3) I reverted the change to Modules/timemodule.c to put the bug back: % hg cat -r 97393 Modules/timemodule.c > Modules/timemodule.c

    4) I changed line 611 (or so) from "#if defined(MS_WINDOWS) && !defined(HAVE_WCSFTIME)" to "#if 1" so I'd get the code that had the bug.

    5) I ran "./configure --with-valgrind && make" to make it.

    6) I ran "valgrind ./python -m test test_time" and ***Valgrind complained about an array overrun***.

    7) I restored the bugfix to Modules/timemodule.c, then reinstated the change from 4) above.

    8) I ran make and valgrind again and didn't get the complaint about the array overrun.

    For grins I also tried enabling the other stanza of code that has the bug (the AIX / sun / have_wcsftime) and observed the same thing.

    Is that convincing?

    zooba commented 9 years ago

    Good enough for me.

    larryhastings commented 9 years ago

    Backout pull request merged, please forward-merge, thanks!

    larryhastings commented 9 years ago

    (I meant, just normal pull request. I did your two pull requests right in a row and got my wires crossed.)

    1762cc99-3127-4a62-9baf-30c3d0f51ef7 commented 9 years ago

    New changeset c31dad22c80d by Steve Dower in branch '3.5': Issue bpo-24917: time_strftime() buffer over-read. https://hg.python.org/cpython/rev/c31dad22c80d

    New changeset f185917498ca by Steve Dower in branch '3.4': Issue bpo-24917: time_strftime() buffer over-read. https://hg.python.org/cpython/rev/f185917498ca

    vstinner commented 9 years ago

    The change c31dad22c80d introduced a regression: issue bpo-25029.

    vstinner commented 9 years ago

    Oh, tracemalloc sees the memory peak of 8 GB too:

    $ ./python -X tracemalloc -i -m test -v test_strptime
    ...
    SystemExit: True
    >>> import tracemalloc; tracemalloc.get_traced_memory()[1] / 1024.**2
    8201.658247947693

    Memory peak: 8201.7 MB (8.2 GB!).

    zooba commented 9 years ago

    I'll fix it on bpo-25029. This thread is already too long for my liking.