python / cpython

The Python programming language
https://www.python.org
Other
63.41k stars 30.36k forks source link

time_strftime() Buffer Over-read #69105

Closed 8d430a93-5e49-44a5-93eb-a0a4492eab8e closed 9 years ago

8d430a93-5e49-44a5-93eb-a0a4492eab8e commented 9 years ago
BPO 24917
Nosy @malemburg, @birkenfeld, @pfmoore, @abalkin, @vstinner, @larryhastings, @tjguk, @zware, @eryksun, @zooba
Superseder
  • bpo-25029: MemoryError in test_strptime
  • Files
  • time_strftime_Buffer_Over-read.patch: patch
  • time_strftime_Buffer_Over-read.py: repro
  • time_strftime_Buffer_Over-read_v2.patch: patch
  • time_strftime_Buffer_Over-read_v3.patch
  • 24917_4.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = 'https://github.com/zooba' closed_at = created_at = labels = ['type-security', 'extension-modules', 'deferred-blocker', 'OS-windows'] title = 'time_strftime() Buffer Over-read' updated_at = user = 'https://bugs.python.org/JohnLeitch' ``` bugs.python.org fields: ```python activity = actor = 'steve.dower' assignee = 'steve.dower' closed = True closed_date = closer = 'steve.dower' components = ['Extension Modules', 'Windows'] creation = creator = 'JohnLeitch' dependencies = [] files = ['40228', '40229', '40367', '40368', '40383'] hgrepos = [] issue_num = 24917 keywords = ['patch'] message_count = 67.0 messages = ['248998', '249800', '249801', '249805', '249806', '249857', '249860', '249861', '249862', '249863', '249864', '249867', '249868', '249870', '249872', '249874', '249876', '249877', '249878', '249881', '249883', '249884', '249899', '249904', '249905', '249908', '249919', '249920', '249950', '249956', '249958', '249960', '249961', '249962', '249963', '249964', '249965', '249966', '249967', '249969', '249970', '249971', '249972', '249973', '249974', '249975', '249976', '249977', '249994', '250014', '250020', '250022', '250023', '250024', '250030', '250031', '250035', '250039', '250040', '250041', '250042', '250062', '250064', '250071', '250202', '250203', '250213'] nosy_count = 14.0 nosy_names = ['lemburg', 'georg.brandl', 'paul.moore', 'belopolsky', 'vstinner', 'larry', 'tim.golden', 'BreamoreBoy', 'python-dev', 'zach.ware', 'eryksun', 'steve.dower', 'JohnLeitch', 'brycedarling'] pr_nums = [] priority = 'deferred blocker' resolution = 'fixed' stage = 'resolved' status = 'closed' superseder = '25029' type = 'security' url = 'https://bugs.python.org/issue24917' versions = ['Python 3.4', 'Python 3.5'] ```

    8d430a93-5e49-44a5-93eb-a0a4492eab8e commented 9 years ago

    Python 3.5 suffers from a vulnerability caused by the behavior of the time_strftime() function. When called, the function loops over the format string provided, using strchr to search for each instance of '%'. After finding a '%', it continues to search two characters ahead, assuming that each instance is the beginning of a well formed format string token.

    However, if a string ends with '%', this logic will result in a call to strchr that reads off the end of the format string buffer:

        /* check that the format string contains only valid directives */
        for(outbuf = strchr(fmt, '%'); <<<< Assuming fmt ends with a '%', this will return a pointer to the end of the string.
            outbuf != NULL;
            outbuf = strchr(outbuf+2, '%')) <<<< Once outbuf is pointing to the end of the string, outbuf+2 skips
        {                                        past the null terimnator, leading to a buffer over-read.
            if (outbuf[1]=='#')
                ++outbuf; /* not documented by python, */
            if ((outbuf[1] == 'y') && buf.tm_year < 0)
            {
                PyErr_SetString(PyExc_ValueError,
                            "format %y requires year >= 1900 on Windows");
                Py_DECREF(format);
                return NULL;
            }
        }

    In some applications, it may be possible to exploit this behavior to disclose the contents of adjacent memory. The buffer over-read can be observed by running the following script:

    from time import *
    strftime("AA%"*0x10000)

    Which, depending on the arrangement of memory, may produce an exception such as this:

    0:000> g (20b8.18d4): Access violation - code c0000005 (first chance) First chance exceptions are reported before any exception handling. This exception may be expected and handled. eax=ffffffff ebx=52c1a6a0 ecx=00000000 edx=08ef3000 esi=08ec2fe8 edi=08ec2ff8 eip=52d254f3 esp=004cf9d4 ebp=004cfa58 iopl=0 nv up ei pl nz na pe nc cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010206 python35!strchr+0x33: 52d254f3 f30f6f0a movdqu xmm1,xmmword ptr [edx] ds:002b:08ef3000=???????????????????????????????? 0:000> db edx-0x10 08ef2ff0 41 25 41 41 25 41 41 25-00 d0 d0 d0 d0 d0 d0 d0 A%AA%AA%........ 08ef3000 ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ?? ???????????????? 08ef3010 ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ?? ???????????????? 08ef3020 ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ?? ???????????????? 08ef3030 ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ?? ???????????????? 08ef3040 ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ?? ???????????????? 08ef3050 ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ?? ???????????????? 08ef3060 ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ?? ???????????????? 0:000> k5 ChildEBP RetAddr
    004cf9d0 52c1a7f6 python35!strchr+0x33 [f:\dd\vctools\crt\vcruntime\src\string\i386\strchr_sse.inc @ 75] 004cfa58 52c832d3 python35!time_strftime+0x156 [c:\build\cpython\modules\timemodule.c @ 615] 004cfa74 52ce442f python35!PyCFunction_Call+0x113 [c:\build\cpython\objects\methodobject.c @ 109] 004cfaa8 52ce18ec python35!call_function+0x2ff [c:\build\cpython\python\ceval.c @ 4651] 004cfb20 52ce339f python35!PyEval_EvalFrameEx+0x232c [c:\build\cpython\python\ceval.c @ 3184]

    To fix this issue, it is recommended that time_strftime() be updated to check outputbuf[1] for null in the body of the format string directive validation loop. A proposed patch is attached.

    Credit: John Leitch (johnleitch@outlook.com), Bryce Darling (darlingbryce@gmail.com)

    abalkin commented 9 years ago

    Is there a CERT report associated with this vulnerability?

    8d430a93-5e49-44a5-93eb-a0a4492eab8e commented 9 years ago

    Currently, no. Would you like us to report this and future vulnerabilities to CERT?

    abalkin commented 9 years ago

    This is happening in windows-only code, so not being a windows user I cannot reproduce it. It does not look like something new. This code was last touched in bpo-10653.

    Can someone confirm that only 3.5 is affected?

    abalkin commented 9 years ago

    Would you like us to report this and future vulnerabilities to CERT?

    If it affects released versions, I think this is the right thing to do. Note that I don't know what PSF policy is on this is or whether they even have one.

    83d2e70e-e599-4a04-b820-3814bbdb9bef commented 9 years ago

    I have tried the reproducer on Windows 10 with 2.6, 2.7, 3.3, 3.4, 3.5 and 3.6. In every case I got this.

    C:\py -3.4 -c"from time import *;strftime('AA%'*0x10000)"
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    ValueError: Invalid format string
    abalkin commented 9 years ago

    In every case I got [ValueError]

    You shouldn't have. I am no Windows expert, but I suspect something is wrong with your use of the command line. Please try it at the Python prompt or put the code in a script.

    8d430a93-5e49-44a5-93eb-a0a4492eab8e commented 9 years ago

    I have tried the reproducer on Windows 10 with 2.6, 2.7, 3.3, 3.4, 3.5 and 3.6. In every case I got this.

    What you are observing is due to the arrangement and contents of process memory. With a simple repro (such as the one provided), there's a good chance the null terminator of the format string will be followed by more null bytes, and thus the code will appear to work as intended. In more complex scripts where memory is ultimately reused, it's more likely that the null terminator will be followed by garbage, non-null bytes.

    To make the issue reproduce more reliably, use GFlags to enable heap tail checking, heap free checking, and page heap.

    https://msdn.microsoft.com/en-us/library/windows/hardware/ff549557(v=vs.85).aspx

    Then, when you repro the issue, you'll see the crash because the uninitialized memory will contain the fill pattern 0xd0 rather than 0x00, like this:

    0:000> db edx-0x10 08ef2ff0 41 25 41 41 25 41 41 25-00 d0 d0 d0 d0 d0 d0 d0 A%AA%AA%........ 08ef3000 ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ?? ???????????????? 08ef3010 ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ?? ???????????????? 08ef3020 ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ?? ???????????????? 08ef3030 ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ?? ???????????????? 08ef3040 ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ?? ???????????????? 08ef3050 ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ?? ???????????????? 08ef3060 ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ?? ????????????????

    To be clear, heap verification is not a requirement--the bug can indeed be reproduced without it. However, it will make life easier by introducing more determinism.

    83d2e70e-e599-4a04-b820-3814bbdb9bef commented 9 years ago

    Python prompt or script makes no difference.

    C:\Users\Mark\Desktop>py -2.6 time_strftime_Buffer_Over-read.py
    Traceback (most recent call last):
      File "time_strftime_Buffer_Over-read.py", line 2, in <module>
        strftime("AA%"*0x10000)
    ValueError: Invalid format string
    Python 3.5.0rc2 (v3.5.0rc2:cc15d736d860, Aug 25 2015, 05:08:37) [MSC v.1900 64 bit (AMD64)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from time import *
    >>> strftime("AA%"*0x10000)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: Invalid format string
    abalkin commented 9 years ago

    Sorry for a bad guess. John's advise makes much more sense than mine.

    83d2e70e-e599-4a04-b820-3814bbdb9bef commented 9 years ago

    Okay, so presumably this applies to all supported versions and not just 3.5? Can you also remove the reproducer and supply specific instructions on how to build the code so that the problem can be reproduced.

    8d430a93-5e49-44a5-93eb-a0a4492eab8e commented 9 years ago

    It very well may apply to versions apart from 3.5. Our test environment is quite complex and unfriendly to working with multiple versions of Python. Plus, we're strapped for time, so we tend to file under the version we're currently targeting and defer to those better equipped and more familiar with the codebase.

    The repro file is fine and there's no need to rebuild anything; the heap verification I'm suggesting is applied to binaries, not source. You can find information here:

    https://msdn.microsoft.com/en-us/library/windows/hardware/ff549557(v=vs.85).aspx

    Install the debugging tools, run GFlags, select the python.exe/pythonw.exe image file, and apply the settings I mentioned in my previous post.

    abalkin commented 9 years ago

    John,

    There is no doubt that this is a bona fide bug. I was just hoping that someone with a Windows development machine would help figuring out the affected versions.

    From the hg history, it looks like the faulty code is present in all versions starting from 3.2:

    https://hg.python.org/cpython/annotate/977c5753ca32/Modules/timemodule.c#l515

    I will set the issue versions accordingly.

    abalkin commented 9 years ago

    I will keep the "security" classification, but will not increase the priority. Accepting format strings from untrusted sources is a vulnerability in and by itself and in most cases those strings are literals.

    abalkin commented 9 years ago

    FWIW, the patch looks good to me, but it needs to be reviewed by a Windows developer.

    8d430a93-5e49-44a5-93eb-a0a4492eab8e commented 9 years ago

    When I get a bit of slackspace (probably tomorrow afternoon/evening) I can test on the spectrum of versions to confirm the issue is in >= 3.2. I'll also look into improving our automation so all future reports can have the appropriate versions flagged.

    Regarding untrusted format strings, I believe you are mistaken. In native applications, untrusted format strings are problematic because an attacker can use injected tokens to read/write arbitrary memory, which can be leveraged to attain code execution.

    However, in the context of Python, a format string with too many tokens should be handled safely, resulting in a Python exception rather than exploitable memory corruption. This is the behavior observed in format string handling throughout Python (and indeed most managed/scripting languages). Yes, in most Python programs format strings will be constants, and using dynamically constructed format strings may be considered a bad practice. But, should a developer choose to pass a dynamically constructed string (for example, functionality that allows untrusted users to specify custom time formatting), it's not unreasonable for them to expect memory safety to be maintained.

    Of course, if there's a risk I'm overlooking I'd like to better understand it, and the relevant Python documentation should be updated.

    abalkin commented 9 years ago

    As far as I know, the practical consequence of "security" classification for the issue is how many affected older versions will be patched. I am keeping that and the 3.2 - 3.5 versions range.

    The priority may affect whether this will make it into 3.5.0. I'll defer this decision to the release managers.

    If you want an authoritative answer on the impact of this issue - report it to CERT or FIRST and get them compute the CVSS score.

    larryhastings commented 9 years ago

    Yes, I'd like this fix in 3.5.0.

    One bit of feedback on the patch: outbuf is a char (or wchar_t *), therefore outbuf[1] is a char (or wchar_t). You shouldn't compare it to NULL. I'm not sure we still support any compilers that define NULL as (void \)0, but we might. Anyway, please instead compare to '\0'.

    abalkin commented 9 years ago

    if there's a risk I'm overlooking I'd like to better understand it, and the relevant Python documentation should be updated.

    I don't think there is any special risk that you are overlooking other than a documented fact that Python's strftime is a thin layer on top of system strftime and these are notoriously buggy on many systems.

    A python application that accepts custom formats from users should limit those formats to a set that is known to work on the targeted platforms. Relying on strftime to properly return an error code and not do anything nasty is probably not a good idea.

    This said, I express no opinion on the severity of this bug.

    8d430a93-5e49-44a5-93eb-a0a4492eab8e commented 9 years ago

    Attached is a revised patch.

    abalkin commented 9 years ago

    Hmm, on Mac OSX "%" and "A%" are valid format strings:

    >>> time.strftime("%")
    '%'
    >>> time.strftime("A%")
    'A%'

    Mark's experiments show that on Windows they are not. What about the other platforms affected by the patch? I am concerned about the bottom part of the patch.

    A nitpick: an existing error message is "Invalid format string". I would either reuse it or make it "Incomplete format string".

    8d430a93-5e49-44a5-93eb-a0a4492eab8e commented 9 years ago

    I plucked the error message from the % operator:

    >>> '%' % 'foo'
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: incomplete format
    >>> '%z' % 'foo'
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: unsupported format character 'z' (0x7a) at index 1

    That said, it should be more consistent. I went with the latter suggestion since it's more informative. An updated patch is attached.

    As for platform compatibility, I'd certainly feel better if someone could test the Sun/AIX changes. Unfortunately, I'm unable to do so. It's interesting that OSX accepts "%". As per the spec, "%%" is a "%" literal, and the behavior of invalid tokens such as "%" is undefined.

    malemburg commented 9 years ago
    On 05.09.2015 03:49, Alexander Belopolsky wrote:
    > 
    > Alexander Belopolsky added the comment:
    > 
    > Hmm, on Mac OSX "%" and "A%" are valid format strings:
    > 
    >>>> time.strftime("%")
    > '%'
    >>>> time.strftime("A%")
    > 'A%'
    > 
    > Mark's experiments show that on Windows they are not. What about the other platforms affected by the patch?  I am concerned about the bottom part of the patch.

    Trailing '%' are valid on Linux, just as using unsupported format codes (those as passed through as is).

    On Windows, the C lib strftime() segfaults with a trailing '%', just as it does with unsupported format codes.

    I have this code in mxDateTime to check for this:

    static
    int _mxDateTime_CheckWindowsStrftime(char *fmt,
                         struct tm *tm)
    {
        char *p;
    /* Range checks \*/
    Py_Assert(tm-\>tm_sec \< 60,
          PyExc_ValueError,
          ".strftime() cannot format leap seconds on Windows");
    Py_Assert(tm-\>tm_mday \<= 31,
          PyExc_ValueError,
          ".strftime() cannot format days \> 31 on Windows");
        /* Scan format string for invalid codes */
        for (p = fmt; *p != '\0'; p++) {
        register char code;
        if (*p != '%')
            continue;
        code = *++p;
        /* Check for supported format codes; see
           https://msdn.microsoft.com/en-us/library/fe06s4ak.aspx */
        switch (code) {
        case 'a':
        case 'A':
        case 'b':
        case 'B':
        case 'c':
        case 'd':
        case 'H':
        case 'I':
        case 'i':
        case 'm':
        case 'M':
        case 'p':
        case 'S':
        case 'U':
        case 'w':
        case 'W':
        case 'x':
        case 'X':
        case 'y':
        case 'Y':
        case 'z':
        case 'Z':
        case '%':
            continue;
        case '\0':
            Py_Error(PyExc_ValueError,
                 "format string may not end with a '%' character "
                 "on Windows");
        default:
            Py_ErrorWithArg(PyExc_ValueError,
                    "format code '%c' not supported on Windows",
                    code);
        }
        }
        return 0;
    
      onError:
        return -1;
    }
    eryksun commented 9 years ago

    On Windows, the C lib strftime() segfaults with a trailing '%', just as it does with unsupported format codes.

    No, it doesn't segfault; it invokes the invalid parameter handler (IPH). This could always be configured for the whole process, but with VC 14 it can also be configured per thread. I think it was Steve Dower who updated time_strftime to take advantage of this feature in 3.5, using the new macros _Py_BEGIN_SUPPRESS_IPH and _Py_END_SUPPRESS_IPH. This allowed removing the following check that used to be in the code prior to 3.5:

            if (outbuf[1]=='\0' ||
                !strchr("aAbBcdHIjmMpSUwWxXyYzZ%", outbuf[1]))
            {
                PyErr_SetString(PyExc_ValueError, "Invalid format string");
                Py_DECREF(format);
                return NULL;
            }

    With this change the time module no longer has to make assumptions about what the CRT considers to be a valid format string in order to avoid invoking the IPH. However, this also accidentally removed checking whether outbuf[1]=='\0'. Now, instead of calling the default IPH that terminates the process, Python's _silent_invalid_parameter_handler gets called, which of course does nothing by design:

        >>> import time
        >>> time.strftime('A%')
        Breakpoint 0 hit
        python35!_silent_invalid_parameter_handler:
        00000000`5eadae50 c20000          ret     0
        0:000> k8
        Child-SP          RetAddr           Call Site
        00000000`002af018 000007fe`e9d8d2ab python35!_silent_invalid_parameter_handler
        00000000`002af020 000007fe`e9d8d2c9 ucrtbase!invalid_parameter+0x103
        00000000`002af060 000007fe`e9dafedc ucrtbase!invalid_parameter_noinfo+0x9
        00000000`002af0a0 000007fe`e9dac359 ucrtbase!Wcsftime_l+0x168
        00000000`002af140 000007fe`e9dac3f5 ucrtbase!Strftime_l+0x155
        00000000`002af1d0 00000000`5e9fc785 ucrtbase!strftime+0x15
        00000000`002af210 00000000`5ea7d5c2 python35!time_strftime+0x1f5
        00000000`002af2b0 00000000`5eaf8fd0 python35!PyCFunction_Call+0x122
    larryhastings commented 9 years ago

    Rather than debating about how various platforms handle malformed format strings for strftime(), and whether or not they crash, we should simply prevent the native strftime() function from seeing them in the first place.

    I'd like the "v3" patch in 3.5.0rc3. Can someone create a pull request?

    eryksun commented 9 years ago

    Rather than debating about how various platforms handle malformed format strings for strftime(), and whether or not they crash, we should simply prevent the native strftime() function from seeing them in the first place.

    Of course the check needs to be restored. I wasn't suggesting to go back to the default IPH to have it terminate the process. That won't help, anyway. The potential over-read is in the for loop, before calling strftime. I just thought the context for the accidental removal was relevant. Care needs to be taken when removing checks that were put in place to avoid the default IPH, because those checks may serve a dual purpose in the surrounding code.

    The patch also fixes the AIX/Sun code, which needs to be applied to 3.4 as well. I don't think this issue applies to 3.2 and 3.3.

    zooba commented 9 years ago

    I'll do the PR.

    zooba commented 9 years ago

    Rather than debating about how various platforms handle malformed format strings for strftime(), and whether or not they crash, we should simply prevent the native strftime() function from seeing them in the first place.

    The crash is nothing to do with the native strftime() function - Python was crashing because it *tried* to prevent malformed strings and we (I) got it wrong. If we were simply passing the string through unchecked this would not have been an issue (or it would have been an issue against the CRT).

    PR at https://bitbucket.org/larry/cpython350/pull-requests/18/issue-24917-time_strftime-buffer-over-read/diff

    larryhastings commented 9 years ago

    Pull request accepted. Please forward-merge. Thanks!

    1762cc99-3127-4a62-9baf-30c3d0f51ef7 commented 9 years ago

    New changeset 09b62202d9b7 by Steve Dower in branch '3.5': Issue bpo-24917: time_strftime() Buffer Over-read. Patch by John Leitch. https://hg.python.org/cpython/rev/09b62202d9b7

    New changeset 8b81b7ad2d0a by Steve Dower in branch '3.5': Issue bpo-24917: Moves NEWS entry under Library. https://hg.python.org/cpython/rev/8b81b7ad2d0a

    New changeset a29b49d57769 by Steve Dower in branch '3.4': Issue bpo-24917: time_strftime() Buffer Over-read. Patch by John Leitch. https://hg.python.org/cpython/rev/a29b49d57769

    New changeset 6fc744ac3953 by Steve Dower in branch 'default': Issue bpo-24917: time_strftime() Buffer Over-read. Patch by John Leitch. https://hg.python.org/cpython/rev/6fc744ac3953

    zooba commented 9 years ago

    I applied the AIX fix to 3.4 (introduced in bpo-19634). Python 3.2 and 3.3 aren't affected.

    larryhastings commented 9 years ago

    The tests from this patch fail on Linux.

    ----- First: There is no trailing % test on Linux, and glibc's strftime() happily ignores a trailing %, so no ValueError is raised.

    Python should do either one or the other of the following:

    1) Python should enforce no trailing % in the strftime format string, or

    2) the test suite shouldn't assume that a trailing % in the strftime value string raises a ValueError.

    I can live with either of these, not sure what the right decision is.

    ----- Second: The test from the patch assumes that strftime('%#') will raise a ValueError. Again, strftime in Linux glibc happily accepts "%#" as a format string and thus no ValueError is raised.

    Python is agnostic about native format units in the strftime() format string. Therefore I strongly assert that Python must not assume that "%#" is an illegal format string. Therefore the tests must not assume that "%#" raises ValueError.

    Given that the code used to crash, I do want the code path exercised in the test suite. So I propose that the test attempt time.strftime('%#') and accept either success or ValueError.

    -----

    Given that I've accepted this patch into 3.5.0, and it's now blocking my release, it is implicitly a "release blocker". I need to resolve this tonight before I can tag 3.5.0rc3. I'm going to dinner, and maybe we can have a quick discussion and come to a decision in the next hour or two.

    p.s. The checkin also flunked PEP-7. *sigh*

    8d430a93-5e49-44a5-93eb-a0a4492eab8e commented 9 years ago

    Is there a way to see what style guidelines have been violated? The only thing I can think of is the curly braces in the Windows check, but I was following the conventions of the surrounding code.

    larryhastings commented 9 years ago

    The starting curly brace goes on the same line as the statement starting the block. Keywords followed by a left parenthesis get a space between the keyword and the parenthesis. It's a small matter, I'm really much more interested in reconciling the behavior of strftime and its tests.

    larryhastings commented 9 years ago

    Sorry, maybe you inherited those violations. I was in a hurry and not in a charitable frame of mind.

    zooba commented 9 years ago

    I'll fix the PEP-7 stuff (mostly inherited).

    MSVC can also handle '%#' and '%' as format strings (producing '' in both cases). If that matches libc behaviour, I see no reason to raise a ValueError here apart from consistency with previous Python releases.

    If consistency with system C libraries is preferred, then I'll change the condition to simply break out of the loop and make the test call the function expecting success. I assume there's an AIX buildbot we can watch to see the impact on that part?

    8d430a93-5e49-44a5-93eb-a0a4492eab8e commented 9 years ago

    Yikes--your comment prompted me to look at the check-in, and it seems my patch wasn't properly applied. The curly braces got tweaked, which is minor as you stated, but more importantly the AIX code should not decref format. That could introduce problems bigger than what this patch was attempting to fix.

    And, not to dwell, but where do you see a keyword immediately followed by a left parens? I want to make sure everything is properly polished in the future, and the only thing I see is the untouched "for".

    Regarding your initial concerns: 1) I think we should enforce no trailing % so as to not pass format strings that may cause undefined behavior.

    2) How about expecting ValueError on Windows/AIX, and pass on all other platforms?

    zooba commented 9 years ago

    Whoops, must have done a bad copy-paste to get that DECREF in there (I couldn't apply the patch directly because it didn't come from an HG repo, so I had to do it by hand).

    MSVC seems somewhat inconsistent about its response:
    >>> strftime('aaa%')
    ''
    >>> strftime('aaaa%')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: Invalid format string

    How closely do we want to simply expose the underlying C library's behaviour?

    8d430a93-5e49-44a5-93eb-a0a4492eab8e commented 9 years ago

    If it's so wildly inconsistent, it's my opinion that Python should perform its own validation to achieve better cross-platform support. The alternative is playing a never ending game of whack-a-mole, or just accepting that format strings may cause exceptions in some platforms but not others.

    zooba commented 9 years ago

    Having now read over this whole issue, I don't actually see where the security vulnerability is (on Windows at least).

    This is a user-mode read, so it can only access memory in the same process, and it doesn't display it anywhere. The worst that can happen is that it hits an unreadable page and crashes, which falls under "undefined behaviour due to invalid input".

    I think we should just revert the patch completely.

    larryhastings commented 9 years ago

    I have convinced myself that disallowing trailing % marks on all platforms is the right thing to do. Whether or not the underlying platform tolerates it, it is never *valid* input to strftime.

    I have a patch incubating at home. I'll put it up for review when I get back, maybe an hour.

    zooba commented 9 years ago

    I'll get my commits out of the way to make the buildbots green again.

    larryhastings commented 9 years ago

    Oh, maybe it was all like that before. Sorry, I was in a hurry and not in a charitable frame of mind.

    1762cc99-3127-4a62-9baf-30c3d0f51ef7 commented 9 years ago

    New changeset 73227e857d6b by Steve Dower in branch '3.5': Issue bpo-24917: Backed out changeset 09b62202d9b7 https://hg.python.org/cpython/rev/73227e857d6b

    New changeset 743924064dc8 by Steve Dower in branch 'default': Issue bpo-24917: Backed out changeset 09b62202d9b7 https://hg.python.org/cpython/rev/743924064dc8

    zooba commented 9 years ago

    Yeah, I came in late and in too much of a rush too :(

    Backed out the broken changes, and now I'm off to bed. Not convinced there's anything that needs fixing here, and IIRC we argued through the different platform behaviours on the issue when I removed the null check that was preventing the overread.

    If we do anything now, it should just be if (outbuf[1] == '\0') break to avoid the overread. strftime() is probably too far gone to make it consistent across platforms now.

    8d430a93-5e49-44a5-93eb-a0a4492eab8e commented 9 years ago

    Yes, this is a user-mode read, but I disagree with the assertion that it's not possible to use this to disclose memory. While it isn't as critical as something that outright dumps memory, there is logic that throws exceptions based on values encountered while reading outside the bounds of the buffer. This could be used as a channel to infer what is or isn't in adjacent memory. That it's user-mode doesn't matter--if an application exposes the format string as attack surface, suddenly process memory can be probed. So, it's not heartbleed, but it does have security implications. If you'd like, I can take a shot at building a PoC.

    Further, it's best to err on the side of caution with bugs like these; just because it doesn't seem like major issue now doesn't mean someone won't come along in the future and prove otherwise.

    larryhastings commented 9 years ago

    I'm going to back this out of 3.5.0rc3. I'm still willing to discuss accepting it into 3.5.0 final in some form, but for now I don't want to hold up the release.

    Steve: it should never be possible to crash the Python interpreter with well-formed Python code. Whether or not it's possible to exploit it, I do think this crash is something the interpreter should prevent.

    eryksun commented 9 years ago

    MSVC seems somewhat inconsistent about its response: >>> strftime('aaa%') ''

    That's not due to MSVC. It's setting errno to EINVAL. The problem is that time_strftime is testing (buflen > 0 || i >= 256 fmtlen). The initial value of the outbuf size i is 1024, so when (fmtlen \<= 4), the value of (256 fmtlen) is less than or equal to i, and it assumes "the format yields an empty result" without considering the value of errno. So instead of raising an exception for EINVAL, it calls PyUnicode_DecodeLocaleAndSize to return an empty string:

        >>> strftime('aaa%')
        Breakpoint 1 hit
        ucrtbase!strftime:
        000007fe`e2bac3e0 4883ec38        sub     rsp,38h
        0:000> gu
        python35!time_strftime+0x1f5:
        00000000`5de9c785 488bcb          mov     rcx,rbx
        0:000> be 0; g
        Breakpoint 0 hit
        ucrtbase!_errno:
        000007fe`e2b341b0 48895c2408      mov     qword ptr [rsp+8],rbx ss:00000000`0028f320=ffffffffffffffff

    errno is 22:

    0:000> pt; dd @rax l1
    00000000`002ddb50  00000016
    0:000> bd 0; g
    Breakpoint 2 hit
    python35!PyUnicode_DecodeLocaleAndSize:
    00000000`5df55070 4053            push    rbx
    0:000> k 2
    Child-SP          RetAddr           Call Site
    00000000`0028f318 00000000`5de9c81a python35!PyUnicode_DecodeLocaleAndSize
    00000000`0028f320 00000000`5df1d5c2 python35!time_strftime+0x28a
    0:000> g
    ''
    zooba commented 9 years ago

    Larry: agreed, no crashing. We'll get the trivial fix in that doesn't raise any error, and should probably stop suppressing the exception in the situation eryksun describes. That will correctly expose the platform's strftime semantics.

    Unless you want to write more thorough validation of incoming strings, I'll work something up later today or tomorrow.

    zooba commented 9 years ago

    New patch attached that just breaks out of the scanning loop and lets the system CRT handle invalid format strings. Fixes the condition that was suppressing some errors on Windows.

    Also fixed the PEP-7 issues around the changed code (I believe). No new test because the crash cannot be reliably reproduced and the expected results vary dramatically between platforms.

    The two if/break lines in the AIX branch should also be applied to 3.4 to prevent crashes. The rest of the fix (including the PEP-7 fixes) should not.