python / cpython

The Python programming language
https://www.python.org
Other
63.06k stars 30.2k forks source link

sre "bytecode" verifier #47737

Closed gvanrossum closed 16 years ago

gvanrossum commented 16 years ago
BPO 3487
Nosy @gvanrossum, @warsaw, @terryjreedy, @gpshead, @jcea, @pitrou
Files
  • sre-verifier.diff: patch relative to Python 2.6
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = 'https://github.com/gvanrossum' closed_at = created_at = labels = [] title = 'sre "bytecode" verifier' updated_at = user = 'https://github.com/gvanrossum' ``` bugs.python.org fields: ```python activity = actor = 'jcea' assignee = 'gvanrossum' closed = True closed_date = closer = 'gvanrossum' components = [] creation = creator = 'gvanrossum' dependencies = [] files = ['11030'] hgrepos = [] issue_num = 3487 keywords = ['patch'] message_count = 7.0 messages = ['70574', '70596', '70606', '70682', '70692', '70695', '70729'] nosy_count = 6.0 nosy_names = ['gvanrossum', 'barry', 'terry.reedy', 'gregory.p.smith', 'jcea', 'pitrou'] pr_nums = [] priority = 'high' resolution = 'accepted' stage = None status = 'closed' superseder = None type = None url = 'https://bugs.python.org/issue3487' versions = ['Python 2.6', 'Python 3.0'] ```

    gvanrossum commented 16 years ago

    Attached is a verifier for the binary code used by the _sre module (this is often called bytecode, though to distinguish it from Python bytecode I put it in quotes).

    I wrote this for Google App Engine, and am making the patch available as open source under the Apache 2 license. Below are the copyright statement and license, for completeness.

    Barry, I'm assigning this to you only so that you can decide whether this is okay to check in before beta3. The patch works for 2.6 as well as for 3.0 (and even for 2.5, but I don't think it's worth changing that -- or is it?). If you agree (which I hope you will do), please assign back to me with instructions and I'll submit it. The performance impact is minimal: it only runs when the compiled "bytecode" is passed from the re compiler (written in Python) to the C code, during the sre compilation stage. All tests pass. We've had this code in use in App Engine since April without complaints.

    # Copyright 2008 Google Inc. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License.

    It's not necessary to include these copyrights and bytecode in the source file. Google has signed a contributor's agreement with the PSF already.

    terryjreedy commented 16 years ago

    Based on my understanding of the above and PyDev discussions, I see the arguments in favor of quick inclusion as being the following:

    1. This will be user invisible, so it is not a new interface feature.
    2. This will prevent possible interpreter crashes or other bad behavior as a result of malformed SREcode. So it could be considered a bug fix.
    3. Google considered this enough of a potential problem to pre-emptively fix it. Now that that problem has been publicly exposed, other careful users will expect it to be fixed and will find Python more attractive when it has been.

    If this is included in the next betas, the announcement of such might say so and encourage re users to re-run any re-based test code.

    gvanrossum commented 16 years ago
    1. Google considered this enough of a potential problem to pre-emptively fix it. Now that that problem has been publicly exposed, other careful users will expect it to be fixed and will find Python more attractive when it has been.

    If this is included in the next betas, the announcement of such might say so and encourage re users to re-run any re-based test code.

    I should add that the protection this offers is against attempts to cause crashes by passing bad RE "bytecode" into the _sre.compile().

    It is not possibly to generate such bad RE "bytecode" by writing an evil regular expression; you must have access to the _sre module in order to be able to exploit this vulnerability. In other words, the vulnerability is equivalent to having ctypes accessible.

    Thus, only people who are worried about malicious use of ctypes should be worried about this vulnerability. Google's App Engine is one of those (rare) places, since it lets anybody run their Python code in a Google datacenter. If you offer the ability to run arbitrary Python code to strangers, you should worry about this. Otherwise, there is no reason to worry.

    gpshead commented 16 years ago

    +1 I'd like to see this make it in.

    pitrou commented 16 years ago

    Shouldn't there be any unit tests? :)

    warsaw commented 16 years ago

    Guido, this is fine for 3.0 and 2.6. As Terry points out, it's not user visible and it improves reliability. I'm -0 on backporting it to 2.5, but don't really feel strongly about that.

    Go for it!

    gvanrossum commented 16 years ago

    Submitted to 2.6 as r65544.

    Will propagate to 3.0 as it gets merged -- should be a perfect merge.

    Antoine: the re module has tons of unittests; showing that attempts to break in are thwarted would be pretty boring. ;-)