python / cpython

The Python programming language
https://www.python.org
Other
63.54k stars 30.45k forks source link

The "lazy strings" patch #44194

Closed larryhastings closed 14 years ago

larryhastings commented 18 years ago
BPO 1590352
Nosy @pitrou, @larryhastings, @tiran, @devdanzin
Superseder
  • bpo-1569040: Speed up using + for string concatenation
  • Files
  • python.lch.lazy.string.patch.52618.diff: Patch against 2.6 trunk, revision 52618.
  • lazy.strings.patch.monograph.txt: An in-depth description of the patch and its ramifications, as of revision 52618.
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = None closed_at = created_at = labels = ['interpreter-core', 'type-feature'] title = 'The "lazy strings" patch' updated_at = user = 'https://github.com/larryhastings' ``` bugs.python.org fields: ```python activity = actor = 'eric.araujo' assignee = 'none' closed = True closed_date = closer = 'eric.araujo' components = ['Interpreter Core'] creation = creator = 'larry' dependencies = [] files = ['7603', '7604'] hgrepos = [] issue_num = 1590352 keywords = ['patch'] message_count = 7.0 messages = ['51321', '51322', '51323', '51324', '58745', '84583', '84588'] nosy_count = 6.0 nosy_names = ['collinwinter', 'pitrou', 'larry', 'christian.heimes', 'ajaksu2', 'paulhankin'] pr_nums = [] priority = 'high' resolution = 'out of date' stage = 'resolved' status = 'closed' superseder = '1569040' type = 'enhancement' url = 'https://bugs.python.org/issue1590352' versions = ['Python 2.7'] ```

    larryhastings commented 18 years ago

    This patch consists of three changes to CPython:

    In detail:

    PyStringObject.ob_sval was changed from a char[] array to a char *. This is not in and of itself particularly desirable. It was necessary in order to implement the other two changes.

    "lazy concatenations" change string concatenation ("a" + "b") so that, instead of directly calculating the resulting string, it returns a placeholder object representing the result. As a result, string concatenation in CPython is now more than 150% faster on average (as reported by pystone 2.0), and is approximately as fast as the standard string concatenation idiom ("".join([a + b + c])).

    "lazy slices" changes string slicing ("abc"[1], "a".strip()) so that, instead of directly calculating the resulting string, it returns a placeholder object representing the result. As a result, string slicing in CPython is now more than 60% faster on average (as reported by pystone 2.0).

    When considering this patch, please keep in mind that the "lazy" changes are distinct, and could be incorporated independently. In particular I'm guessing that "lazy concatenations" have a lot higher chance of being accepted than "lazy slices".

    These changes were implemented almost entirely in Include/stringobject.h and Objects/stringobject.c.

    With this patch applied, trunk builds and passes all expected tests on Win32 and Linux.

    For a more thorough discussion of this patch, please see the attached text file(s).

    9ba7937e-c1fe-412d-bc01-ff013fe3cf13 commented 17 years ago

    I really like the idea of the lazy cats, and can believe that it's a really good optimisation, but before I review this code properly I'd like to see: a. convincing that it doesn't break strict aliasing (a casual reading suggests it does) b. lazy slices removed into their own patch (or just removed) - I don't want to recommend a patch containing them c. adherence to coding standard d. a little more explanation of how the cat objects work: it's important because they're a future minefield of bugs.

    larryhastings commented 17 years ago

    Howdy! Much has transpired since I posted this patch.

    Is there life after Guido patch-closing? I'd be happy to spend the time answering your questions if my patch had some sort of future. (Though you'll have to tell me what you mean by "break strict aliasing".)

    9ba7937e-c1fe-412d-bc01-ff013fe3cf13 commented 17 years ago

    Hi Larry, It doesn't sound too promising - I'm new and have no powers of resurrection :(

    By strict aliasing, I just meant it's illegal to access members of one type if the object is of a different (incompatible) type (actually I was wrong, this isn't the strict aliasing rule - it's a more fundamental one). In your case, it means it's illegal to pass a concat object where a string object is expected, even if the function accesses members that are common to them both. If this is happening, the answer is to make a union with the string object and cat object as members, and to use this union type instead but it's not pretty.

    I suggest this patch is closed anyway. If you still believe in your code and think that lazy string cats have support, I suggest making a new patch with just those in (fixed up to be correct C, and PEP-7 compliant).

    tiran commented 16 years ago

    I'm raising the level to draw more attention to this featue. It should either be considered for inclusion or closed.

    90baf024-6604-450d-8341-d796fe6858f3 commented 15 years ago

    ISTM that 2.x won't receive this kind on enhancement anymore.

    Collin, I'm adding you to the nosy list because you may be interested in having this (either on unladen or CPython). If so, also take a look at bpo-1569040.

    pitrou commented 15 years ago

    Either this bug or bpo-1569040 should be closed as duplicate of the other (it's really the same approach by the same author at two different times :-)).