python / cpython

The Python programming language
https://www.python.org

Accelerate 'string' % (value, ...) by using formatted string literals #72494

Open serhiy-storchaka opened 7 years ago

serhiy-storchaka commented 7 years ago
BPO 28307
Nosy @vstinner, @taleinat, @ericvsmith, @markshannon, @serhiy-storchaka, @ztane, @brandtbucher
PRs
  • python/cpython#5012
  • python/cpython#26160
  • python/cpython#26318
Dependencies
  • bpo-11549: Build-out an AST optimizer, moving some functionality out of the peephole optimizer
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    ```python
    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = None
    created_at =
    labels = ['interpreter-core', '3.11', 'performance']
    title = "Accelerate 'string' % (value, ...) by using formatted string literals"
    updated_at =
    user = 'https://github.com/serhiy-storchaka'
    ```

    bugs.python.org fields:

    ```python
    activity =
    actor = 'vstinner'
    assignee = 'serhiy.storchaka'
    closed = False
    closed_date = None
    closer = None
    components = ['Interpreter Core']
    creation =
    creator = 'serhiy.storchaka'
    dependencies = ['11549']
    files = []
    hgrepos = []
    issue_num = 28307
    keywords = ['patch']
    message_count = 9.0
    messages = ['277688', '277694', '277700', '277702', '277703', '309049', '324795', '393740', '402436']
    nosy_count = 7.0
    nosy_names = ['vstinner', 'taleinat', 'eric.smith', 'Mark.Shannon', 'serhiy.storchaka', 'ztane', 'brandtbucher']
    pr_nums = ['5012', '26160', '26318']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue28307'
    versions = ['Python 3.11']
    ```

    serhiy-storchaka commented 7 years ago

    For now using formatted string literals (PEP-498) is the fastest way of formatting strings.

    $ ./python -m perf timeit -s 'k = "foo"; v = "bar"' -- '"%s = %r" % (k, v)'
    Median +- std dev: 2.27 us +- 0.20 us
    
    $ ./python -m perf timeit -s 'k = "foo"; v = "bar"' -- 'f"{k!s} = {v!r}"'
    Median +- std dev: 1.09 us +- 0.08 us

    The compiler could translate C-style formatting with literal format string to the equivalent formatted string literal. The code '%s = %r' % (k, v) could be translated to

        t1 = k; t2 = v; f'{t1!s} = {t2!r}'; del t1, t2

    or even simpler if k and v are initialized local variables.

    $ ./python -m perf timeit -s 'k = "foo"; v = "bar"' -- 't1 = k; t2 = v; f"{t1!s} = {t2!r}"; del t1, t2'
    Median +- std dev: 1.22 us +- 0.05 us

    This is not an easy issue, and it first requires implementing the AST optimizer.
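
A minimal sketch of the equivalence described above, assuming only the standard library `timeit` module (the original measurements used `perf`). The variable names `percent_result`, `fstring_result`, `t_percent` and `t_fstring` are illustrative. Timings depend on the machine and the interpreter version, and on interpreters that already include this optimization the two forms may compile to the same bytecode, so no expected numbers are shown.

```python
import timeit

k, v = "foo", "bar"

# The two spellings should produce the same string.
percent_result = "%s = %r" % (k, v)
fstring_result = f"{k!s} = {v!r}"
print(percent_result, "|", fstring_result)

# Rough timing comparison; absolute values vary between runs and versions.
setup = 'k = "foo"; v = "bar"'
t_percent = timeit.timeit('"%s = %r" % (k, v)', setup=setup, number=100_000)
t_fstring = timeit.timeit('f"{k!s} = {v!r}"', setup=setup, number=100_000)
print(f"%-formatting: {t_percent:.4f} s, f-string: {t_fstring:.4f} s")
```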

    ericvsmith commented 7 years ago

    There isn't a direct mapping between %-formatting and __format__ format specifiers. Off the top of my head, I can think of at least one difference:

    >>> '%i' % 3
    '3'
    >>> '{:i}'.format(3)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: Unknown format code 'i' for object of type 'int'

    So you'll need to be careful with edge cases like this.

    Also, for all usages of %s, remember to call str() (or add !s):

    >>> '%s' % 1
    '1'
    >>> f'{1:s}'
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: Unknown format code 's' for object of type 'int'
    >>> f'{1!s:s}'
    '1'
    
    Although that also reminds me of this default alignment difference:
    >>> x=0
    >>> '%2s' % x
    ' 0'
    >>> f'{x!s:2s}'
    '0 '
    >>> f'{x!s:>2s}'
    ' 0'

    So, in general, the mapping will be difficult. On the other hand, if you can do it, and provide a function that maps between %-formatting codes and __format__ codes, then that might be a generally useful tool.
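
As a rough illustration of such a mapping function, here is a sketch limited to `%s`, `%r` and `%a` with an optional `-` flag, width and precision. The helper name `percent_to_replacement_field` is hypothetical and is not part of CPython or of any PR on this issue. It makes the right alignment explicit, because %-formatting right-aligns by default while `str.__format__` left-aligns, and it rejects everything else (including a leading `0` in the width) rather than guessing.

```python
import re

# Hypothetical helper for illustration only; handles just %s, %r and %a.
_SPEC = re.compile(
    r"%(?P<left>-?)(?P<width>[1-9]\d*)?(?:\.(?P<prec>\d*))?(?P<conv>[sra])"
)


def percent_to_replacement_field(spec: str, expr: str) -> str:
    """Translate one %-specifier into an f-string replacement field."""
    m = _SPEC.fullmatch(spec)
    if m is None:
        raise ValueError(f"unsupported format specifier: {spec!r}")
    fmt = ""
    if m["width"]:
        # %-formatting pads on the left unless '-' is given.
        fmt += ("<" if m["left"] else ">") + m["width"]
    if m["prec"] is not None:
        # A bare '.' means precision 0 in %-formatting.
        fmt += "." + (m["prec"] or "0")
    field = f"{expr}!{m['conv']}"
    return "{" + field + (":" + fmt if fmt else "") + "}"


x = 0
field = percent_to_replacement_field("%2s", "x")
# Evaluate the generated field as an f-string to compare with %-formatting.
rendered = eval("f" + repr(field), {"x": x})
print(field, repr(rendered), repr("%2s" % (x,)))
```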

    serhiy-storchaka commented 7 years ago

    '%s' % x should be translated to f'{x!s}', not to f'{x:s}'. Only %s, %r and %a can be supported. Formatting with %i should be left untranslated. Or maybe translate '%r: %i' % (a, x) to f'{a!r}: {"%i" % x}'.

    It is also possible to introduce special opcodes that convert the argument to an exact int or float. Then '%06i' % x could be translated to f'{__exact_int__(x):06}'.
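
The following sketch illustrates why a plain `f'{x:06}'` is not a safe translation of `'%06i' % x` and what a conversion step would buy. The function `exact_int` is a hypothetical stand-in for the proposed `__exact_int__` opcode, approximated here with `int()`; the real coercion rules of `%i` may differ for unusual types.

```python
def exact_int(value):
    """Hypothetical stand-in for the proposed opcode (assumption: int() is close enough)."""
    return int(value)


x = 3.7
a = "id"

percent = "%06i" % x               # %i truncates the float to an integer first
naive = f"{x:06}"                  # float.__format__ keeps the fractional part
converted = f"{exact_int(x):06}"   # convert first, then format as an int

# The partial translation keeps %i as a nested %-expression.
mixed = f"{a!r}: {'%i' % x}"

print(percent, naive, converted, mixed)
```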

    ff59cd45-ebe3-4b3e-9696-65dc59a38b8c commented 7 years ago

    Serhiy, you actually did make a mistake above; '%s' % x cannot be rewritten as f'{x!s}', only '%s' % (x,) can be optimized...

    (just try with x = 1, 2)

    serhiy-storchaka commented 7 years ago

    Thanks for the correction, Antti. Yes, this is what I initially meant. This optimization is applicable only if the left argument of % is a literal string and the right argument is a tuple expression. When I wrote '%s' % x, I meant a single component of the tuple.
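
A short sketch of the pitfall discussed above: when the right operand is itself a tuple, `'%s' % x` unpacks it as the argument list, so it is not equivalent to `f'{x!s}'`. Only the explicit one-element tuple form matches. The variable names are illustrative.

```python
x = (1, 2)

error = None
try:
    "%s" % x  # x is treated as two arguments for a single %s
except TypeError as exc:
    error = exc

wrapped = "%s" % (x,)  # x is the single component of the argument tuple
fstring = f"{x!s}"

print(type(error).__name__, wrapped, fstring)
```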

    serhiy-storchaka commented 6 years ago

    PR 5012 implements the transformation of simple format strings containing only %s, %r and %a into f-strings.

    taleinat commented 6 years ago

    I'm +1 on this optimization.

    serhiy-storchaka commented 3 years ago

    PR 26160 adds support for %d, %i, %u, %o, %x, %X, %f, %e, %g, %F, %E, %G.

    What is not supported:

    vstinner commented 2 years ago

    commit a0bd9e9c11f5f52c7ddd19144c8230da016b53c6
    Author: Serhiy Storchaka <storchaka@gmail.com>
    Date: Sat May 8 22:33:10 2021 +0300

    bpo-28307: Convert simple C-style formatting with literal format into f-string. (GH-5012)
    
    C-style formatting with literal format containing only format codes
    %s, %r and %a (with optional width, precision and alignment)
    will be converted to an equivalent f-string expression.
    
    It can speed up formatting more than 2 times by eliminating
    runtime parsing of the format string and creating temporary tuple.

    commit 8b010673185d36d13e69e5bf7d902a0b3fa63051
    Author: Serhiy Storchaka <storchaka@gmail.com>
    Date: Sun May 23 19:06:48 2021 +0300

    bpo-28307: Tests and fixes for optimization of C-style formatting (GH-26318)
    
    Fix errors:
    * "%10.s" should be equal to "%10.0s", not "%10s".
    * Tuples with starred expressions caused a SyntaxError.
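
The two fixed cases can be checked from Python code, as in the sketch below. It only verifies observable results, which should hold whether or not the running interpreter applies the optimization. The `dis` output is printed without any expected value, since the emitted opcodes differ between CPython versions. The function name `formatted` is illustrative.

```python
import dis

s = "abc"

# A bare '.' means precision 0, so the string is truncated to empty
# and then padded to width 10.
dot_only = "%10.s" % (s,)
dot_zero = "%10.0s" % (s,)
no_precision = "%10s" % (s,)

# A starred expression inside the tuple must still compile and format.
args = ("k", "v")
starred = "%s = %r" % (*args,)


def formatted(k, v):
    return "%s = %r" % (k, v)


# Inspect the bytecode; the opcode names are version-dependent.
print([ins.opname for ins in dis.get_instructions(formatted)])
print(repr(dot_only), repr(no_precision), starred)
```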