jimbaker / tagstr

This repo contains an issue tracker, examples, and early work related to PEP 999: Tag Strings
51 stars 6 forks source link

Using a tuple instead of a Thunk type #5

Open gvanrossum opened 2 years ago

gvanrossum commented 2 years ago

What if instead of passing a Thunk object for every interpolation, we just pass a tuple? So

def tag(*args): return args

tag"pre{x+1!r:10s}post"

ends up calling

tag("pre", (lambda: x+1, "x+1", "r", "10s"), "post")

PRO: It's easier to generate the code for this (no C code needed that implements a Thunk type).

CON: Can't easily add another value to the tuple in the future (since people would write code that unpacks the tuple into exactly 4 variables).

jimbaker commented 2 years ago

For immediate development on this pre PEP work, using a tuple certainly makes sense.

For later: can we use a tuple for now, then convert to effectively to a PyObject version of a namedtuple? I believe this is what was done at some point for a bunch of the os types, such as StatResultType. Part of that involved widening of the tuple as well, https://docs.python.org/3/library/os.html#os.stat_result

It doesn't solve the unpacking con however.

gvanrossum commented 1 year ago

Here's a reasonably clean example from the time module.

I am reluctant to implement that; we'd probably need a new AST node type specifically for this purpose, and a new bytecode to construct the instance. E.g. this program:

import ast
print(ast.dump(ast.parse("tag'xxx{yyy!r:spec}zzz'"), indent=2))

prints the following AST:

Module(
  body=[
    Expr(
      value=TagString(
        tag=Name(id='tag', ctx=Load()),
        str=JoinedStr(
          values=[
            Constant(value='xxx'),
            Tuple(
              elts=[
                Lambda(
                  args=arguments(
                    posonlyargs=[],
                    args=[],
                    kwonlyargs=[],
                    kw_defaults=[],
                    defaults=[]),
                  body=Name(id='yyy', ctx=Load())),
                Constant(value='yyy'),
                Constant(value='r'),
                JoinedStr(
                  values=[
                    Constant(value='spec')])],
              ctx=Load()),
            Constant(value='zzz')])))],
  type_ignores=[])

The bytecode looks like this:

  0           0 RESUME                   0

  1           2 PUSH_NULL
              4 LOAD_NAME                0 (tag)
              6 LOAD_CONST               0 ('xxx')
              8 LOAD_CONST               1 (<code object <lambda> at 0x6110000cb390, file "<>", line 1>)
             10 MAKE_FUNCTION            0
             12 LOAD_CONST               2 ('yyy')
             14 LOAD_CONST               3 ('r')
             16 LOAD_CONST               4 ('spec')
             18 BUILD_TUPLE              4
             20 LOAD_CONST               5 ('zzz')
             22 CALL                     3
             30 RETURN_VALUE

We'd need a new BUILD_THUNK opcode instead of the BUILD_TUPLE opcode at offset 18.

In theory we could defer adding the Thunk datatype until we need to add a 5th field (like we did for os.stat() and time.localtime() -- although that was long ago).

pauleveritt commented 1 year ago

I believe we've settled into a named tuple. We have one defined in this repo, but perhaps it should move over to Guido's branch and be part of CPython? If not, we should close this ticket.

If so, should we follow current practice and make a ThunkLike protocol? Not sure if anybody will have a place to insert alternative implementations.

gvanrossum commented 1 year ago

It would be good if the Thunk implementation didn't require inheritance from a stdlib Thunk class.

I am not in the mood to try and implement the Thunk class (and constructing instances from it in the bytecode) right now, we can work on that once the PEP is accepted.