jimbaker / tagstr

This repo contains an issue tracker, examples, and early work related to PEP 999: Tag Strings
51 stars 6 forks source link

Rewriting expressions, using their lexical scope #10

Closed jimbaker closed 1 year ago

jimbaker commented 2 years ago

Rewriting expressions could support the popular numexpr library, which rewrites expressions on NumPy arrays to use its own bytecode interpreter for vectorized ops. Because this rewriting uses sys._getframe, it necessarily is using dynamic scope instead of lexical scope, which exposes that sharp edge to users who might expect to be able to use standard lexical scoping.

I have implemented the core of such support in https://gist.github.com/jimbaker/619de23e34a28affc14e4fdfb1b99996. It supports a new tag, numexpr, which returns for each thunk a new thunk with a code object that returns a mapping of variable names to their values. (Arguably the ergonomics of this rewriting is that it should return the old raw code text instead of what has been wrapped.)

The key piece here in the gist is this wrapping:

    wrapped = f"""
def outer({param_list(code.co_freevars)}):
    def inner():
        return \\
{textwrap.indent(name_bindings(all_names), '   ' * 3)}
"""

which implements part of what is done in lambda lifting, namely "eliminating free variables in the function by adding parameters." So that's directly translated to, put the co_freevars in the outer function. (This is a variation - we still need to keep the inner function with the same params so the code object that is compiled conforms to the old code object, and therefore can be used in the same way.) Python's scope compilation then works as usual - global variables and free variables are appropriately analyzed and corresponding code is generated. For CPython specifically, we can see this in the emitted bytecode, with comparable LOAD_DEREF and LOAD_GLOBAL to what is in the original lambda-wrapped expression.

To demonstrate how it could be used, I'm thinking of showing how to construct NumExpr3's NumExpr directly from such a tag (ignoring some subtleties, such as indicating an out variable). This will be added as an example tag in the examples folder.

jimbaker commented 2 years ago

See https://github.com/jimbaker/tagstr/blob/main/examples/rewrite.py

gvanrossum commented 2 years ago

I'm sorry, but this seems too esoteric to be a motivational example. Certainly if you want this to be considered motivational you'd have to explain it to someone who is not already familiar with lambda lifting, numexpr (whose docs are just weird), and "rewriting" (rewriting how?).

jimbaker commented 2 years ago

Yeah, this one is very powerful, and interesting to PL nerds like myself, but definitely esoteric. I agree that it's outside the scope of a tutorial, even with respect to the NumExpr library. Having said that, I will try to complete this example here to show it could be used there.

The fact that we lambda-wrap is still useful for other usages, especially lazy f-strings. Another possible example is constructing logging's LogRecord instances directly.

jimbaker commented 2 years ago

There's now another way to bind the free variables in an expression, without doing the outer function trick. Not surprisingly, this is necessary for PEP 649 - https://github.com/python/cpython/pull/92204

This is a good reason to update the branch we are working against, since it was pushed into main just after @gvanrossum started the work in #1 - the usual activity around PyCon sprints.

It was also mentioned in this recent thread that David Mertz started, https://mail.python.org/archives/list/python-ideas@python.org/thread/DQTR3CYWMLSRRKR6MBLZNTGCG762QNDF/#DQTR3CYWMLSRRKR6MBLZNTGCG762QNDF , which is where I became aware of it.

It's likely that the rewrite using a lambda-wrapped expression that captures its lexical scope + its source (so two elements of Thunk, getvalue and raw) is the same idea that's necessary for an implementation of later that can could enable some form of "execution plan rewrite". In particular, the Dask example looks roughly similar at a high level to what is done with the NumExpr library, but extended to an expression graph in Python code.

jimbaker commented 1 year ago

This is interesting, but not relevant to the main work on tag string.