jimbaker / tagstr

This repo contains an issue tracker, examples, and early work related to PEP 999: Tag Strings

`fl` tag - Supporting lazy version of f-strings #6

Closed jimbaker closed 1 year ago

jimbaker commented 2 years ago

Building on @gvanrossum's issue https://github.com/jimbaker/tagstr/issues/1, I have implemented a lazy version of f-strings. This demo code implements the fl tag such that it has the same user behavior as described in https://github.com/python/cpython/issues/77135

Note that while fltag.py in the gist implements memoization on the value, it has not been optimized in any other way. (This was an important part of the discussion in the CPython issue above.) My assumption here is that an implementation of the fl tag could do some memoization on the raw string decode, as well as other optimizations. TBD.

This can be used as follows, as seen in the demo function in the gist:

    logging.info(fl'{expensive_fn()}')     # nothing logged, expensive_fn is not called
    logging.warning(fl'{expensive_fn()}')  # but this is

https://gist.github.com/jimbaker/bb27803755ce890ecbcae29927cb776e
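For readers without the patched interpreter, the lazy behavior can be approximated in stock Python. This is only a minimal sketch, not the code from the gist: the `LazyFString` here takes plain strings and zero-argument callables (a stand-in for what the tag would receive), and the logger name `fl_demo` is made up for the demo.

```python
import io
import logging
from functools import cached_property


class LazyFString:
    """Hypothetical stand-in: joins its parts only when str() is called."""

    def __init__(self, *args):
        self.args = args  # plain strings or zero-argument callables

    @cached_property
    def value(self):
        # Evaluated at most once; later accesses reuse the cached string.
        return "".join(a() if callable(a) else a for a in self.args)

    def __str__(self):
        return self.value


calls = []

def expensive_fn():
    calls.append(1)  # record each real evaluation
    return "result"


stream = io.StringIO()
log = logging.getLogger("fl_demo")
log.propagate = False
log.setLevel(logging.WARNING)
log.addHandler(logging.StreamHandler(stream))

log.info(LazyFString("value: ", expensive_fn))     # below the level: never formatted
log.warning(LazyFString("value: ", expensive_fn))  # formatted when the record is emitted
```

The laziness comes from `logging` itself, which only calls `str()` on the message once a record is actually handled.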

gvanrossum commented 2 years ago

Awesome! I see you discovered that format(x, '') is equivalent to str(x). Nice! (I had forgotten.)
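A quick check of that equivalence, using a handful of arbitrarily chosen values:

```python
from fractions import Fraction

# With an empty format spec, format() falls back to the same text as str().
values = [42, 3.5, "text", None, Fraction(1, 3), [1, 2]]
same = [format(v, "") == str(v) for v in values]

# A non-empty spec is where the two diverge.
rounded = format(3.14159, ".2f")
```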

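You also discovered the need for decoding "raw" strings. I'm not sure that your approach covers everything. The escapes I know of include:

  • \n and other classic one-letter escapes
  • \ooo octal
  • \xhh hex
  • \uhhhh 16-bit unicode
  • \Uhhhhhhhh 32-bit unicode (really 21-bit)
  • \N{unicode_name}
  • What did I miss?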

gvanrossum commented 2 years ago

Maybe we can similarly implement PEP 501 style "i-strings" using tag strings?

jimbaker commented 2 years ago

You also discovered the need for decoding "raw" strings. I'm not sure that your approach covers everything. The escapes I know of include:

  • \n and other classic one-letter escapes
  • \ooo octal
  • \xhh hex
  • \uhhhh 16-bit unicode
  • \Uhhhhhhhh 32-bit unicode (really 21-bit)
  • \N{unicode_name}
  • What did I miss?

We should be good here - nothing missed. The function _PyUnicode_DecodeUnicodeEscapeInternal in unicodeobject.c is used by both the parser (directly) and the codec supporting 'unicode-escape', and it implements the above escape logic.
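As a rough illustration of going through the codec from Python code, here is one way to "cook" a raw string. The helper name `decode_raw` is hypothetical and this is not necessarily how fltag.py does it; the `latin-1` plus `backslashreplace` step is just a common idiom so that non-ASCII text survives the round trip.

```python
def decode_raw(raw: str) -> str:
    """Apply Python's string-literal escape rules to an uncooked string."""
    # Characters outside latin-1 become \uXXXX / \UXXXXXXXX escapes first,
    # then the 'unicode_escape' codec decodes every escape in one pass.
    return raw.encode("latin-1", "backslashreplace").decode("unicode_escape")


face = decode_raw(r"\N{GRINNING FACE}")
letters = decode_raw(r"\101\x41\u0041\U00000041")  # octal, hex, 16-bit, 32-bit
```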

So trying it out with the fl tag, along with using standard raw f-strings:

>>> fr'\N{{GRINNING FACE}}'
'\\N{GRINNING FACE}'
>>> fl'\N{{GRINNING FACE}}'
LazyFString(args=('\\N{GRINNING FACE}',))
>>> str(_)
'😀'

The only possible gotcha here is that in regular f-strings we don't double up braces for \N{unicode_name} (they are not parsed as being part of an expression). For raw f-strings/tag-strings, it results in perhaps an obvious SyntaxError with GRINNING FACE, less so with something else where it would be a valid name:

>>> f'\N{GRINNING FACE}'
'😀'

arcivanov commented 2 years ago

@jimbaker this is wonderful progress! Perhaps I'm missing something but this code isn't compiling, obviously, on 3.10.4. Are you using a patched version of CPython?

gvanrossum commented 2 years ago

@arcivanov See #1. There's a branch of 3.11 involved, this is just a prototype. :)

@jimbaker Ah, cool. I keep forgetting new things. :-)

The issue with braces in tag"\N{unicode name}" is unfortunate, I guess it would be gone if we didn't support raw strings. Why do we need those again? Maybe we don't?

ericsnowcurrently commented 2 years ago

Raw strings are especially useful for regular expressions. I know of at least one case where tag strings would be useful for regular expressions: fixing indentation in a verbose pattern where there are interpolations that "include" subpatterns defined elsewhere.

gvanrossum commented 2 years ago

Raw strings are especially useful for regular expressions. I know of at least one case where tag strings would be useful for regular expressions: fixing indentation in a verbose pattern where there are interpolations that "include" subpatterns defined elsewhere.

Could you elaborate on that example? IIUC in verbose re patterns indentation doesn't matter. So why would you need to fix it?

ericvsmith commented 2 years ago

I think that if our choice is only "raw" or "cooked", we should go with raw. If we could come up with a clever way to say "this is a raw tagged string", then that would be ideal. But I don't see a good way of doing that. Maybe "fl-r", for a raw "fl" string? But it seems too ugly.

gvanrossum commented 2 years ago

I think that if our choice is only "raw" or "cooked", we should go with raw. If we could come up with a clever way to say "this is a raw tagged string", then that would be ideal. But I don't see a good way of doing that. Maybe "fl-r", for a raw "fl" string? But it seems too ugly.

Definitely too ugly. :-)

I guess the tag"\N{blah blah}" issue can be solved in the f-string parser if there's enough motivation -- it "just" has to recognize \N{...} and not turn it into an interpolation. (This requires knowing whether the \ is itself escaped -- but it should already be keeping track of that in order to know when \" ends the string.)
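To make the idea concrete, here is a toy scanner in plain Python. It is only a sketch and not the real f-string parser: `split_fields` is a hypothetical name, and it ignores nested braces, conversions, and format specs. It keeps `\N{...}` as literal text, and because a backslash always consumes the character after it, an escaped backslash followed by `N{` is not mistaken for a named escape.

```python
def split_fields(template: str) -> list[tuple[str, str]]:
    """Split a raw template into ('lit', text) and ('expr', source) parts."""
    parts: list[tuple[str, str]] = []
    literal: list[str] = []

    def flush() -> None:
        if literal:
            parts.append(("lit", "".join(literal)))
            literal.clear()

    i, n = 0, len(template)
    while i < n:
        if template.startswith("\\N{", i):
            # Named escape: copy it through untouched, braces included.
            end = template.index("}", i)
            literal.append(template[i:end + 1])
            i = end + 1
        elif template[i] == "\\":
            # Any other backslash takes the next character with it.
            literal.append(template[i:i + 2])
            i += 2
        elif template.startswith("{{", i) or template.startswith("}}", i):
            literal.append(template[i])  # doubled brace means one literal brace
            i += 2
        elif template[i] == "{":
            end = template.index("}", i)
            flush()
            parts.append(("expr", template[i + 1:end]))
            i = end + 1
        else:
            literal.append(template[i])
            i += 1
    flush()
    return parts


named = split_fields(r"\N{GRINNING FACE} {name}!")
escaped = split_fields(r"\\N{x}")
```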

ericsnowcurrently commented 2 years ago

Could you elaborate on that example? IIUC in verbose re patterns indentation doesn't matter. So why would you need to fix it?

It really helps when debugging a large pattern. If the indentation of "included" sub-patterns isn't fixed, then the resulting pattern is harder to follow when you print it out. I've had to deal with this on occasion.
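A small sketch of what that looks like, using an ordinary raw f-string as a stand-in for a tag. The `include` helper and the patterns are made up for illustration; a tag function could apply the same re-indentation automatically instead of requiring the indent to be passed by hand.

```python
import re
import textwrap

YEAR = r"""
\d{4}        # four-digit year
"""

MONTH = r"""
(?: 0[1-9]   # 01-09
  | 1[0-2]   # 10-12
)
"""


def include(subpattern: str, indent: str) -> str:
    """Re-indent every line after the first to match the insertion point."""
    first, *rest = textwrap.dedent(subpattern).strip("\n").splitlines()
    return "\n".join([first] + [indent + line for line in rest])


year = include(YEAR, " " * 8)
month = include(MONTH, " " * 8)

DATE = rf"""
    (?P<year>
        {year}
    )
    -
    (?P<month>
        {month}
    )
"""

date_re = re.compile(DATE, re.VERBOSE)
print(DATE)  # the included sub-patterns line up with the outer groups
```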

jimbaker commented 2 years ago

\LaTeX support 😁 - it would be nice to generate Latex with Python templates. This has been a use case for me, and likely a future one.

gvanrossum commented 2 years ago

Okay, raw mode seems useful enough to support.

gvanrossum commented 2 years ago

But wait. Aren’t curlies just as prevalent in Latex as backslashes? So what would you gain?

jimbaker commented 2 years ago

But wait. Aren’t curlies just as prevalent in Latex as backslashes? So what would you gain?

There are a lot of metacharacters in Latex. But there's a difference between working with something balanced like {{...}} and doubling \\. Or worse doubling separators with \\\\. Also in practice - or at least what I have done - there are more symbols that are specified simply by \sym than being parameterized. Again, this is for generated Latex, not writing it in general.

There is this workaround for Jinja. I don't think the fact that it can be customized actually helps here: http://eosrei.net/articles/2015/11/latex-templates-python-and-jinja2-generate-pdfs

rmorshea commented 2 years ago

This reminds me of something I made because I wanted f-string style templates.

jimbaker commented 2 years ago

@rmorshea The need to do frame inspection (as in https://github.com/rmorshea/fstr/blob/master/fstr/fstr.py#L33) is a common requirement we see in other templating approaches where we want to avoid repeating oneself and have direct access to expressions - which is perhaps why f-strings have become so popular. See for example

(I'm sure there are many, many more examples out there!)

The sharp edge here is that sys._getframe (however it is wrapped) provides access to the dynamic scope of a given name. This can certainly be useful, but lexical scope is what f-string uses - and tag strings more generally. In particular, the lack of lexical scope is why more complex - but still very much popular - usage patterns fail, as seen in this issue: https://github.com/jviide/htm.py/issues/11
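A minimal illustration of the difference, assuming a naive frame-based `render` helper (hypothetical, and much simpler than the libraries linked above). The inner function never mentions `prefix` in its own code, so the compiler creates no closure cell for it and the name is absent from the caller's frame, whereas the f-string version is compiled with the enclosing scope in view.

```python
import sys


def render(template: str) -> str:
    """Naive renderer: resolve names from the caller's frame at run time."""
    frame = sys._getframe(1)  # CPython-specific
    return template.format(**{**frame.f_globals, **frame.f_locals})


def outer_dynamic():
    prefix = "item - "

    def inner(label):
        # 'prefix' only appears inside the string, so inner() has no
        # closure over it and the frame lookup cannot find it.
        return render("{prefix}{label}")

    return inner("lunch")


def outer_lexical():
    prefix = "item - "

    def inner(label):
        # The compiler sees 'prefix' in the f-string and closes over it.
        return f"{prefix}{label}"

    return inner("lunch")


try:
    outer_dynamic()
    dynamic_failed = False
except KeyError:
    dynamic_failed = True
```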

jimbaker commented 2 years ago

I added fl.py to the examples in the repo - this is a cleaned up version of the original gist.

rmorshea commented 2 years ago

I think if I were doing this now I'd drop the .evaluate() method for the reason you mention, in addition to the fact that it's a bit too "magical". Rather, I'd just want .format(**variables) as in the "Use f-string syntax instead of str.format" example.

A reworked version of the second example might be:

from fformat import fformat

common_error_message = fformat"function {function.__name__!r} failed because {error}"

def add(x, y):
    try:
        return x + y
    except Exception as e:
        msg = common_error_message.format(function=add, error=e)
        print(msg)

def sub(x, y):
    try:
        return x - y
    except Exception as e:
        msg = common_error_message.format(function=sub, error=e)
        print(msg)

add(1, "2")
sub("5", 3)

If this seems compelling I can take a crack at implementing it.

jimbaker commented 2 years ago

@rmorshea I've been thinking about the example with common_error_message, which also resembles the approach discussed in https://github.com/jimbaker/tagstr/issues/2

First, we can do something like what you propose, given that a tag string when evaluated simply returns some object, which could support a format method (or possibly, __format__).

My feeling, however, is that we might just want to wrap this with a function, much like we do with some of the nested tag strings, as we see in the html example. So this could work:

from fformat import fformat

def common_error_message(function, error):
    return fformat"function {function.__name__!r} failed because {error}"

def add(x, y):
    try:
        return x + y
    except Exception as e:
        msg = common_error_message(add, e)
        print(msg)

gvanrossum commented 2 years ago

Why couldn't that use f-strings?

jimbaker commented 2 years ago

@gvanrossum I'm assuming - perhaps wrongly! - that fformat is doing something that's not quite what f-strings would do. Otherwise, use the f-string or the equivalent format string.

@rmorshea Am I completely off with respect to the intent of the example?

rmorshea commented 2 years ago

The intention was simply to have a way to use f-string syntax in a re-usable template. So to @gvanrossum's question, the answer is yes and no. Yes, you can do this with an f-string in a function as @jimbaker showed above, but no, you can't do it only with f-strings. If you were to do this with just f-strings, you'd need to copy-paste the f-string and substitute in the appropriate variables (e.g. switching out add for sub as the function in the msg):

def add(x, y):
    try:
        return x + y
    except Exception as e:
        msg = f"function {add.__name__!r} failed because {e}"
        print(msg)

def sub(x, y):
    try:
        return x - y
    except Exception as e:
        msg = f"function {sub.__name__!r} failed because {e}"
        print(msg)

gvanrossum commented 2 years ago

That's sort of what I figured. But if you have to write code that is essentially template.format(function=..., error=...) where template is a global (template = fformat"function {add.__name__!r} failed because {error}") then this really does beg the question of whether it isn't better to just use a function (to be called using the same signature as template.format())? If we're trying to sell the idea that tag strings enable use cases that we couldn't have before, this particular example makes for a pretty weak argument.

pomponchik commented 1 year ago

I also wrote my own implementation of lazy f-strings. It is even simpler to use:

import f

number = 33
f('{number} kittens drink milk')

The actual calculation of the string followed by caching occurs at the first access. This happens transparently to the user and works, for example, for logging.

My argument for why this feature should still be built into the interpreter is speed. I don't see any ways to achieve a speed comparable to the original f-strings. It is especially expensive to extract variables from closures - it cannot be done as efficiently as the interpreter does inside itself.

jimbaker commented 1 year ago

@pomponchik right, it's quite possible to dynamically look up variables in a number of ways from Python's frames. One can also generalize to expressions. The numexpr package, which works with NumPy arrays, does something similar to your implementation; see https://github.com/pydata/numexpr/blob/master/numexpr/necompiler.py#L725 (The difference is that fazy uses inspect.stack, which itself calls sys._getframe.)

One difference in this proposal, besides being faster (or should be), is that the lookups of any variables in such expressions are lexically scoped. This is a well-known problem when composing with respect to nested functions, or implicitly with list comprehensions/generator expressions; see @pauleveritt's comment on a similar library https://github.com/jviide/htm.py/issues/11

So that's why we need language support, or at least a transpiler, as @rmorshea has worked on in https://github.com/jimbaker/tagstr/issues/20

pomponchik commented 1 year ago

@jimbaker My implementation takes into account lexical nesting and takes these variables too, you can check. For generator expressions, my library also works perfectly:

>>> list(f('{x}') for x in range(10))
['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

In general, I don't see anything that I fundamentally couldn't implement as a library in this case. The one thing I cannot do reliably is make it fast.

jimbaker commented 1 year ago

@pomponchik I tried it on a modified version of Paul's example, and it didn't blow up:

import f

todos = ['breakfast', 'lunch', 'dinner']
def Todo(label, prefix):
    return f('<li>{prefix}{label}</li>')

def TodoList(prefix, todos):
    return f('<ul>{[Todo(label, prefix) for label in todos]}</ul>')

print(TodoList('item - ', todos))

So my first reaction is that Python maintains enough lexical information at runtime to make it possible to recover the lexically scoped lookup, if with rather complex (and as you note, slow running) code. I would have to see if I can find a counterexample, but the test cases you linked certainly cover the obvious one.

Obviously if this is true, then it is possible to implement an arbitrary tag scheme with ordinary functions, similar to what was done in https://github.com/jviide/tagged, but respecting lexical scope.

Also I'm rather impressed with your code here. It does go deep into Python's internals!

gvanrossum commented 1 year ago

How would you enforce that f() isn't called with a variable argument? That would be an attack vector.

pomponchik commented 1 year ago

@gvanrossum I would do this by extracting the code object from the stack and analyzing its AST. The AST node must be a constant. This improvement had not occurred to me until now, but in principle it looks like I could do it. I note, though, that it would make things even slower.
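A rough sketch of such a check, simplified to work on source text rather than on a code object pulled from the stack (so it is not the approach described above, only the AST part of it). The function name `non_constant_calls` is hypothetical, and the check only recognizes direct calls through a bare name.

```python
import ast


def non_constant_calls(source: str, func_name: str = "f") -> list[int]:
    """Return line numbers where func_name is called with a non-literal."""
    bad = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == func_name
        ):
            ok = (
                len(node.args) == 1
                and not node.keywords
                and isinstance(node.args[0], ast.Constant)
                and isinstance(node.args[0].value, str)
            )
            if not ok:
                bad.append(node.lineno)
    return sorted(bad)


sample = "f('{x} kittens')\nf(user_input)\nf('a' + b)\n"
flagged = non_constant_calls(sample)
```

Aliasing (`g = f; g(user_input)`) or attribute access would slip past this, which is part of why a syntactic feature is easier to reason about.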

gvanrossum commented 1 year ago

FWIW I don't want to keep discussing your f(...) implementation, it's irrelevant to the idea of tag strings (which are by design a syntactic feature).

jimbaker commented 1 year ago

We have a working example of the fl tag, so closing out this issue.