injecting tokens in filter_stream fails with "expected token end of print statement"

Grollicus commented 11 months ago

I'm using an extension modifying the token stream via filter_stream to implement autoescape functionality for a latex renderer.

Extension looks like this:

class MyExtension(Extension):
    def filter_stream(self, stream):
        for token in stream:
            if token.type is TOKEN_VARIABLE_BEGIN:
                yield token
                yield Token(token.lineno, 'lparen', '(')
            elif token.type is TOKEN_VARIABLE_END:
                yield Token(token.lineno, 'rparen', ')')
                yield Token(token.lineno, 'pipe', '|')
                yield Token(token.lineno, 'name', 'escape_tex')
                yield token
            else:
                yield token

It pipes every variable through an escape_tex filter; for example, {{ thingy }} is translated to {{ (thingy) | escape_tex }}.

With jinja2==2.11.3 this works as intended. With newer versions, for example jinja2==3.1.2, I get this error:

jinja2.exceptions.TemplateSyntaxError: expected token 'end of print statement', got ')'

With rewrite_traceback_stack disabled, the full traceback shows that the exception is raised in parser.py, in subparse:

Traceback (most recent call last):
  File "./main.py", line 35, in <module>
    print(env.from_string('Results in {{ foo + "asdf" }}').render(foo='FOO'))
  File ".venv/lib/python3.9/site-packages/jinja2/environment.py", line 1105, in from_string
    return cls.from_code(self, self.compile(source), gs, None)
  File ".venv/lib/python3.9/site-packages/jinja2/environment.py", line 768, in compile
    self.handle_exception(source=source_hint)
  File ".venv/lib/python3.9/site-packages/jinja2/environment.py", line 936, in handle_exception
    raise rewrite_traceback_stack(source=source)
  File ".venv/lib/python3.9/site-packages/jinja2/environment.py", line 760, in compile
    source = self._parse(source, name, filename)
  File ".venv/lib/python3.9/site-packages/jinja2/environment.py", line 617, in _parse
    return Parser(self, source, name, filename).parse()
  File ".venv/lib/python3.9/site-packages/jinja2/parser.py", line 1030, in parse
    result = nodes.Template(self.subparse(), lineno=1)
  File ".venv/lib/python3.9/site-packages/jinja2/parser.py", line 1005, in subparse
    self.stream.expect("variable_end")
  File ".venv/lib/python3.9/site-packages/jinja2/lexer.py", line 416, in expect
    raise TemplateSyntaxError(
jinja2.exceptions.TemplateSyntaxError: expected token 'end of print statement', got ')'
  line 1

On the other hand, parsing env.from_string("{{ (foo) | escape_tex }}") directly works and produces the same token stream, which suggests to me that the parser takes a different path somewhere when the tokens are injected:

variable_begin begin of print statement
lparen (
name foo
rparen )
pipe |
name escape_tex
variable_end end of print statement
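
For anyone who wants to reproduce a dump like the one above, here is a minimal sketch. It assumes the private Environment._tokenize method (not public API, so it may change between releases), which lexes the source and then runs every registered extension's filter_stream. dump_tokens is a hypothetical helper name, and the sketch prints the raw token values rather than the descriptions shown above.

```python
from jinja2 import Environment


def dump_tokens(env, source):
    """Return (type, value) pairs for the token stream the parser would see."""
    # ASSUMPTION: _tokenize is a private Environment method; it lexes the
    # source and then applies each registered extension's filter_stream.
    stream = env._tokenize(source, None)
    return [(token.type, token.value) for token in stream]


# Dump the stream for the hand-written template from the report.
for token_type, value in dump_tokens(Environment(), "{{ (foo) | escape_tex }}"):
    print(token_type, repr(value))
```

Passing an environment created with extensions=[MyExtension] and the source "{{ foo }}" should show what the extension actually emits, so the two streams can be compared token by token.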

I'm not sure whether this use of filter_stream is supported, but the gettext example also rewrites the token stream. If you don't consider this a bug, I'd of course love other suggestions for working around it.

Environment:

Python version: 3.9.7
Jinja version: 3.1.2

j7an commented 3 months ago

The following code seems to work and does not raise the exception. It uses the latest Jinja2, v3.1.4.

from jinja2 import Environment, FileSystemLoader
from jinja2.ext import Extension
from jinja2.lexer import Token, TOKEN_VARIABLE_BEGIN, TOKEN_VARIABLE_END

class LatexEscapeExtension(Extension):
    def filter_stream(self, stream):
        for token in stream:
            if token.type == TOKEN_VARIABLE_BEGIN:
                yield token
                yield Token(token.lineno, 'lparen', '(')
            elif token.type == TOKEN_VARIABLE_END:
                yield Token(token.lineno, 'rparen', ')')
                yield Token(token.lineno, 'pipe', '|')
                yield Token(token.lineno, 'name', 'escape_tex')
                yield token
            else:
                yield token

# Define the LaTeX escaping filter
def escape_tex(value):
    escape_map = {
        '&': r'\&',
        '%': r'\%',
        '$': r'\$',
        '#': r'\#',
        '_': r'\_',
        '{': r'\{',
        '}': r'\}',
        '~': r'\textasciitilde{}',
        '^': r'\textasciicircum{}',
        '\\': r'\textbackslash{}',
    }
    return ''.join(escape_map.get(char, char) for char in value)

# Set up the Jinja environment and register the custom extension and filter
env = Environment(loader=FileSystemLoader('templates'), extensions=[LatexEscapeExtension])
env.filters['escape_tex'] = escape_tex

# Sample template string with LaTeX special characters
template_str = """
This is a LaTeX document with a variable: {{ variable }}
"""

# Create the template
template = env.from_string(template_str)

# Render the template with a sample value containing LaTeX special characters
output = template.render(variable="100% safe & secure $5 deal with #1 quality")

print(output)

Note that if you replace == with is in my class above, as in the original code, the same error is raised: expected token 'end of print statement', got ')'.

But the documentation says: "The type of the token. This string is interned so you may compare it with arbitrary strings using the is operator."

Adding @Tyl13 for further analysis.

Tyl13 commented 3 months ago

It seems that the token type strings produced when the input is tokenized are not the same objects in memory as the TOKEN_* constants. I'm not sure where this comes from, since interning should force equal strings to share a single object. It could be an issue inside Python itself, but I'm not sure.
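
The difference between == and is can be shown without Jinja. The sketch below is only an illustration of string identity in Python, not Jinja's actual code: TOKEN_VARIABLE_BEGIN is a stand-in for the lexer constant, and runtime_type is a hypothetical token type built at runtime.

```python
import sys

# Stand-in for the lexer constant (ASSUMPTION: not imported from Jinja).
TOKEN_VARIABLE_BEGIN = sys.intern("variable_begin")

# An equal string built at runtime, standing in for a token type that was
# derived dynamically rather than taken from the constant.
runtime_type = "_".join(["variable", "begin"])

# Equal by value, so == matches.
print(runtime_type == TOKEN_VARIABLE_BEGIN)

# Identity is not guaranteed for equal strings; on CPython this runtime-built
# string is a separate object, so an `is` check would not match.
print(runtime_type is TOKEN_VARIABLE_BEGIN)

# Interning maps equal strings to one shared object, so `is` matches again.
print(sys.intern(runtime_type) is TOKEN_VARIABLE_BEGIN)
```

This is consistent with the == comparison being the safer check in filter_stream, because it does not depend on whether the token type string was interned.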

pallets / jinja

injecting tokens in filter_stream fails with "expected token end of print statement" #1889