fix percent escape not working when not at the beginning of the line

cocolato commented 7 months ago

Fixes: https://github.com/sqlalchemy/mako/issues/323

The Lexer now will generate wrong result when there is no \n before %%.

Case:

from mako.template import Template

template = """%% do something
%%% do something
if <some condition>:
    %%%% do something
        """

print(Template(template).render())

The result before fix:

%% do something
%% do something
if <some condition>:
   %%%% do something

The result after fix:

% do something
%% do something
if <some condition>:
   %%% do something

cocolato commented 7 months ago

Hi, @zzzeek can you please take a look? The fix passed all the tests on my local machine。

zzzeek commented 7 months ago

this does change the output of rendering so I'm a little concerned about old templates relying on the broken behavior. I can just put it out there as 1.3.1 and see what happens

sqla-tester commented 7 months ago

New Gerrit review created for change 773b6bdaf3418436a2f3fb0a3052d833c0fedcfd: https://gerrit.sqlalchemy.org/c/sqlalchemy/mako/+/5111

cocolato commented 7 months ago

At the same time, I found that the current fix is incorrect and generates results that are missing spaces in some escape percent cases.

case：

from mako.template import Template

# 4 spaces before %
template = """
    %%%% do something
"""
print(Template(template).render())

result only has 3 spaces before "%":

   %%% do something

cocolato commented 7 months ago

The logic of the code here will consume one spcae:

https://github.com/sqlalchemy/mako/blob/dc66614a2b9ba19ee880c855471c37cd3307c22a/mako/lexer.py#L360-L362

https://github.com/sqlalchemy/mako/blob/dc66614a2b9ba19ee880c855471c37cd3307c22a/mako/lexer.py#L74-L75

cocolato commented 7 months ago

This regex block fixes the issues mentioned above and works correctly. (?<=\n)(?=[ \t]*(?=%|\#\#)) rewrite to (?<=\n)(?=[ \t]*(?=%(?!%)|\#\#))

The regex block now:

        match = self.match(
            r"""
                (.*?)         # anything, followed by:
                (
                 (?<=\n)(?=[ \t]*(?=%(?!%)|\#\#)) # an eval or line-based
                                             # comment preceded by a
                                             # consumed newline and whitespace
                 |
                 (?<!%)(?=%%+) # consume the first percent sign out of a group of percent signs
                 |
                 (?=\${)      # an expression
                 |
                 (?=</?[%&])  # a substitution or block or call start or end
                              # - don't consume
                 |
                 (\\\r?\n)    # an escaped newline  - throw away
                 |
                 \Z           # end of string
                )""",
            re.X | re.S,
        )

However, there might be room for improvement, as I am not very familiar with regular expressions.

cocolato commented 7 months ago

The test code and regex has been updated.

sqla-tester commented 7 months ago

Patchset 5d3e74223b30a2d3a996255a4df4d5f9a267a5b4 added to existing Gerrit review https://gerrit.sqlalchemy.org/c/sqlalchemy/mako/+/5111

zzzeek commented 7 months ago

However, there might be room for improvement, as I am not very familiar with regular expressions.

you're working at expert level already so consider yourself familiar!

sqla-tester commented 7 months ago

Patchset f2820295dd5b391864c416a3d0ffc92fecc5c73b added to existing Gerrit review https://gerrit.sqlalchemy.org/c/sqlalchemy/mako/+/5111

zzzeek commented 7 months ago

for the formatting, use the tooling we have :

cd /path/to/mako
pre-commit install
pre-commit run --all

cocolato commented 7 months ago

Thanks, it has been updated!

sqla-tester commented 7 months ago

Patchset ab8e74756d1ddfae6584854b2765e3706ba87ad5 added to existing Gerrit review https://gerrit.sqlalchemy.org/c/sqlalchemy/mako/+/5111

sqla-tester commented 7 months ago

Michael Bayer (zzzeek) wrote:

thank you!

View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/mako/+/5111

sqla-tester commented 7 months ago

Gerrit review https://gerrit.sqlalchemy.org/c/sqlalchemy/mako/+/5111 has been merged. Congratulations! :)

zzzeek commented 7 months ago

mako 1.3.1 is released. now we'll find out if anyone was relying on two percent signs rendering as two of them

zzzeek commented 7 months ago

Hi -

I've reverted the change. Can we please try again, adding new tests that test the case in #384

$ pip install mako==1.3.0
...
Successfully installed mako-1.3.0
$ mako-render - <<< foo%%bar
foo%%bar
$ pip install mako==1.3.1
...
Successfully installed mako-1.3.1
$ mako-render - <<< foo%%bar
foo%bar

The change needs to be limited to only percent signs as the first non-whitespace character, not any percent signs.

zzzeek commented 7 months ago

Mako 1.3.1 is yanked

sqla-tester commented 7 months ago

New Gerrit review created for change ab8e74756d1ddfae6584854b2765e3706ba87ad5: https://gerrit.sqlalchemy.org/c/sqlalchemy/mako/+/5141

sqla-tester commented 7 months ago

Michael Bayer (zzzeek) wrote:

please add tests for double percents in the middle of non-whitespace lines

View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/mako/+/5141

cocolato commented 7 months ago

Oh sorry, I will revise and test this part again.

sqla-tester commented 7 months ago

Patchset db9309737277d46976413d90c13811a9c61c46df added to existing Gerrit review https://gerrit.sqlalchemy.org/c/sqlalchemy/mako/+/5141

sqla-tester commented 7 months ago

Michael Bayer (zzzeek) wrote:

wow a whole new method. OK. I dont have a lot of time today so ill try to look more closely at this soon.

View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/mako/+/5141

cocolato commented 7 months ago

All has been updated . Thanks for taking the time to review !

sqla-tester commented 7 months ago

Patchset ff8ce2e128098560e3988bb2375a49fa8649d3d5 added to existing Gerrit review https://gerrit.sqlalchemy.org/c/sqlalchemy/mako/+/5141

sqla-tester commented 7 months ago

Michael Bayer (zzzeek) wrote:

thank you again! we try again. can keep yanking/reverting til it works :)

View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/mako/+/5141

sqla-tester commented 7 months ago

Gerrit review https://gerrit.sqlalchemy.org/c/sqlalchemy/mako/+/5141 has been merged. Congratulations! :)

fiendish commented 7 months ago

That positive lookbehind appears to do nothing. Your tests still pass if you use just r"^(\s*)%%(%*)"

cocolato commented 7 months ago

Ahh, it seems that there is no difference between the two in MULTILINE mode, if the positive lookbehind will affect performance, maybe it should be changed here.

zzzeek commented 7 months ago

I believe the positive lookbehind does not get exercised because a pair of percent signs preceded by characters on a line would have been consumed by the "text" regexp. However, if that were not the case, then this regexp would still need to confirm the percent signs occur subsequent to the beginning of a line with only whitespace in between.

>>> import re
>>> x = "    %%  \n   %%   \n hello %%"
>>> re.compile(r"(?<=^)(\s*)%%(%*)", re.M).match(x, 24)
>>> re.compile(r"(\s*)%%(%*)", re.M).match(x, 24)
<re.Match object; span=(24, 27), match=' %%'>

will affect performance,

it would be negligible, and in fact the lookbehind version is slightly faster (note this is one million calls to the match):

>>> re1 = re.compile(r"(?<=^)(\s*)%%(%*)", re.M)
>>> re2 = re.compile(r"(\s*)%%(%*)", re.M)
>>> import timeit
>>> timeit.timeit("re1.match(x, 24)", "from __main__ import x, re1")
0.08006502594798803
>>> timeit.timeit("re2.match(x, 24)", "from __main__ import x, re2")
0.1264837539056316

the overhead of a new python call to "match_percent()" is going to be much more significant than the reg itself, and this is fine. compilation performance is not really at stake here these are all minimal changes.

maybe it should be changed here.

definitely not

sqlalchemy / mako

fix percent escape not working when not at the beginning of the line #383