sqlalchemy / mako

Mako Templates for Python
https://www.makotemplates.org
MIT License
353 stars 60 forks source link

Sporadic SyntaxException w/ Python3.11 #378

Open ccwienk opened 1 year ago

ccwienk commented 1 year ago

We use mako for rendering CICD Pipeline Definitions (YAML files). It is a rather complex template w/ a lot of Mako-Function-Definitions and nesting + multithreading. The resulting documents vary in size (depending on input-parameters), and are typically between ~300 kiB and ~1.2 MiB.

The template was used w/ Python-versions 3.6 .. 3.10. When upgrading to 3.11, we saw (and still see) sporadic SyntaxExceptions, which roughly occur in about 5% of the time (w/ unchanged template-parameters, of course!). I started working on a minimalised reproducer. If instantiating the same template w/ same parameters 64 times using 2 threads, I almost always see at least one exception stacktrace. The incriminated lines vary, whereas the Mako-part of the stacktrace seems to be always the same.

The error does not seem to occur when limiting concurrency to just one thread. Thus, I suspect a race-condition, probably within Mako's codebase.

The error occurs for latest versions of python3 alpine (3.11.3-r11) when running inside a virtualisation container, and archlinux (3.11.3-1) when running natively.

Example Stacktrace

Traceback (most recent call last):
  File "/home/redacted/.local/lib/python3.11/site-packages/mako/pyparser.py", line 36, in parse
    return _ast_util.parse(code, "<unknown>", mode)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/redacted/.local/lib/python3.11/site-packages/mako/_ast_util.py", line 91, in parse
    return compile(expr, filename, mode, PyCF_ONLY_AST)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
SystemError: AST constructor recursion depth mismatch (before=63, after=65)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/mnt/shared_profile/src/sap/makobug-reproducer/concourse/replicator.py", line 140, in render
    definition_descriptor = self._render(definition_descriptor)
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/shared_profile/src/sap/makobug-reproducer/concourse/replicator.py", line 211, in _render
    t = mako.template.Template(template_contents, lookup=self.lookup)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/redacted/.local/lib/python3.11/site-packages/mako/template.py", line 300, in __init__
    (code, module) = _compile_text(self, text, filename)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/redacted/.local/lib/python3.11/site-packages/mako/template.py", line 677, in _compile_text
    source, lexer = _compile(
                    ^^^^^^^^^
  File "/home/redacted/.local/lib/python3.11/site-packages/mako/template.py", line 657, in _compile
    node = lexer.parse()
           ^^^^^^^^^^^^^
  File "/home/redacted/.local/lib/python3.11/site-packages/mako/lexer.py", line 248, in parse
    if self.match_python_block():
       ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/redacted/.local/lib/python3.11/site-packages/mako/lexer.py", line 392, in match_python_block
    self.append_node(
  File "/home/redacted/.local/lib/python3.11/site-packages/mako/lexer.py", line 129, in append_node
    node = nodecls(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/redacted/.local/lib/python3.11/site-packages/mako/parsetree.py", line 158, in __init__
    self.code = ast.PythonCode(text, **self.exception_kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/redacted/.local/lib/python3.11/site-packages/mako/ast.py", line 42, in __init__
    expr = pyparser.parse(code.lstrip(), "exec", **exception_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/redacted/.local/lib/python3.11/site-packages/mako/pyparser.py", line 38, in parse
    raise exceptions.SyntaxException(
mako.exceptions.SyntaxException: (SystemError) AST constructor recursion depth mismatch (before=63, after=65) ('import os\n\nimport oci.auth as oa\nimport model.cont') at line: 2 char: 1

It is probably worth mentioning that by decreasing the template's output size, the likelihood of this error seems to become smaller.

I could share a copy of my somewhat slimmed-down reproducer; it still contains most of the code from the repository I referenced above, if this is considered helpful.

zzzeek commented 1 year ago

it could be a bug in py3.11 itself. I haven't dealt with that code in probably more than 10 years so, sure, a real reproducer, as small as possible (it really should just be a single template) is an absolute minimum to do anything here.

ccwienk commented 1 year ago

I created a reproducer (makobug-reproducer.tar.gz).

To use, you need (obviously) python3.11 + install packages from requirements.txt. To run, simply execute run-me.py

For your convenience, I also built an OCI Container Image (aka Docker Image); Dockerfile is included in uploaded tar-file. So as an alternative to installing and running locally, you might run:

I hard-coded parameters such as amount of worker-threads (line 106, max_workers) and the amount of renderings to do (line 124, range).

To my experience, this script will yield exceptions as the one I pasted above almost always; I did see occasional runs were no stacktraces were printed. Re-running at most twice always gave stacktraces again.

zzzeek commented 1 year ago

hi that .tar.gz is 283 source files. it's an entire application.

unfortunately I can't run an unknown application of enormous complexity and unknown origin within my own environment; instead, please supply a single, self contained mako template that reproduces the problem when compiled.

if this is not feasible, I'm sure if this issue is widespread it will soon enough be reported in other forms as python 3.11 gains adoption.

ccwienk commented 1 year ago

@zzzeek : would the option to run the mentioned container image be acceptable (this will prevent access to your local filesystem, unless you explicitly mount a local directory tree)?

that being said, I agree that a more stripped-down example would be preferable; since I have thus far no clue as to what actually triggers this bug, this might prove to be rather laborious.

I could imagine the bug might be possible to reproduce by crafting a template that references at least one additional template-module in which a couple of (mako) functions are defined, which in turn accept some formal parameters that are used by some <%if> etc. + that have some nesting + that create some kiBs of output.

So I suppose instead of trying to strip-down from the template I sent, I could give it a shot and craft an "artificial" template from scratch. Will require some more time, though.

zzzeek commented 1 year ago

the error was added as part of https://github.com/python/cpython/issues/95185 and is in https://github.com/python/cpython/commit/86eb500068bb62b74a3865bf25d82ba0b027b943

zzzeek commented 1 year ago

can you run your test with pdb breakpoints and just get the actual contents of the "template_contents" variable here? that's really all we need

Traceback (most recent call last):
  File "/mnt/shared_profile/src/sap/makobug-reproducer/concourse/replicator.py", line 140, in render
    definition_descriptor = self._render(definition_descriptor)
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/shared_profile/src/sap/makobug-reproducer/concourse/replicator.py", line 211, in _render
    t = mako.template.Template(template_contents, lookup=self.lookup)
zzzeek commented 1 year ago

or print statements, or dump to a file, etc...

zzzeek commented 1 year ago

the error also indicates invalid Python in the first place...

CaselIT commented 1 year ago

The best option is probably to try a custom version of mako that has

try:
  t = mako.template.Template(template_contents, lookup=self.lookup)
except SyntaxException:
  print(template_contents)
  raise

@zzzeek maybe doing something like this, probably logging to a logger, may make sense regardless of this issue though

zzzeek commented 1 year ago

Mako already does this, poster could also turn this on, see https://docs.makotemplates.org/en/latest/usage.html#handling-exceptions

ccwienk commented 1 year ago

@zzzeek : sorry for replying thus late. I dumped the contents of template_contents into a (gzipped) file

I created this using (as suggested) the following code-block:

 try:
     t = mako.template.Template(template_contents, lookup=self.lookup)
 except:
     with open('/src/sap/makobug-reproducer/template_contents.dump', 'w') as f:
         f.write(template_contents)

it is probably worth mentioning that using a bare except was the only way I could actually handle this exception. Neither of SystemError, mako.exceptions.SyntaxException, SyntaxError worked.

zzzeek commented 1 year ago

hi and thanks for this.

Unfortunately no error is reproduced, I can load this template under Python 3.11.3 and compile it without issue.

what happens if you use the given template in a program like this (assume you put the content into foo.mako)?


from mako.template import Template

t = Template(filename="foo.mako")
print(t.code)

what's the exact Python 3.11 version you are using?

ccwienk commented 1 year ago

Unfortunately no error is reproduced, I can load this template under Python 3.11.3 and compile it without issue.

did you try this w/ multithreading (at least two threads) and multiple executions? As I explained initially, this error occurs sporadically, and only (according to my observations) when concurrency is involved.

If using e.g. four threads, the issue occurs almost always at least once if doing the template-instantiation ~256 times

As stated above, I can reproduce this error in the following environments:

If running this just a couple of times, or running it hundreds / thousands of times, but single-threaded, this error never occurs. However, it occurs quite frequently if doing multithreading (but only as of python3.11).

I will try whether I can reproduce the error using the approach you shared and will write another update to this issue.

zzzeek commented 1 year ago

you can try adding threads to the POC but I can't see any way that threads would have an effect here, the Mako compilation process works on a local datastructure that is not accessible to other functions. I had assumed the error was sporadic only because this particular template was not getting compiled every time.

zzzeek commented 1 year ago

threads:


from mako.template import Template
import threading
import time

def go():
    while True:
        t = Template(filename="foo.mako")
        time.sleep(.0001)

threads = [threading.Thread(target=go) for i in range(50)]

for t in threads:
    t.start()

for t in threads:
    t.join()
zzzeek commented 1 year ago

it's also not clear why, if you are using a file-based TemplateLookup, that this template would be getting recompiled at all. Mako writes a module file to disk and reuses that.

ccwienk commented 1 year ago

@zzzeek : originally, it was planned to have a multitude of templates. considering that we actually have just one (and might do some caching anyhow), I might change this and cache the template.

In the reproducer I uploaded the multithreading is done in the run-me.py script

ccwienk commented 1 year ago

@zzzeek : I did some more investigations based on your feedback.

Firstly, I changed my code to cache the value of template_contents (using a global / module variable) - this did not make any notable difference.

Next, I shortcut the "rendering" and only created the mako.Template instance. This reproducibly did not yield an exception (I tried some thousand executions)

Then I changed the code back to also do the rendering, but placed two prints (after template-creation, and rendering respectively). It turned out that I saw stacktraces indicating SyntaxExceptions happened both during Template-creation, and -rendering (with the latter being more frequent).

After caching the template, with little surprise, the stacktraces that showed were limited to the invocation of the rendering.

It is still somewhat surprising that (if not caching the template instance) exceptions will occasionally also be raised during template-instantiation; so there do seem to be some race-conditions affecting different codepaths from (I assume) mako.

ccwienk commented 1 year ago

Mako writes a module file to disk and reuses that.

@zzzeek : would you be so kind as to give me a hint to the code doing that? That might be a very good explanation for the race-condition I assume. Although not a good explanation of why this seems to only affect python3.11, admittedly.

Update: I skipped a bit through Mako's code, and found an option cache_enabled in mako.Template (I suppose I could have found that in the documentation as well). However, disabling did not make any difference. When accessing the cache member of the created template, however, I received an error indicating that beaker (apparently the default cache implementation) was not installed (which I could fix by installing beaker); at any rate: disabling or enabling caching does not make any difference considering the occurrence of the described exceptions (+ I assume caching is implicitly disabled if beaker is not installed anyhow?).

zzzeek commented 1 year ago

hi -

You use a TemplateLookup and give it a file path to store modules, and then use the TemplateLookup to retrieve .mako files from the filesystem as compiled templates.

This works best when you have .mako files that you are loading and rendering in your application.

the second example at https://docs.makotemplates.org/en/latest/usage.html#using-templatelookup illustrates how to configure TemplateLookup with a file path.

The module caching thing is not readily available for on-the-fly templates, what you could do for on-the-fly templates is write them out to .mako files, then use TemplateLookup to access them.

Otherwise for local in-memory Template objects, Mako does not make use of global state when compiling, although there is a global set of compiled template modules (after the compilation is done) that are indirectly linked to template URLs or in-memory identifiers; I can see here there is a potential for key conflicts if you have anonymously-identified Template objects but this map isn't used when compilation proceeds.

ccwienk commented 1 year ago

@zzzeek : I would like to thank you for your very valuable hint w.r.t. template-caching.

I did some trial-and-error + thorough stacktrace-reading, which lead me to a working fix / workaround.

To give you some more details:

During rendering of the "root" template, additional sub-templates are loaded (instantiating again mako.Template). Those sub-templates are rendered from root-template.

It would appear that there seems to be a race-condition if doing exactly this (creating templates during the rendering of a parent-template). By adding some caching here, I could successfully run some few thousand rendering runs w/o those exceptions occurring.

This is the codeblock in question where adding the @functools.cache decorators (were not there before) - almost - fixed the issue I described (as in: occurs a lot less frequently):

 def step_template(name):
     step_file = ci.util.existing_file(os.path.join(steps_dir, name + '.mako'))

     return mako.template.Template(filename=step_file)

 @functools.cache
 def step_def(name):
     template = step_template(name)

     return template.get_def(name + '_step').render

 @functools.cache                                                              
 def step_lib_def(name):
     template = step_template(name)

     return template.get_def(name + '_step_lib').render

from there, I added a lock (threading.Lock) to mako.template and put the whole code of mako.template.Temple's __init__ method under it, like so:

# mako/template.py

import threading
lock = threading.Lock()

class Template:
 def __init__(......):
  with lock:
   # all the code from original mako code

With this change, the exception no longer occurs (I tried about 5000 rendering split to about 50 executions), even if I remove caching from above.

I suppose by doing some further analysis of the code from mako.template.Template, it should be possible to further narrow down the actual root-cause. I can create a workaround for my usecase by doing some monkey-patching, of course.

zzzeek commented 1 year ago

The only thing I can see that could conceivably be some kind of "global" would be when we use the compile() Python builtin we pass a module identifier to it, and for an anonymous Template like you have, that identifer will be hash(id(template)), so there could be re-use of the same id with different template contents. That would be very unusual if compile() somehow held onto state from a previous call.

There are many ways to fix your code here. One is to put the lock in your code:

template_mutex = threading.Lock()

def step_template(name):
     step_file = ci.util.existing_file(os.path.join(steps_dir, name + '.mako'))

     with template_mutex:
         return mako.template.Template(filename=step_file)

Another, better and much more idiomatic way is to use TemplateLookup as mentioned, since these are file based templates:

lookup = TemplateLookup(directories=[steps_dir])

def step_template(name):
     step_file = ci.util.existing_file(os.path.join(steps_dir, name + '.mako'))

     return lookup.get_template(name + ".mako")

TemplateLookup uses a mutex for its compilation, so that would eliminate the problem. Then, you will get a lot less compile calls if you give your lookup a module directory:

lookup = TemplateLookup(directories=[steps_dir], module_directory='/tmp/mako_modules')

def step_template(name):
     step_file = ci.util.existing_file(os.path.join(steps_dir, name + '.mako'))

     return lookup.get_template(name + ".mako")

your program will put .py files into /tmp/mako_modules that get reused.

ccwienk commented 1 year ago

I could further narrow the issue down. If I add just lock Template._compile_text and Template._compile_from_file from parallel execution, the issue also does not appear.

300         # if plain text, compile code in memory only                       
301         if text is not None:
302             with lock:
303                 (code, module) = _compile_text(self, text, filename)
304             self._code = code
305             self._source = text
306             ModuleInfo(module, None, self, filename, code, text, uri)
307         elif filename is not None:
308             # if template filename and a module directory, load
309             # a filesystem-based module file, generating if needed
310             if module_filename is not None:
311                 path = module_filename
312                 print(f'{module_filename=}')
313             elif module_directory is not None:
314                 print(f'{module_directory=}')
315                 path = os.path.abspath(
316                     os.path.join(
317                         os.path.normpath(module_directory), u_norm + ".py"
318                     )
319                 )
320             else:
321                 path = None
322             with lock: 
323                 module = self._compile_from_file(path, filename)
324         else:
325             raise exceptions.RuntimeException(
326                 "Template requires text or filename"
327             )
ccwienk commented 1 year ago

@zzzeek : switching to TemplateLookup sounds like a good idea, too. However, I still think there is a bug in template.Template. Instantiating multiple instances of a class and calling their methods should not run into race-conditions I think

zzzeek commented 1 year ago

_compile_from_file() calls _compile_text() so that code would deadlock if _compile_from_file() does not have a path (Edit: I misread the code, the lock is outside of _compile_text() itself)

ccwienk commented 1 year ago

well I just successfully executed it w/o issues :-)

zzzeek commented 1 year ago

@zzzeek : switching to TemplateLookup sounds like a good idea, too. However, I still think there is a bug in template.Template. Instantiating multiple instances of a class and calling their methods should not run into race-conditions I think

At this point you should have enough information to create a single short script that demonstrates the problem, take my script at https://github.com/sqlalchemy/mako/issues/378#issuecomment-1600811371 and adjust

zzzeek commented 1 year ago

well I just successfully executed it w/o issues :-)

which means it's being called with a path, which seems to indicate there are other calls to Template against the same file with different arguments

ccwienk commented 1 year ago

as far as I understand the code, _compile_from_file calls _compile_module_file, so no deadlock :-)

yeah, I think this should be feasible

My code calls Template at two locations. one time using a fpath, one time passing a string.

I also changed it to always pass a string (in this case, the race-condition still occurs - so I think there is a race-condition involved in the "_compile" method

zzzeek commented 1 year ago

I'm not sure if I'm ready to globally serialize all compilation to Template(). Would rather detect what you're doing and raise an exception

ccwienk commented 1 year ago

technically, you sometimes do raise an exception already ;-)

zzzeek commented 1 year ago

as far as I understand the code, _compile_from_file calls _compile_module_file, so no deadlock :-)

take a look. there's a conditional, so it can go either way

ccwienk commented 1 year ago

in my case, path is always None (I checked this by adding a print..)

ccwienk commented 1 year ago

I am not saying adding a lock is a good idea for a fix. this is how far I came in finding the root-cause

zzzeek commented 1 year ago

OK in your code you are locking outside _compile_text() , so that's why that's OK

ccwienk commented 1 year ago

I started (after observing that adding some caching will reduce likelihood of the error) to add lock to full __init__, then started to reduce the lines of code I had to lock and still not get an error

ccwienk commented 1 year ago

anyhow: using this knowledge, I can certainly fix my code. Still, I think this is a bug in mako - albeit one that might not affect many users besides me

ccwienk commented 1 year ago

strange though, that this only seems to occur as of python3.11 🤔

zzzeek commented 1 year ago

it may very well be a bug in py3.11 itself. since you can reproduce, work to iteratively reduce your program one step at a time, ensuring reproduction each time, down to a script that looks like this one

ccwienk commented 1 year ago

@zzzeek : agreed. I will try that. at any rate, thanks a lot again, as mentioned before, for giving me a great into into the right direction :-)

ccwienk commented 1 year ago

@zzzeek : interestingly, switching to mako.lookup.TemplateLookup as you suggested did seem to fix the issue (after removing the caching I added earlier, and of course, after removing again the lock I added to mako's code).

It will still be an interesting task to add a reproducer for sure.

JonnyWong16 commented 1 year ago

I have also been encountering the same issue in my app. It also occurs intermittently and only with Python 3.11.

https://www.reddit.com/r/Tautulli/comments/1042t13/error_syntaxexception_systemerror_ast_constructor/

I am already using TemplateLookup.

https://github.com/Tautulli/Tautulli/blob/ea6c6078df410f333a060016dfce18c21ad134c9/plexpy/webserve.py#L126

JonnyWong16 commented 1 year ago

I think I solved my issue. I was initializing a new TemplateLookup every single time I served a template. Re-factoring my code so that I only initialize it once seems to have fixed it.

zzzeek commented 1 year ago

OK but we want to figure out why concurrent calls to compile() is causing this problem (And also why my test script above does not seem to have this problem)

JonnyWong16 commented 1 year ago

I have been trying to reproduce the error with a small test script but have been unsuccessful.

ccwienk commented 1 year ago

I thus far also did not succeed in creating a minified reproducer. Will update the issue once I do find some more time.

Mark-Hetherington commented 1 year ago

I am also seeing this occasionally pop up in a production application since updating to python 3.11. It's very rare, relative to the number of template renders.

We are using a customised TemplateLookup that inherits from the mako TemplateLookup. Thanks for all the information provided on this issue - I think that will really help narrow this down for us.

zsblevins commented 1 year ago

Just chiming in that I've also seen this behavior sporadically with Python 3.11

It occurs when using TemplateLookup.get_template. It's happening very rarely, I'd say about once a week in a nightly job that calls hundreds of renders via an API that uses mako.

Each API call handles a single render, the lookup looks roughly like:

lookup = TemplateLookup(directories=[templates_path])
try:
    template = lookup.get_template(specific_template)
except TemplateLookupException:
    template = lookup.get_template(common_template)
return template

Exception observed is:

API Exception: SyntaxException('(SystemError) AST constructor recursion depth mismatch (before=93, after=85) (\'if <condition>:pass\') in file <file_path>

Each lookup path is also unique to each render, so no two API workers would be hitting the same templates_path directory. We check the templates out to a per-call tmp directory.

CaselIT commented 1 year ago

This seems more an issue with python: this pytest issue reports the same problem without mako involvement https://github.com/pytest-dev/pytest/issues/10874

zzzeek commented 1 year ago

wow howd you find that?