Open ccwienk opened 1 year ago
it could be a bug in py3.11 itself. I haven't dealt with that code in probably more than 10 years so, sure, a real reproducer, as small as possible (it really should just be a single template) is an absolute minimum to do anything here.
I created a reproducer (makobug-reproducer.tar.gz).
To use, you need (obviously) python3.11 + install packages from requirements.txt.
To run, simply execute run-me.py
For your convenience, I also built an OCI Container Image (aka Docker Image); Dockerfile is included in uploaded tar-file. So as an alternative to installing and running locally, you might run:
docker pull eu.gcr.io/gardener-project/cc/makobug-reproducer:1
docker run eu.gcr.io/gardener-project/cc/makobug-reproducer:1 cc-utils/run-me.py
I hard-coded parameters such as amount of worker-threads (line 106, max_workers) and the amount of renderings to do (line 124, range).
To my experience, this script will yield exceptions as the one I pasted above almost always; I did see occasional runs were no stacktraces were printed. Re-running at most twice always gave stacktraces again.
hi that .tar.gz is 283 source files. it's an entire application.
unfortunately I can't run an unknown application of enormous complexity and unknown origin within my own environment; instead, please supply a single, self contained mako template that reproduces the problem when compiled.
if this is not feasible, I'm sure if this issue is widespread it will soon enough be reported in other forms as python 3.11 gains adoption.
@zzzeek : would the option to run the mentioned container image be acceptable (this will prevent access to your local filesystem, unless you explicitly mount a local directory tree)?
that being said, I agree that a more stripped-down example would be preferable; since I have thus far no clue as to what actually triggers this bug, this might prove to be rather laborious.
I could imagine the bug might be possible to reproduce by crafting a template that references at least one additional template-module in which a couple of (mako) functions are defined, which in turn accept some formal parameters that are used by some <%if>
etc. + that have some nesting + that create some kiBs of output.
So I suppose instead of trying to strip-down from the template I sent, I could give it a shot and craft an "artificial" template from scratch. Will require some more time, though.
the error was added as part of https://github.com/python/cpython/issues/95185 and is in https://github.com/python/cpython/commit/86eb500068bb62b74a3865bf25d82ba0b027b943
can you run your test with pdb breakpoints and just get the actual contents of the "template_contents" variable here? that's really all we need
Traceback (most recent call last):
File "/mnt/shared_profile/src/sap/makobug-reproducer/concourse/replicator.py", line 140, in render
definition_descriptor = self._render(definition_descriptor)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/shared_profile/src/sap/makobug-reproducer/concourse/replicator.py", line 211, in _render
t = mako.template.Template(template_contents, lookup=self.lookup)
or print statements, or dump to a file, etc...
the error also indicates invalid Python in the first place...
The best option is probably to try a custom version of mako that has
try:
t = mako.template.Template(template_contents, lookup=self.lookup)
except SyntaxException:
print(template_contents)
raise
@zzzeek maybe doing something like this, probably logging to a logger, may make sense regardless of this issue though
Mako already does this, poster could also turn this on, see https://docs.makotemplates.org/en/latest/usage.html#handling-exceptions
@zzzeek : sorry for replying thus late. I dumped the contents of template_contents
into a (gzipped) file
I created this using (as suggested) the following code-block:
try:
t = mako.template.Template(template_contents, lookup=self.lookup)
except:
with open('/src/sap/makobug-reproducer/template_contents.dump', 'w') as f:
f.write(template_contents)
it is probably worth mentioning that using a bare except was the only way I could actually handle this exception. Neither of SystemError
, mako.exceptions.SyntaxException
, SyntaxError
worked.
hi and thanks for this.
Unfortunately no error is reproduced, I can load this template under Python 3.11.3 and compile it without issue.
what happens if you use the given template in a program like this (assume you put the content into foo.mako)?
from mako.template import Template
t = Template(filename="foo.mako")
print(t.code)
what's the exact Python 3.11 version you are using?
Unfortunately no error is reproduced, I can load this template under Python 3.11.3 and compile it without issue.
did you try this w/ multithreading (at least two threads) and multiple executions? As I explained initially, this error occurs sporadically, and only (according to my observations) when concurrency is involved.
If using e.g. four threads, the issue occurs almost always at least once if doing the template-instantiation ~256 times
As stated above, I can reproduce this error in the following environments:
If running this just a couple of times, or running it hundreds / thousands of times, but single-threaded, this error never occurs. However, it occurs quite frequently if doing multithreading (but only as of python3.11).
I will try whether I can reproduce the error using the approach you shared and will write another update to this issue.
you can try adding threads to the POC but I can't see any way that threads would have an effect here, the Mako compilation process works on a local datastructure that is not accessible to other functions. I had assumed the error was sporadic only because this particular template was not getting compiled every time.
threads:
from mako.template import Template
import threading
import time
def go():
while True:
t = Template(filename="foo.mako")
time.sleep(.0001)
threads = [threading.Thread(target=go) for i in range(50)]
for t in threads:
t.start()
for t in threads:
t.join()
it's also not clear why, if you are using a file-based TemplateLookup, that this template would be getting recompiled at all. Mako writes a module file to disk and reuses that.
@zzzeek : originally, it was planned to have a multitude of templates. considering that we actually have just one (and might do some caching anyhow), I might change this and cache the template.
In the reproducer I uploaded the multithreading is done in the run-me.py
script
@zzzeek : I did some more investigations based on your feedback.
Firstly, I changed my code to cache the value of template_contents
(using a global / module variable) - this did not make any notable difference.
Next, I shortcut the "rendering" and only created the mako.Template
instance. This reproducibly did not yield an exception (I tried some thousand executions)
Then I changed the code back to also do the rendering, but placed two prints (after template-creation, and rendering respectively). It turned out that I saw stacktraces indicating SyntaxExceptions happened both during Template-creation, and -rendering (with the latter being more frequent).
After caching the template, with little surprise, the stacktraces that showed were limited to the invocation of the rendering.
It is still somewhat surprising that (if not caching the template instance) exceptions will occasionally also be raised during template-instantiation; so there do seem to be some race-conditions affecting different codepaths from (I assume) mako.
Mako writes a module file to disk and reuses that.
@zzzeek : would you be so kind as to give me a hint to the code doing that? That might be a very good explanation for the race-condition I assume. Although not a good explanation of why this seems to only affect python3.11, admittedly.
Update: I skipped a bit through Mako's code, and found an option cache_enabled
in mako.Template
(I suppose I could have found that in the documentation as well). However, disabling did not make any difference. When accessing the cache
member of the created template, however, I received an error indicating that beaker
(apparently the default cache implementation) was not installed (which I could fix by installing beaker); at any rate: disabling or enabling caching does not make any difference considering the occurrence of the described exceptions (+ I assume caching is implicitly disabled if beaker is not installed anyhow?).
hi -
You use a TemplateLookup and give it a file path to store modules, and then use the TemplateLookup to retrieve .mako files from the filesystem as compiled templates.
This works best when you have .mako files that you are loading and rendering in your application.
the second example at https://docs.makotemplates.org/en/latest/usage.html#using-templatelookup illustrates how to configure TemplateLookup with a file path.
The module caching thing is not readily available for on-the-fly templates, what you could do for on-the-fly templates is write them out to .mako files, then use TemplateLookup to access them.
Otherwise for local in-memory Template objects, Mako does not make use of global state when compiling, although there is a global set of compiled template modules (after the compilation is done) that are indirectly linked to template URLs or in-memory identifiers; I can see here there is a potential for key conflicts if you have anonymously-identified Template objects but this map isn't used when compilation proceeds.
@zzzeek : I would like to thank you for your very valuable hint w.r.t. template-caching.
I did some trial-and-error + thorough stacktrace-reading, which lead me to a working fix / workaround.
To give you some more details:
During rendering of the "root" template, additional sub-templates are loaded (instantiating again mako.Template). Those sub-templates are rendered from root-template.
It would appear that there seems to be a race-condition if doing exactly this (creating templates during the rendering of a parent-template). By adding some caching here, I could successfully run some few thousand rendering runs w/o those exceptions occurring.
This is the codeblock in question where adding the @functools.cache
decorators (were not there before) - almost - fixed the issue I described (as in: occurs a lot less frequently):
def step_template(name):
step_file = ci.util.existing_file(os.path.join(steps_dir, name + '.mako'))
return mako.template.Template(filename=step_file)
@functools.cache
def step_def(name):
template = step_template(name)
return template.get_def(name + '_step').render
@functools.cache
def step_lib_def(name):
template = step_template(name)
return template.get_def(name + '_step_lib').render
from there, I added a lock (threading.Lock) to mako.template and put the whole code of mako.template.Temple's __init__
method under it, like so:
# mako/template.py
import threading
lock = threading.Lock()
class Template:
def __init__(......):
with lock:
# all the code from original mako code
With this change, the exception no longer occurs (I tried about 5000 rendering split to about 50 executions), even if I remove caching from above.
I suppose by doing some further analysis of the code from mako.template.Template, it should be possible to further narrow down the actual root-cause. I can create a workaround for my usecase by doing some monkey-patching, of course.
The only thing I can see that could conceivably be some kind of "global" would be when we use the compile() Python builtin we pass a module identifier to it, and for an anonymous Template like you have, that identifer will be hash(id(template))
, so there could be re-use of the same id with different template contents. That would be very unusual if compile() somehow held onto state from a previous call.
There are many ways to fix your code here. One is to put the lock in your code:
template_mutex = threading.Lock()
def step_template(name):
step_file = ci.util.existing_file(os.path.join(steps_dir, name + '.mako'))
with template_mutex:
return mako.template.Template(filename=step_file)
Another, better and much more idiomatic way is to use TemplateLookup
as mentioned, since these are file based templates:
lookup = TemplateLookup(directories=[steps_dir])
def step_template(name):
step_file = ci.util.existing_file(os.path.join(steps_dir, name + '.mako'))
return lookup.get_template(name + ".mako")
TemplateLookup uses a mutex for its compilation, so that would eliminate the problem. Then, you will get a lot less compile calls if you give your lookup a module directory:
lookup = TemplateLookup(directories=[steps_dir], module_directory='/tmp/mako_modules')
def step_template(name):
step_file = ci.util.existing_file(os.path.join(steps_dir, name + '.mako'))
return lookup.get_template(name + ".mako")
your program will put .py files into /tmp/mako_modules that get reused.
I could further narrow the issue down. If I add just lock Template._compile_text
and Template._compile_from_file
from parallel execution, the issue also does not appear.
300 # if plain text, compile code in memory only
301 if text is not None:
302 with lock:
303 (code, module) = _compile_text(self, text, filename)
304 self._code = code
305 self._source = text
306 ModuleInfo(module, None, self, filename, code, text, uri)
307 elif filename is not None:
308 # if template filename and a module directory, load
309 # a filesystem-based module file, generating if needed
310 if module_filename is not None:
311 path = module_filename
312 print(f'{module_filename=}')
313 elif module_directory is not None:
314 print(f'{module_directory=}')
315 path = os.path.abspath(
316 os.path.join(
317 os.path.normpath(module_directory), u_norm + ".py"
318 )
319 )
320 else:
321 path = None
322 with lock:
323 module = self._compile_from_file(path, filename)
324 else:
325 raise exceptions.RuntimeException(
326 "Template requires text or filename"
327 )
@zzzeek : switching to TemplateLookup sounds like a good idea, too. However, I still think there is a bug in template.Template
. Instantiating multiple instances of a class and calling their methods should not run into race-conditions I think
_compile_from_file() calls _compile_text() so that code would deadlock if _compile_from_file() does not have a path (Edit: I misread the code, the lock is outside of _compile_text() itself)
well I just successfully executed it w/o issues :-)
@zzzeek : switching to TemplateLookup sounds like a good idea, too. However, I still think there is a bug in
template.Template
. Instantiating multiple instances of a class and calling their methods should not run into race-conditions I think
At this point you should have enough information to create a single short script that demonstrates the problem, take my script at https://github.com/sqlalchemy/mako/issues/378#issuecomment-1600811371 and adjust
well I just successfully executed it w/o issues :-)
which means it's being called with a path, which seems to indicate there are other calls to Template against the same file with different arguments
as far as I understand the code, _compile_from_file calls _compile_module_file, so no deadlock :-)
yeah, I think this should be feasible
My code calls Template at two locations. one time using a fpath, one time passing a string.
I also changed it to always pass a string (in this case, the race-condition still occurs - so I think there is a race-condition involved in the "_compile" method
I'm not sure if I'm ready to globally serialize all compilation to Template(). Would rather detect what you're doing and raise an exception
technically, you sometimes do raise an exception already ;-)
as far as I understand the code, _compile_from_file calls _compile_module_file, so no deadlock :-)
take a look. there's a conditional, so it can go either way
in my case, path is always None (I checked this by adding a print..)
I am not saying adding a lock is a good idea for a fix. this is how far I came in finding the root-cause
OK in your code you are locking outside _compile_text() , so that's why that's OK
I started (after observing that adding some caching will reduce likelihood of the error) to add lock to full __init__
, then started to reduce the lines of code I had to lock and still not get an error
anyhow: using this knowledge, I can certainly fix my code. Still, I think this is a bug in mako - albeit one that might not affect many users besides me
strange though, that this only seems to occur as of python3.11 🤔
it may very well be a bug in py3.11 itself. since you can reproduce, work to iteratively reduce your program one step at a time, ensuring reproduction each time, down to a script that looks like this one
@zzzeek : agreed. I will try that. at any rate, thanks a lot again, as mentioned before, for giving me a great into into the right direction :-)
@zzzeek : interestingly, switching to mako.lookup.TemplateLookup as you suggested did seem to fix the issue (after removing the caching I added earlier, and of course, after removing again the lock I added to mako's code).
It will still be an interesting task to add a reproducer for sure.
I have also been encountering the same issue in my app. It also occurs intermittently and only with Python 3.11.
I am already using TemplateLookup
.
I think I solved my issue. I was initializing a new TemplateLookup
every single time I served a template. Re-factoring my code so that I only initialize it once seems to have fixed it.
OK but we want to figure out why concurrent calls to compile() is causing this problem (And also why my test script above does not seem to have this problem)
I have been trying to reproduce the error with a small test script but have been unsuccessful.
I thus far also did not succeed in creating a minified reproducer. Will update the issue once I do find some more time.
I am also seeing this occasionally pop up in a production application since updating to python 3.11. It's very rare, relative to the number of template renders.
We are using a customised TemplateLookup that inherits from the mako TemplateLookup. Thanks for all the information provided on this issue - I think that will really help narrow this down for us.
Just chiming in that I've also seen this behavior sporadically with Python 3.11
It occurs when using TemplateLookup.get_template. It's happening very rarely, I'd say about once a week in a nightly job that calls hundreds of renders via an API that uses mako.
Each API call handles a single render, the lookup looks roughly like:
lookup = TemplateLookup(directories=[templates_path])
try:
template = lookup.get_template(specific_template)
except TemplateLookupException:
template = lookup.get_template(common_template)
return template
Exception observed is:
API Exception: SyntaxException('(SystemError) AST constructor recursion depth mismatch (before=93, after=85) (\'if <condition>:pass\') in file <file_path>
Each lookup path is also unique to each render, so no two API workers would be hitting the same templates_path
directory. We check the templates out to a per-call tmp directory.
This seems more an issue with python: this pytest issue reports the same problem without mako involvement https://github.com/pytest-dev/pytest/issues/10874
wow howd you find that?
We use mako for rendering CICD Pipeline Definitions (YAML files). It is a rather complex template w/ a lot of Mako-Function-Definitions and nesting + multithreading. The resulting documents vary in size (depending on input-parameters), and are typically between ~300 kiB and ~1.2 MiB.
The template was used w/ Python-versions 3.6 .. 3.10. When upgrading to 3.11, we saw (and still see) sporadic SyntaxExceptions, which roughly occur in about 5% of the time (w/ unchanged template-parameters, of course!). I started working on a minimalised reproducer. If instantiating the same template w/ same parameters 64 times using 2 threads, I almost always see at least one exception stacktrace. The incriminated lines vary, whereas the Mako-part of the stacktrace seems to be always the same.
The error does not seem to occur when limiting concurrency to just one thread. Thus, I suspect a race-condition, probably within Mako's codebase.
The error occurs for latest versions of python3 alpine (3.11.3-r11) when running inside a virtualisation container, and archlinux (3.11.3-1) when running natively.
Example Stacktrace
It is probably worth mentioning that by decreasing the template's output size, the likelihood of this error seems to become smaller.
I could share a copy of my somewhat slimmed-down reproducer; it still contains most of the code from the repository I referenced above, if this is considered helpful.