Open UlrichEckhardt opened 2 years ago
Please provide an example using Jinja. It seems like you've extracted some code from Jinja, but have also mixed it with your own code, and it's hard for me to follow what part of Jinja the issue is coming from with that.
I believe that converting the string-based and rather manual path handling could be improved by using the Python pathlib. I have only gotten to use it recently, but I'm fully convinced that it's a huge step ahead in reliability and readability of code handling paths.
Thanks, I'm well aware of pathlib. I've already evaluated using it in the code you're showing. It did not help, and occasionally made the implementation more complicated.
As a quick workaround, see the above, i.e. replacing os.path.altsep with os.path.sep in the paths.
This seems like obviously a bad idea. If /
has literal meaning instead of a separator in UNC paths, then turning it into a separator can produce incorrect results.
Also note that we have to join using /
to avoid a security issue on Windows, because otherwise drive-relative and UNC paths can potentially escape the base directory. So undoing that by replacing /
with \
would reintroduce the security issue.
Hello David! Thanks for your feedback. I'm aware I was a bit vague, it was a long and warm day yesterday. I'll dig further into it today and hopefully be able to provide a full example.
Just a short update: It's not trivial to reproduce in isolation. The reason is that importlib.util.find_spec()
gives me "normal" paths ("C:\foo\bar") for modules when running Python on its own. It only gives UNC paths ("\\?\C\foo\bar") when running inside of Bazel and only for some modules. For reference, there is also a relatively new bug report for Bazel.
If I understand the intentions correctly, the Jinja template is specified using a notation similar to a POSIX path. In loaders.py, in split_template_path()
, you split this using "/" as separator, while filtering some invalid path elements. You then you use POSIX path operations to append it to the OS-specific path from the module. The result is then passed to os.path.normpath()
. The comment there reads "Use normpath to convert Windows altsep to sep.". The problem is that os.path.normpath()
doesn't touch a path that starts with "\\?\", and IMHO that's correct. The reason is that "." and ".." are valid filenames and "/" can also be part of a filename. For reference, I found this bug which discusses a similar topic.
Also note that we have to join using
/
to avoid a security issue on Windows, because otherwise drive-relative and UNC paths can potentially escape the base directory. So undoing that by replacing/
with\
would reintroduce the security issue.
Can you elaborate, perhaps give some examples? It's unfortunately not obvious to me...
If the template name e:secret.txt
was given, then ntpath.join(base, name)
(Windows) would produce e:secret.txt
, the drive relative letter is treated as an absolute path, and join
gives precedence to absolute paths. Now secret.txt
would be looked up in the current directory but on a different drive. Security issue.
I know there's a similar issue with (non-literal) UNC paths, but I can't remember an example of that one.
So instead, Jinja joins using posixpath.join
, which uses /
to join segments and doesn't recognize Windows drive-relative paths as absolute.
I see, didn't expect that semantic of joining two absolute paths. For some reason, pathlib
retained that behaviour.
Still, it can conveniently solve the task:
# template_root from the importlib spec
root = pathlib.Path(template_root)
# template from the get_source() function parameter
# important, the resolve() will eliminate "." and ".." and resolve symbolic links
template_path = (root / template).resolve()
# check if template_path is in root
if not template_path.is_relative_to(root): raise Exception()
Happy to review a PR.
Heya! :) There is https://github.com/UlrichEckhardt/jinja/releases/tag/jfrog-dev1 as a preview. I need to do some more testing due to the complicated software stack here, but it seems to have fixed the issue. I'll clean this up when I find time, probably next week rather, and file a proper PR then. I'd appreciate if you could take a look, just to say if this is heading the right direction.
I have run into this bug when trying to convert a setuptools project to a maturin (rust) project. My python code uses jinja2's PackageLoader
, which works when built with setuptools. When built with maturin, however, the spec.submodule_search_locations
contains a UNC path.
I suspect (but haven't proven) that maturin uses rust standard library code that returns UNC paths: https://github.com/rust-lang/rust/issues/42869
I've also run into this when using Buck2 which is built in Rust. As a workaround I ended up just using FileSystemLoader
and providing the template in the working directory, bypassing the need to provide a file path.
I have a case where Jinja doesn't find a template which is loaded using the
PackageLoader("gcovr")
.For reference, see https://docs.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation?tabs=cmd:
Using the "\\?\" prefix is desirable in general, because it lifts a few limits, in particular one on the length of the path.
I haven't gotten to the state where I can present a simple example. My setup is a bit stacked, Jenkins running a job in Kubernetes on Windows, there using Bazel to invoke gcovr to generate a coverage report from some C/C++ binaries compiled with MinGW. What I found there was that Jinja fails to load a style.css file, which is totally there. What I ended up with was that I copied bits'n'pieces from loader.py in order to debug them on that stack in the most crude
print()
way you can imagine. It's not pretty.Still, here's the code from gcovr, which shows how Jinja is integrated:
And a bit of code that comes from
split_template_path()
andclass PackageLoader
:Locally (running Ubuntu), it works and it doesn't enter the "altsep" path of course. On Windows, it produces this output though:
I believe that converting the string-based and rather manual path handling could be improved by using the Python pathlib. I have only gotten to use it recently, but I'm fully convinced that it's a huge step ahead in reliability and readability of code handling paths. As a quick workaround, see the above, i.e. replacing
os.path.altsep
withos.path.sep
in the paths.Environment: