pallets / jinja

A very fast and expressive template engine.
https://jinja.palletsprojects.com
BSD 3-Clause "New" or "Revised" License
10.42k stars 1.62k forks source link

Faulty handling of UNC paths #1675

Open UlrichEckhardt opened 2 years ago

UlrichEckhardt commented 2 years ago

I have a case where Jinja doesn't find a template which is loaded using the PackageLoader("gcovr").

For reference, see https://docs.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation?tabs=cmd:

File I/O functions in the Windows API convert "/" to "\" as part of converting the name to an NT-style name, except when using the "\\?\" prefix as detailed in the following sections.

Using the "\\?\" prefix is desirable in general, because it lifts a few limits, in particular one on the length of the path.

I haven't gotten to the state where I can present a simple example. My setup is a bit stacked, Jenkins running a job in Kubernetes on Windows, there using Bazel to invoke gcovr to generate a coverage report from some C/C++ binaries compiled with MinGW. What I found there was that Jinja fails to load a style.css file, which is totally there. What I ended up with was that I copied bits'n'pieces from loader.py in order to debug them on that stack in the most crude print() way you can imagine. It's not pretty.

Still, here's the code from gcovr, which shows how Jinja is integrated:

    from jinja2 import Environment, PackageLoader

    return Environment(
        loader=PackageLoader("gcovr"),
        autoescape=True,
        trim_blocks=True,
        lstrip_blocks=True,
    )

And a bit of code that comes from split_template_path() and class PackageLoader:

    template = 'style.css'
    p = os.path.normpath(
        posixpath.join(template_root, *split_template_path(template))
    )
    print(p)
    if not os.path.exists(p):
        print('  -> does not exist')
    elif not os.path.isfile(p):
        print('  -> is not a file')
    else:
        print('  -> is a file')

    if os.path.altsep and os.path.altsep in p:
        p = p.replace(os.path.altsep, os.path.sep)
        print(p)
        if not os.path.exists(p):
            print('  -> does not exist')
        elif not os.path.isfile(p):
            print('  -> is not a file')
        else:
            print('  -> is a file')

Locally (running Ubuntu), it works and it doesn't enter the "altsep" path of course. On Windows, it produces this output though:

\\?\C:\Users\ContainerAdministrator\AppData\Local\Temp\Bazel.runfiles_u_shwwy7\runfiles\gcovr_wheel_5_1_default\gcovr\templates/style.css
  -> does not exist
\\?\C:\Users\ContainerAdministrator\AppData\Local\Temp\Bazel.runfiles_u_shwwy7\runfiles\gcovr_wheel_5_1_default\gcovr\templates\style.css
  -> is a file

I believe that converting the string-based and rather manual path handling could be improved by using the Python pathlib. I have only gotten to use it recently, but I'm fully convinced that it's a huge step ahead in reliability and readability of code handling paths. As a quick workaround, see the above, i.e. replacing os.path.altsep with os.path.sep in the paths.

Environment:

davidism commented 2 years ago

Please provide an example using Jinja. It seems like you've extracted some code from Jinja, but have also mixed it with your own code, and it's hard for me to follow what part of Jinja the issue is coming from with that.

I believe that converting the string-based and rather manual path handling could be improved by using the Python pathlib. I have only gotten to use it recently, but I'm fully convinced that it's a huge step ahead in reliability and readability of code handling paths.

Thanks, I'm well aware of pathlib. I've already evaluated using it in the code you're showing. It did not help, and occasionally made the implementation more complicated.

As a quick workaround, see the above, i.e. replacing os.path.altsep with os.path.sep in the paths.

This seems like obviously a bad idea. If / has literal meaning instead of a separator in UNC paths, then turning it into a separator can produce incorrect results.

davidism commented 2 years ago

Also note that we have to join using / to avoid a security issue on Windows, because otherwise drive-relative and UNC paths can potentially escape the base directory. So undoing that by replacing / with \ would reintroduce the security issue.

UlrichEckhardt commented 2 years ago

Hello David! Thanks for your feedback. I'm aware I was a bit vague, it was a long and warm day yesterday. I'll dig further into it today and hopefully be able to provide a full example.

UlrichEckhardt commented 2 years ago

Just a short update: It's not trivial to reproduce in isolation. The reason is that importlib.util.find_spec() gives me "normal" paths ("C:\foo\bar") for modules when running Python on its own. It only gives UNC paths ("\\?\C\foo\bar") when running inside of Bazel and only for some modules. For reference, there is also a relatively new bug report for Bazel.

If I understand the intentions correctly, the Jinja template is specified using a notation similar to a POSIX path. In loaders.py, in split_template_path(), you split this using "/" as separator, while filtering some invalid path elements. You then you use POSIX path operations to append it to the OS-specific path from the module. The result is then passed to os.path.normpath(). The comment there reads "Use normpath to convert Windows altsep to sep.". The problem is that os.path.normpath() doesn't touch a path that starts with "\\?\", and IMHO that's correct. The reason is that "." and ".." are valid filenames and "/" can also be part of a filename. For reference, I found this bug which discusses a similar topic.

Also note that we have to join using / to avoid a security issue on Windows, because otherwise drive-relative and UNC paths can potentially escape the base directory. So undoing that by replacing / with \ would reintroduce the security issue.

Can you elaborate, perhaps give some examples? It's unfortunately not obvious to me...

davidism commented 2 years ago

If the template name e:secret.txt was given, then ntpath.join(base, name) (Windows) would produce e:secret.txt, the drive relative letter is treated as an absolute path, and join gives precedence to absolute paths. Now secret.txt would be looked up in the current directory but on a different drive. Security issue.

I know there's a similar issue with (non-literal) UNC paths, but I can't remember an example of that one.

So instead, Jinja joins using posixpath.join, which uses / to join segments and doesn't recognize Windows drive-relative paths as absolute.

UlrichEckhardt commented 2 years ago

I see, didn't expect that semantic of joining two absolute paths. For some reason, pathlib retained that behaviour.

Still, it can conveniently solve the task:

# template_root from the importlib spec
root = pathlib.Path(template_root)
# template from the get_source() function parameter
# important, the resolve() will eliminate "." and ".." and resolve symbolic links
template_path = (root / template).resolve()
# check if template_path is in root
if not template_path.is_relative_to(root): raise Exception()
davidism commented 2 years ago

Happy to review a PR.

UlrichEckhardt commented 2 years ago

Heya! :) There is https://github.com/UlrichEckhardt/jinja/releases/tag/jfrog-dev1 as a preview. I need to do some more testing due to the complicated software stack here, but it seems to have fixed the issue. I'll clean this up when I find time, probably next week rather, and file a proper PR then. I'd appreciate if you could take a look, just to say if this is heading the right direction.

LilyFoote commented 2 years ago

I have run into this bug when trying to convert a setuptools project to a maturin (rust) project. My python code uses jinja2's PackageLoader, which works when built with setuptools. When built with maturin, however, the spec.submodule_search_locations contains a UNC path.

I suspect (but haven't proven) that maturin uses rust standard library code that returns UNC paths: https://github.com/rust-lang/rust/issues/42869

bricehalder commented 1 year ago

I've also run into this when using Buck2 which is built in Rust. As a workaround I ended up just using FileSystemLoader and providing the template in the working directory, bypassing the need to provide a file path.