python / cpython

The Python programming language
https://www.python.org

Feature: provide (safe) traversal/extraction facilities for `zipfile.Path` #123727

Open jaraco opened 1 week ago

jaraco commented 1 week ago

zipfile.Path could provide its own traversal that could offer some safety checks.

Something like that might be nice. Though perhaps it could be more generic instead of being part of zipfile.Path, even if zipfile.Path would then perhaps provide an extraction API using that?

Originally posted by @obfusk in https://github.com/python/cpython/issues/123270#issuecomment-2330381740

jaraco commented 1 week ago

As mentioned in that issue, one extraction API already exists in importlib.resources.as_file, although that implementation is written to create a temporary context. It also has limitations: it can extract a single file or directory, but not a selected set of files in a directory, and there's no facility to filter content in subdirectories.
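For reference, here is a minimal sketch of what using as_file with a zipfile.Path looks like today. The helper names `read_via_as_file` and `make_demo_archive` are made up for illustration, and the sketch assumes zipfile.Path is accepted by as_file as a Traversable-like object. The extracted file lives only for the duration of the `with` block.

```python
import importlib.resources
import io
import zipfile


def make_demo_archive() -> bytes:
    """Build a tiny in-memory zip (hypothetical demo data)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("docs/readme.txt", "hello")
    return buf.getvalue()


def read_via_as_file(archive_bytes: bytes, member: str) -> str:
    """Materialize one member as a temporary file and read it back."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        entry = zipfile.Path(zf, at=member)
        with importlib.resources.as_file(entry) as tmp_path:
            # tmp_path is a pathlib.Path on the real file system; it is
            # cleaned up when this block exits.
            return tmp_path.read_text(encoding="utf-8")
```

Because the result is temporary, this does not cover "extract these files to a location I choose and keep them there", which is the gap under discussion.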

One thing to be careful about when extracting is that using mkdir(exist_ok=True) could lead to traversal outside the target directory (e.g. if the zip file contains a member named ../../../etc/passwd).
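To make the traversal concern concrete, here is a rough sketch of a containment check that could run before any directory is created. `safe_target` and `extract_checked` are hypothetical names, not existing zipfile APIs, and the sketch assumes Python 3.9+ (for `Path.is_relative_to`). It does not address symlink members, permissions, or platform-specific path quirks.

```python
import zipfile
from pathlib import Path


def safe_target(dest: Path, member_name: str) -> Path:
    """Return where member_name would land, or raise if it escapes dest."""
    dest = dest.resolve()
    target = (dest / member_name).resolve()
    if not target.is_relative_to(dest):
        raise ValueError(f"unsafe member path: {member_name!r}")
    return target


def extract_checked(zf: zipfile.ZipFile, dest: Path) -> list[Path]:
    """Extract all members, refusing the whole archive if any name escapes."""
    # Validate every member first so an unsafe name aborts before any write.
    planned = [(info, safe_target(dest, info.filename)) for info in zf.infolist()]
    written = []
    for info, target in planned:
        if info.is_dir():
            target.mkdir(parents=True, exist_ok=True)
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(zf.read(info))
        written.append(target)
    return written
```

Rejecting the archive outright is one possible policy; ZipFile.extract and extractall instead sanitize member names, so which behavior a zipfile.Path API should adopt is an open design question.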

Before we embark on any implementation, let's first capture the motivations, use cases, and requirements for such a feature. Who would use it, and how?

rruuaanng commented 1 week ago

For example, returning a list after calling, right?

jaraco commented 1 week ago

Perhaps. We're looking for more complete user stories. For example:

That's a contrived user story as an illustration. What I want are real user stories from users who have legitimate use-cases that aren't met by the current zipfile.Path and importlib.resources.as_file functionality.

obfusk commented 1 week ago

Being able to combine .glob() and extract seems potentially useful to me.

FYI: I just noticed .glob() does not seem to be documented.