Closed pitrou closed 7 years ago
Currently Path.resolve() raises FileNotFoundError when the path does not exist. Guido pointed out that it may more useful to resolve the path components until one doesn't exist, and then return the rest unchanged.
e.g. if /home/ points to /var/home/, Path("/home/antoine/toto") should resolve to Path("/var/home/antoine/toto") even if toto doesn't actually exist.
However, this makes the function less safe. Perhaps with a "strict" flag?
(note that POSIX realpath() fails with ENOENT if the file doesn't exist: http://pubs.opengroup.org/onlinepubs/9699919799/functions/realpath.html )
Hm, so we can choose to be more like POSIX realpath() or more like os.path.realpath(). I guess your original intuition was right. Close with no action is fine. If I need a partial real path I can always crawl up parents() until I find a path that does exist.
I think there's value in allowing the "less strict" behaviour with an optional arg, though. i.e.:
>>> Path('toto').resolve()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/antoine/cpython/default/Lib/pathlib.py", line 1024, in resolve
s = self._flavour.resolve(self)
File "/home/antoine/cpython/default/Lib/pathlib.py", line 282, in resolve
target = accessor.readlink(cur)
File "/home/antoine/cpython/default/Lib/pathlib.py", line 372, in readlink
return os.readlink(path)
FileNotFoundError: [Errno 2] No such file or directory: '/home/antoine/cpython/default/toto'
>>> Path('toto').resolve(strict=False)
PosixPath('/home/antoine/cpython/default/toto')
Sure.
Here is the preliminary patch. Only tested on Linux. Later I'll test it on Windows.
The readlink utility has different modes for canonization:
-f, --canonicalize
canonicalize by following every symlink in every component of the given name recursively; all but the last component must exist
-e, --canonicalize-existing
canonicalize by following every symlink in every component of the given name recursively, all components must exist
-m, --canonicalize-missing
canonicalize by following every symlink in every component of the given name recursively, without requirements on components existence
I think every mode has use cases.
I think every mode has use cases.
Probably, but which ones are the most likely? A ternary flag leads to a clumsier API than a simple binary flag.
Thanks for the review, Antoine! Here is the updated patch. I haven't tested it on Windows yet because I want to clarify one thing.
Let's say we have this valid directory:
/tmp/@test123 \<= directory
And this directory only has one valid file:
/tmp/@test123/cutecat \<= file
We agree that pathlib.Path('/tmp/@test123/foo').resolve(False) => '/tmp/@test123/foo'.
But what about this case: pathlib.Path('/tmp/@test123/cutecat/foo').resolve(False)?
It should be "/tmp/@test123/cutecat" or "/tmp/@test123/cutecat/foo"?
Here is the patch with Windows support. I notice there is difference regarding resolving symbolic link with parent dir (linkA/..) between Posix and Windows.
On Windows, if linkY points to dirB, 'dirA\linkY\..' resolves to 'dirA' without resolving linkY first. It means, Windows resolves parent dir first before symbolic link.
C:\Users\vajrasky\Code\playplay\pycode>mkdir dirA
C:\Users\vajrasky\Code\playplay\pycode>mkdir dirB
C:\Users\vajrasky\Code\playplay\pycode>cd dirA
C:\Users\vajrasky\Code\playplay\pycode\dirA>mklink /D linkC ..\dirB symbolic link created for linkC \<\<===>> ..\dirB
C:\Users\vajrasky\Code\playplay\pycode\dirA>cd ..\
C:\Users\vajrasky\Code\playplay\pycode>cd dirA\linkC\..
C:\Users\vajrasky\Code\playplay\pycode\dirA>
But on Posix, if linkY points to dirB, 'dirA\linkY\..' resolves to 'dirB\..' then to the parent dir of dirB. It means, Posix resolves symbolic link first before parent dir.
$ mkdir dirA
$ mkdir dirB
$ cd dirA
$ ln -s ../dirB linkC
$ cd ..
$ ls dirA/linkC/..
dirA dirB
I think a patch in bpo-19887 first should be committed. It totally rewrites resolve().
Updated patch to tip. Later, I refactor Windows code to make sure it does not loop forever.
Vajrasky's patch implements fourth strategy, which is not conform neither --canonicalize nor --canonicalize-missing. Path(BASE, 'foo', 'in', 'spam') is resolved to Path(BASE, 'foo'). I doubt that this is most expected behavior.
I based my implementation on this statement:
"Guido pointed out that it may more useful to resolve the path components until one doesn't exist"
"until one doesn't exist" in this case means P(BASE, 'foo', 'in', 'spam') resolves until BASE / 'foo'.
If different spec is better, I'll change the implementation.
I believe Guido meant one of standard strategies. Current posixpath.realpath() implementation conforms --canonicalize-missing. It is not clear which two strategies (from three) should be used in Path.resolve().
Well, given the diversity of possible behaviours, it starts to seem like it should maybe be discussed on python-dev.
I have written a patch that implements all three mode for posixpath.realpath() (bpo-27002). Having the implementation we can test and research it. After discussion this solution can be adopted for Path.resolve().
Guido has kept his opinion that it should resolve until the path no longer exists and then just stop: https://mail.python.org/pipermail/python-ideas/2016-September/042203.html
i have a use-case that requires a behavior that is referenced above as --canonicalize-missing. essentially i need to get rid of relative parts in a path representation regardless the actual filesystem.
my conclusion was that PurePath could provide that with a 'normpath' method, reflecting os.path.normpath's functionality.
(that's from a user perspective, i haven't looked at any implementation details of the pathlib module.)
Please make sure this lands in beta 4!
--Guido (mobile)
Anyone have any major concerns with add_non_strict_resolve_pathlib_v4.patch?
I'd be quite happy without adding the extra parameter to Path.resolve(), but I'm not strongly offended.
From Guido's email we should default to strict=False (i.e. don't throw if the file doesn't exist) rather than True.
I'd be very happy if that landed in 3.6 with the default strict=False.
New changeset 03bbee2b0d28 by Steve Dower in branch '3.6': Issue bpo-19717: Makes Path.resolve() succeed on paths that do not exist (patch by Vajrasky Kok) https://hg.python.org/cpython/rev/03bbee2b0d28
New changeset 445415e402be by Steve Dower in branch 'default': Issue bpo-19717: Makes Path.resolve() succeed on paths that do not exist (patch by Vajrasky Kok) https://hg.python.org/cpython/rev/445415e402be
Applied. I changed the default for the parameter and updated the docs, but the rest is as in the patch.
Thanks!
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields: ```python assignee = 'https://github.com/zooba' closed_at =
created_at =
labels = ['3.7', 'deferred-blocker', 'type-feature', 'library']
title = "resolve() fails when the path doesn't exist"
updated_at =
user = 'https://github.com/pitrou'
```
bugs.python.org fields:
```python
activity =
actor = 'gvanrossum'
assignee = 'steve.dower'
closed = True
closed_date =
closer = 'steve.dower'
components = ['Library (Lib)']
creation =
creator = 'pitrou'
dependencies = ['19887']
files = ['32947', '33075', '33090', '33171']
hgrepos = []
issue_num = 19717
keywords = ['patch']
message_count = 25.0
messages = ['203819', '205065', '205066', '205067', '205068', '205079', '205095', '205101', '205779', '205892', '205897', '206397', '206408', '206413', '206415', '206702', '265336', '275173', '280294', '280313', '280429', '280432', '280450', '280451', '280452']
nosy_count = 11.0
nosy_names = ['brett.cannon', 'pitrou', 'july', 'neologix', 'davide.rizzo', 'python-dev', 'serhiy.storchaka', 'steve.dower', 'vajrasky', 'mwagner', 'Philip Ridout']
pr_nums = ['552', '1649']
priority = 'deferred blocker'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue19717'
versions = ['Python 3.6', 'Python 3.7']
```