python / cpython

The Python programming language
https://www.python.org
Other
63.1k stars 30.22k forks source link

resolve() fails when the path doesn't exist #63916

Closed pitrou closed 7 years ago

pitrou commented 10 years ago
BPO 19717
Nosy @brettcannon, @pitrou, @tifv, @sorcio, @serhiy-storchaka, @zooba, @vajrasky
PRs
  • python/cpython#552
  • python/cpython#1649
  • Dependencies
  • bpo-19887: Path.resolve() fails on complex symlinks
  • Files
  • add_non_strict_resolve_pathlib.patch
  • add_non_strict_resolve_pathlib_v2.patch
  • add_non_strict_resolve_pathlib_v3.patch
  • add_non_strict_resolve_pathlib_v4.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = 'https://github.com/zooba' closed_at = created_at = labels = ['3.7', 'deferred-blocker', 'type-feature', 'library'] title = "resolve() fails when the path doesn't exist" updated_at = user = 'https://github.com/pitrou' ``` bugs.python.org fields: ```python activity = actor = 'gvanrossum' assignee = 'steve.dower' closed = True closed_date = closer = 'steve.dower' components = ['Library (Lib)'] creation = creator = 'pitrou' dependencies = ['19887'] files = ['32947', '33075', '33090', '33171'] hgrepos = [] issue_num = 19717 keywords = ['patch'] message_count = 25.0 messages = ['203819', '205065', '205066', '205067', '205068', '205079', '205095', '205101', '205779', '205892', '205897', '206397', '206408', '206413', '206415', '206702', '265336', '275173', '280294', '280313', '280429', '280432', '280450', '280451', '280452'] nosy_count = 11.0 nosy_names = ['brett.cannon', 'pitrou', 'july', 'neologix', 'davide.rizzo', 'python-dev', 'serhiy.storchaka', 'steve.dower', 'vajrasky', 'mwagner', 'Philip Ridout'] pr_nums = ['552', '1649'] priority = 'deferred blocker' resolution = 'fixed' stage = 'resolved' status = 'closed' superseder = None type = 'enhancement' url = 'https://bugs.python.org/issue19717' versions = ['Python 3.6', 'Python 3.7'] ```

    pitrou commented 10 years ago

    Currently Path.resolve() raises FileNotFoundError when the path does not exist. Guido pointed out that it may more useful to resolve the path components until one doesn't exist, and then return the rest unchanged.

    e.g. if /home/ points to /var/home/, Path("/home/antoine/toto") should resolve to Path("/var/home/antoine/toto") even if toto doesn't actually exist.

    However, this makes the function less safe. Perhaps with a "strict" flag?

    pitrou commented 10 years ago

    (note that POSIX realpath() fails with ENOENT if the file doesn't exist: http://pubs.opengroup.org/onlinepubs/9699919799/functions/realpath.html )

    gvanrossum commented 10 years ago

    Hm, so we can choose to be more like POSIX realpath() or more like os.path.realpath(). I guess your original intuition was right. Close with no action is fine. If I need a partial real path I can always crawl up parents() until I find a path that does exist.

    pitrou commented 10 years ago

    I think there's value in allowing the "less strict" behaviour with an optional arg, though. i.e.:

    >>> Path('toto').resolve()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/antoine/cpython/default/Lib/pathlib.py", line 1024, in resolve
        s = self._flavour.resolve(self)
      File "/home/antoine/cpython/default/Lib/pathlib.py", line 282, in resolve
        target = accessor.readlink(cur)
      File "/home/antoine/cpython/default/Lib/pathlib.py", line 372, in readlink
        return os.readlink(path)
    FileNotFoundError: [Errno 2] No such file or directory: '/home/antoine/cpython/default/toto'
    >>> Path('toto').resolve(strict=False)
    PosixPath('/home/antoine/cpython/default/toto')
    gvanrossum commented 10 years ago

    Sure.

    ce7d5904-2ae0-4bc9-8085-2279bf8114aa commented 10 years ago

    Here is the preliminary patch. Only tested on Linux. Later I'll test it on Windows.

    serhiy-storchaka commented 10 years ago

    The readlink utility has different modes for canonization:

       -f, --canonicalize
              canonicalize by following every symlink in every component of the given name recursively; all but the last component must exist
    
       -e, --canonicalize-existing
              canonicalize by following every symlink in every component of the given name recursively, all components must exist
    
       -m, --canonicalize-missing
              canonicalize by following every symlink in every component of the given name recursively, without requirements on components existence

    I think every mode has use cases.

    pitrou commented 10 years ago

    I think every mode has use cases.

    Probably, but which ones are the most likely? A ternary flag leads to a clumsier API than a simple binary flag.

    ce7d5904-2ae0-4bc9-8085-2279bf8114aa commented 10 years ago

    Thanks for the review, Antoine! Here is the updated patch. I haven't tested it on Windows yet because I want to clarify one thing.

    Let's say we have this valid directory:

    /tmp/@test123 \<= directory

    And this directory only has one valid file:

    /tmp/@test123/cutecat \<= file

    We agree that pathlib.Path('/tmp/@test123/foo').resolve(False) => '/tmp/@test123/foo'.

    But what about this case: pathlib.Path('/tmp/@test123/cutecat/foo').resolve(False)?

    It should be "/tmp/@test123/cutecat" or "/tmp/@test123/cutecat/foo"?

    ce7d5904-2ae0-4bc9-8085-2279bf8114aa commented 10 years ago

    Here is the patch with Windows support. I notice there is difference regarding resolving symbolic link with parent dir (linkA/..) between Posix and Windows.

    On Windows, if linkY points to dirB, 'dirA\linkY\..' resolves to 'dirA' without resolving linkY first. It means, Windows resolves parent dir first before symbolic link.

    C:\Users\vajrasky\Code\playplay\pycode>mkdir dirA

    C:\Users\vajrasky\Code\playplay\pycode>mkdir dirB

    C:\Users\vajrasky\Code\playplay\pycode>cd dirA

    C:\Users\vajrasky\Code\playplay\pycode\dirA>mklink /D linkC ..\dirB symbolic link created for linkC \<\<===>> ..\dirB

    C:\Users\vajrasky\Code\playplay\pycode\dirA>cd ..\

    C:\Users\vajrasky\Code\playplay\pycode>cd dirA\linkC\..

    C:\Users\vajrasky\Code\playplay\pycode\dirA>

    But on Posix, if linkY points to dirB, 'dirA\linkY\..' resolves to 'dirB\..' then to the parent dir of dirB. It means, Posix resolves symbolic link first before parent dir.

    $ mkdir dirA
    $ mkdir dirB
    $ cd dirA
    $ ln -s ../dirB linkC
    $ cd ..
    $ ls dirA/linkC/..
    dirA    dirB
    serhiy-storchaka commented 10 years ago

    I think a patch in bpo-19887 first should be committed. It totally rewrites resolve().

    ce7d5904-2ae0-4bc9-8085-2279bf8114aa commented 10 years ago

    Updated patch to tip. Later, I refactor Windows code to make sure it does not loop forever.

    serhiy-storchaka commented 10 years ago

    Vajrasky's patch implements fourth strategy, which is not conform neither --canonicalize nor --canonicalize-missing. Path(BASE, 'foo', 'in', 'spam') is resolved to Path(BASE, 'foo'). I doubt that this is most expected behavior.

    ce7d5904-2ae0-4bc9-8085-2279bf8114aa commented 10 years ago

    I based my implementation on this statement:

    "Guido pointed out that it may more useful to resolve the path components until one doesn't exist"

    "until one doesn't exist" in this case means P(BASE, 'foo', 'in', 'spam') resolves until BASE / 'foo'.

    If different spec is better, I'll change the implementation.

    serhiy-storchaka commented 10 years ago

    I believe Guido meant one of standard strategies. Current posixpath.realpath() implementation conforms --canonicalize-missing. It is not clear which two strategies (from three) should be used in Path.resolve().

    pitrou commented 10 years ago

    Well, given the diversity of possible behaviours, it starts to seem like it should maybe be discussed on python-dev.

    serhiy-storchaka commented 8 years ago

    I have written a patch that implements all three mode for posixpath.realpath() (bpo-27002). Having the implementation we can test and research it. After discussion this solution can be adopted for Path.resolve().

    brettcannon commented 8 years ago

    Guido has kept his opinion that it should resolve until the path no longer exists and then just stop: https://mail.python.org/pipermail/python-ideas/2016-September/042203.html

    b119808a-782e-490e-90b4-8c26ff83a174 commented 7 years ago

    i have a use-case that requires a behavior that is referenced above as --canonicalize-missing. essentially i need to get rid of relative parts in a path representation regardless the actual filesystem.

    my conclusion was that PurePath could provide that with a 'normpath' method, reflecting os.path.normpath's functionality.

    (that's from a user perspective, i haven't looked at any implementation details of the pathlib module.)

    gvanrossum commented 7 years ago

    Please make sure this lands in beta 4!

    --Guido (mobile)

    zooba commented 7 years ago

    Anyone have any major concerns with add_non_strict_resolve_pathlib_v4.patch?

    I'd be quite happy without adding the extra parameter to Path.resolve(), but I'm not strongly offended.

    From Guido's email we should default to strict=False (i.e. don't throw if the file doesn't exist) rather than True.

    gvanrossum commented 7 years ago

    I'd be very happy if that landed in 3.6 with the default strict=False.

    1762cc99-3127-4a62-9baf-30c3d0f51ef7 commented 7 years ago

    New changeset 03bbee2b0d28 by Steve Dower in branch '3.6': Issue bpo-19717: Makes Path.resolve() succeed on paths that do not exist (patch by Vajrasky Kok) https://hg.python.org/cpython/rev/03bbee2b0d28

    New changeset 445415e402be by Steve Dower in branch 'default': Issue bpo-19717: Makes Path.resolve() succeed on paths that do not exist (patch by Vajrasky Kok) https://hg.python.org/cpython/rev/445415e402be

    zooba commented 7 years ago

    Applied. I changed the default for the parameter and updated the docs, but the rest is as in the patch.

    gvanrossum commented 7 years ago

    Thanks!