python / cpython

The Python programming language
https://www.python.org
Other
62.68k stars 30.06k forks source link

Different behavior of os.path.realpath('nul') in 3.7 and 3.8 #82262

Closed 45386f33-fadd-4f9f-b70b-e36d26921b08 closed 5 years ago

45386f33-fadd-4f9f-b70b-e36d26921b08 commented 5 years ago
BPO 38081
Nosy @pfmoore, @tjguk, @zware, @eryksun, @zooba, @miss-islington, @Savier
PRs
  • python/cpython#15899
  • python/cpython#15903
  • python/cpython#16156
  • python/cpython#16187
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = None closed_at = created_at = labels = ['3.8', 'type-bug', 'library', '3.9', 'OS-windows'] title = "Different behavior of os.path.realpath('nul') in 3.7 and 3.8" updated_at = user = 'https://github.com/Savier' ``` bugs.python.org fields: ```python activity = actor = 'eryksun' assignee = 'none' closed = True closed_date = closer = 'steve.dower' components = ['Library (Lib)', 'Windows'] creation = creator = 'iamsav' dependencies = [] files = [] hgrepos = [] issue_num = 38081 keywords = ['patch'] message_count = 14.0 messages = ['351577', '351578', '351583', '351725', '351764', '351785', '351787', '351807', '351930', '352044', '352468', '352548', '352550', '352643'] nosy_count = 7.0 nosy_names = ['paul.moore', 'tim.golden', 'zach.ware', 'eryksun', 'steve.dower', 'miss-islington', 'iamsav'] pr_nums = ['15899', '15903', '16156', '16187'] priority = 'normal' resolution = 'fixed' stage = 'resolved' status = 'closed' superseder = None type = 'behavior' url = 'https://bugs.python.org/issue38081' versions = ['Python 3.8', 'Python 3.9'] ```

    45386f33-fadd-4f9f-b70b-e36d26921b08 commented 5 years ago

    Windows 10:

    C:\Users\User\Downloads>py -3.7 -c "import os.path;os.path.realpath('nul')"
    
    C:\Users\User\Downloads>py -3.8 -c "import os.path;os.path.realpath('nul')"
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "C:\Python38\lib\ntpath.py", line 592, in realpath
        path = _getfinalpathname_nonstrict(path)
      File "C:\Python38\lib\ntpath.py", line 566, in _getfinalpathname_nonstrict
        path = _readlink_deep(path, seen)
      File "C:\Python38\lib\ntpath.py", line 536, in _readlink_deep
        path = _nt_readlink(path)
    OSError: [WinError 1] Неверная функция: 'nul'

    I think it's a bug. pip uses this code, so 'pip install pandas' won't work in 3.8

    45386f33-fadd-4f9f-b70b-e36d26921b08 commented 5 years ago

    1) Python 3.8.0b4 (tags/v3.8.0b4:d93605d, Aug 29 2019, 23:21:28) [MSC v.1916 64 bit (AMD64)] on win32 2) Looks like identical with https://bugs.python.org/issue1311

    45386f33-fadd-4f9f-b70b-e36d26921b08 commented 5 years ago

    It breaks setuptools.sandbox.DirectorySandbox.__init__() with default param 'exceptions' which includes os.devnull and calls os.path.realpath() on it. So, many distributions crashes.

    zooba commented 5 years ago

    What does pip use it for?

    Applying the below change avoids the exception, but produces \\.\nul as the result, which may or may not be any better.

    diff --git a/Lib/ntpath.py b/Lib/ntpath.py
    index 1d22d5f1dc..becfa20a83 100644
    --- a/Lib/ntpath.py
    +++ b/Lib/ntpath.py
    @@ -537,7 +537,7 @@ else:
                 except OSError as ex:
                     # Stop on file (2) or directory (3) not found, or
                     # paths that are not reparse points (4390)
    -                if ex.winerror in (2, 3, 4390):
    +                if ex.winerror in (1, 2, 3, 4390):
                         break
                     raise
                 except ValueError:
    @@ -555,7 +555,7 @@ else:
    
             # Allow file (2) or directory (3) not found, invalid syntax (123),
             # and symlinks that cannot be followed (1921)
    -        allowed_winerror = 2, 3, 123, 1921
    +        allowed_winerror = 2, 3, 87, 123, 1921
         # Non-strict algorithm is to find as much of the target directory
         # as we can and join the rest.
    45386f33-fadd-4f9f-b70b-e36d26921b08 commented 5 years ago

    setuptools/sandbox.py: class DirectorySandbox(AbstractSandbox): """Restrict operations to a single subdirectory - pseudo-chroot"""

    When running user scripts it uses os.path.realpath(os.devnull) to include 'normalized' devnull to the allowed list of files in pseudo-chroot.

    Yes, suggested patch returns realpath behavior from 3.7 and packages installs normally.

    C:\Users\User\Downloads>py -3.7 -c "import os.path;print(os.path.realpath('nul'))" \\.\nul

    C:\Users\User\Downloads>py -3.8 -c "import os.path;print(os.path.realpath('nul'))" \\.\nul

    I think it must be included in 3.8 or windows users will get installation problems.

    zware commented 5 years ago

    New changeset 92521fea5d0d4aeb9b6a3c3fdd4654af700ad5c8 by Zachary Ware (Steve Dower) in branch 'master': bpo-38081: Fixes ntpath.realpath('NUL') (GH-15899) https://github.com/python/cpython/commit/92521fea5d0d4aeb9b6a3c3fdd4654af700ad5c8

    zooba commented 5 years ago

    Thanks for the report!

    miss-islington commented 5 years ago

    New changeset 57491de7c33c5886c4cae2f468b474d9b4e6fed2 by Miss Islington (bot) in branch '3.8': bpo-38081: Fixes ntpath.realpath('NUL') (GH-15899) https://github.com/python/cpython/commit/57491de7c33c5886c4cae2f468b474d9b4e6fed2

    eryksun commented 5 years ago

    We should allow ERROR_INVALID_FUNCTION (1), ERROR_INVALID_PARAMETER (87), and ERROR_NOT_SUPPORTED (50) for readlink and _getfinalpathname, which can indicate a device that does not implement or is not mounted by a file system. We should also allow ERROR_BAD_NET_NAME (67, "the network name cannot be found"), which indicates that a server or share isn't found when opening a UNC path.

    I don't know whether ERROR_INVALID_NAME (123) should be allowed. Also, it hasn't been added already, but I'd be equally unsure about adding ERROR_BAD_PATHNAME (161). These aren't like a missing file, path, or server, or an unsupported device. I know Python's realpath() is supposed to be permissive, but that's going too far I think.

    Returning r"\\.\nul" is fine. I'd prefer to change os.devnull to match it. Scripts should be able to handle this since already abspath(os.devnull) is r"\\.\nul".

    eryksun commented 5 years ago

    In addition to ERROR_INVALID_FUNCTION (1), ERROR_INVALID_PARAMETER (87), and ERROR_NOT_SUPPORTED (50) for an unsupported device, and ERROR_BAD_NET_NAME (67) for a missing server or share, it should also allow common permission errors:

    ERROR_ACCESS_DENIED (5)
        C:/Users/someone_else
        C:/Temp/deleted_but_still_linked_file
        E:/locked_volume
    
    ERROR_NOT_READY (21)
        D:/no_media
    
    ERROR_SHARING_VIOLATION (32)
        C:/pagefile.sys
    zooba commented 5 years ago

    I added another PR with the additional error codes listed by Eryk Sun.

    Theoretically we should be able to test most of them, but I haven't written those tests, and I'm not sure they'd prove enough to be worth the extra code. ntpath.realpath is a "best effort" function (realpath in all cases is, I guess), so given the choice between failing and returning a best-effort path, it's obviously better to go with the best-effort path.

    zooba commented 5 years ago

    New changeset 89b8933bb537179f81003928786c5cc6183af591 by Steve Dower in branch 'master': bpo-38081: Add more non-fatal error codes for ntpath.realpath (GH-16156) https://github.com/python/cpython/commit/89b8933bb537179f81003928786c5cc6183af591

    miss-islington commented 5 years ago

    New changeset 4924d558478c9bd7f7ee7cd9c00c72c0f281f1a5 by Miss Islington (bot) in branch '3.8': bpo-38081: Add more non-fatal error codes for ntpath.realpath (GH-16156) https://github.com/python/cpython/commit/4924d558478c9bd7f7ee7cd9c00c72c0f281f1a5

    eryksun commented 5 years ago

    Sorry, I mistakenly left out ERROR_BAD_NETPATH (53). It's at least used with mapped drives. For example, I have drive "M:" mapped to WebDAV "//live.sysinternals.com/tools", and I see this error if I disconnect the network:

        >>> try: nt._getfinalpathname('M:/')
        ... except OSError as e: print(e.winerror)
        ...
        53

    whereas if I access the underlying UNC path directly the error in this case is ERROR_BAD_NET_NAME (67):

        >>> try: nt._getfinalpathname('//live.sysinternals.com/tools')
        ... except OSError as e: print(e.winerror)
        ...
        67

    (Not that this case would normally succeed. A WebDAV share fails the internal request to get a normalized name, as expected for most network providers, but with an unexpected error code that's not handled by the API. It would succeed if we changed _getfinalpathname to fall back on getting the final name as opened, which skips expanding short component names.)

    --- Discussion

    The Multiple UNC Provider device (i.e. "\??\UNC" -> "\Device\Mup") resolves a UNC path prefix to a network provider (e.g. "Microsoft Windows Network" for SMB) by checking all providers in a registered order until one claims to handle the path. Typically a provider claims the server/share prefix, but the claimed prefix can be just the server, or a variable path length. Typically, if the "server" component isn't found, a provider returns STATUS_BAD_NETWORK_PATH. If the "share" component isn't found, it returns STATUS_BAD_NETWORK_NAME. However, since the request is to MUP, the final status is whatever MUP returns. As far as I can tell, post-Vista MUP prefix resolution returns STATUS_BAD_NETWORK_NAME even if all providers return STATUS_BAD_NETWORK_PATH.

    That said, MUP prefix resolution isn't used for mapped drives. A mapped drive sends the request directly to the provider that created the drive. Prior to Vista, this used to be a top-level named device such as "\Device\LanmanRedirector" (SMB). Since Vista, all redirected create/open requests are routed through MUP, but it doesn't use prefix resolution in this case. It has a sneaky way of implementing this. The provider's device name nowadays is an object SymbolicLink that targets MUP, but with a reserved component that indicates the redirector to use, e.g. "\Device\LanmanRedirector" -> "\Device\Mup\;LanmanRedirector". (A valid server name cannot begin with a semicolon, so this syntax is reserved by MUP. It also supports an optional second reserved component, with the drive name and logon session ID, such as ";Z:0000000000001234". These reserved components are removed from the parsed path, i.e. they are not included in the final path.) Nothing stops us from using this undocumented feature manually in a UNC path, as demonstrated by the examples below.

    The following shows that MUP parses ";LanmanRedirector" as the redirector name, not the "server" component.

        >>> os.path.samefile('//localhost/C$', '//;LanmanRedirector/localhost/C$')
        True

    The following shows that an explicit redirector path does not use prefix resolution. This open fails because there's no WebDAV server on localhost.

        >>> try: os.stat('//;WebDavRedirector/localhost/C$')
        ... except OSError as e: print(e.winerror)
        ...
        53

    The following shows that MUP fails an open with STATUS_OBJECT_PATH_INVALID (i.e. ERROR_BAD_PATHNAME, 161) if the redirector name is unknown:

        >>> try: os.stat('//;Lanman/localhost/C$')
        ... except OSError as e: print(e.winerror)
        ...
        161

    When we misspell the server name as "localhos", we see that the error for an explicit redirector path, as is used in a mapped drive, is ERROR_BAD_NETPATH (53):

        >>> try: os.stat('//;LanmanRedirector/localhos/C$')
        ... except OSError as e: print(e.winerror)
        ...
        53

    If we omit the explicit redirector name, then MUP tries prefix resolution, and the error is instead ERROR_BAD_NET_NAME (67):

        >>> try: os.stat('//localhos/C$')
        ... except OSError as e: print(e.winerror)
        ...
        67