python / cpython

The Python programming language
https://www.python.org/
Other
60.83k stars 29.36k forks source link

`os.path.realpath('notadir/', strict=True)` doesn't raise #118289

Open barneygale opened 2 months ago

barneygale commented 2 months ago

Bug report

Bug description:

GNU coreutils realpath -e errors out when given a path to a file (not a directory) with a trailing slash:

$ realpath -e python/
realpath: python/: Not a directory

... but Python's os.path.realpath(..., strict=True) doesn't raise any error:

>>> os.path.isfile('python')
True
>>> os.path.isdir('python')
False
>>> os.path.realpath('python/', strict=True)  # should raise NotADirectoryError
'/home/barney/projects/cpython/python'
>>> os.path.realpath('python/.', strict=True)  # should raise NotADirectoryError
'/home/barney/projects/cpython/python'

CPython versions tested on:

3.12, 3.13, CPython main branch

Operating systems tested on:

Linux

Linked PRs

nineteendo commented 2 months ago

Here's a list of all invalid paths for posix.stat(). All of these should raise an error for posixpath.realpath() in strict mode:

Should I make an issue for the symlinks?

barneygale commented 2 months ago

Should I make an issue for the symlinks?

It's deliberate behaviour - os.path.realpath() doesn't place a limit on the number of symlink traversals, unlike Linux or Mac.

Does it differ from realpath -e?

nineteendo commented 2 months ago

It's deliberate behaviour - os.path.realpath() doesn't place a limit on the number of symlink traversals, unlike Linux or Mac.

You mean the built-in function in macOS?

wannes@Stefans-iMac dirs % realpath s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s
realpath: s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s: Too many levels of symbolic links

Maybe a max_symlinks argument would make more sense? With None for the system default & -1 for no limit.

novaTopFlex commented 2 months ago

I have just tested the os.path.realpath() function on my different python3 versions and have gotten mostly the same issue.

If one is still on python3.9: The strict argument is not an available argument in Python version 3.9.x.

If one is on python3.10, python3.11, python3.12, or python3.13: The strict argument is recognized, but the code does not raise a NotADirectoryError.

nineteendo commented 2 months ago

ntpath.realpath() also raises an error, I'll try to check with coreutils as well.

>>> import os
>>> os.path.realpath('s/' * 64, strict=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<frozen ntpath>", line 722, in realpath
OSError: [WinError 1921] The name of the file cannot be resolved by the system: 'C:\\Users\\wanne\\path-picker\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s'
nineteendo commented 2 months ago

Hmm, only with '--logical':

wannes@Stefans-iMac dirs % grealpath -e s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s
/Users/wannes/path-picker/link-test/dirs
wannes@Stefans-iMac dirs % grealpath -L s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s
grealpath: s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s: Too many levels of symbolic links
eryksun commented 2 months ago

On POSIX systems, wouldn't it be reasonable for posixpath.realpath() in strict mode to try to match the behavior of POSIX realpath()? This fails with ELOOP if too many levels of symlinks are encountered. On Windows, WinAPI CreateFileW() fails with ERROR_CANT_RESOLVE_FILENAME (1921) in this case, and ntpath.realpath() in strict mode raises a corresponding OSError. That seems correct to me.

Linux:

import os, ctypes
pg = ctypes.CDLL(None, use_errno=True)
buf = (ctypes.c_char * 4096)()
pg.realpath.restype = ctypes.c_char_p
pg.realpath.argtypes = (ctypes.c_char_p, ctypes.c_char_p)
for i in range(41):
    os.symlink(f's{i+1}', f's{i}')
open('s41', 'w').close()
>>> os.path.realpath('s0', strict=True)
'/home/user/Temp/test/symlinks/s41'
>>> pg.realpath(b's0', buf)
>>> ctypes.get_errno() == errno.ELOOP
True
>>> os.strerror(errno.ELOOP)
'Too many levels of symbolic links'
barneygale commented 2 months ago

Would that be instead of, or in addition to, the existing symlink loop detection? The current code uses a seen dict that can grow to an arbitrary size. I'm guessing OSs use a counter because it's faster and uses less memory? (And presumably more resilient against DoS via long-but-not-looping chains of symlinks?)

eryksun commented 2 months ago

The glibc implementation of realpath() increments a counter each time that readlink() is called. It fails with ELOOP if the counter exceeds __eloop_threshold(). On Linux the latter returns a minimum default value MIN_ELOOP_THRESHOLD (40) because sysconf(_SC_SYMLOOP_MAX) returns -1, and _POSIX_SYMLOOP_MAX is just 8. This is explained in a long comment, which I'll copy here for reference:


/* POSIX specifies SYMLOOP_MAX as the "Maximum number of symbolic
   links that can be reliably traversed in the resolution of a
   pathname in the absence of a loop."  This makes it a minimum that
   we should certainly accept.  But it leaves open the possibility
   that more might sometimes work--just not "reliably".

   For example, Linux implements a complex policy whereby there is a
   small limit on the number of direct symlink traversals (a symlink
   to a symlink to a symlink), but larger limit on the total number of
   symlink traversals overall.  Hence the SYMLOOP_MAX number should be
   the small one, but the limit library functions enforce on users
   should be the larger one.

   So, we use the larger of the reported SYMLOOP_MAX (if any) and our
   own constant MIN_ELOOP_THRESHOLD, below.  This constant should be
   large enough that it never rules out a file name and directory tree
   that the underlying system (i.e. calls to 'open' et al) would
   resolve successfully.  It should be small enough that actual loops
   are detected without a huge number of iterations.  */

On Windows, nt._getfinalpathname() calls CreateFileW() to get a handle to pass to GetFinalPathNameByHandleW(), so it's subject to the normal way the system opens the target path. In the kernel, the Object Manager increments a counter each time a reparse is encountered (i.e. the parse routine of a named object returns with the status code STATUS_REPARSE) in the outer parsing loop, including for the Object Manager's own "SymbolicLink" objects (e.g. used for DOS device names such as "C:", "NUL", etc). Currently the limit is 64 reparse attempts, after which the Object Manager gives up and returns STATUS_OBJECT_NAME_NOT_FOUND. If the path being parsed was on an I/O Manager "Device" object (e.g. "\Device\HarddiskVolume2\Mount\SpamDrive"), than there's an I/O Open Packet that indicates whether a filesystem reparse point (e.g. a junction mountpoint or symlink) was being traversed. In this case, the I/O Manager substitutes the more precise status code STATUS_REPARSE_POINT_NOT_RESOLVED. The Windows API translates this to the Windows error code ERROR_CANT_RESOLVE_FILENAME (1921). Note that the limit of 64 reparse attempts is for the overall path parsing. Thus, for example, if you have a sequence of 32 directory symlinks "s0" to "s31", culminating at directory "s32", plus a sequence of 32 directory symlinks "s32\t0" to "s32\t31" culminating at directory "s32\t32", then you can resolve "s0" as "s32", and you can resolve "s32\t0" as "s32\t32", but "s0\t0" fails with ERROR_CANT_RESOLVE_FILENAME.

eryksun commented 2 months ago

Python's os.sysconf_names dict is missing some of the names that are required by Issue 7 of POSIX, including "SC_SYMLOOP_MAX" (added in Issue 6). Here's all of the missing names:

SC_2_PBS
SC_2_PBS_ACCOUNTING
SC_2_PBS_CHECKPOINT
SC_2_PBS_LOCATE
SC_2_PBS_MESSAGE
SC_2_PBS_TRACK
SC_ADVISORY_INFO
SC_BARRIERS
SC_CLOCK_SELECTION
SC_CPUTIME
SC_HOST_NAME_MAX
SC_IPV6
SC_MONOTONIC_CLOCK
SC_RAW_SOCKETS
SC_READER_WRITER_LOCKS
SC_REGEXP
SC_SHELL
SC_SPAWN
SC_SPIN_LOCKS
SC_SPORADIC_SERVER
SC_SS_REPL_MAX
SC_SYMLOOP_MAX
SC_THREAD_CPUTIME
SC_THREAD_ROBUST_PRIO_INHERIT
SC_THREAD_ROBUST_PRIO_PROTECT
SC_THREAD_SPORADIC_SERVER
SC_TIMEOUTS
SC_TRACE
SC_TRACE_EVENT_FILTER
SC_TRACE_EVENT_NAME_MAX
SC_TRACE_INHERIT
SC_TRACE_LOG
SC_TRACE_NAME_MAX
SC_TRACE_SYS_MAX
SC_TRACE_USER_EVENT_MAX
SC_TYPED_MEMORY_OBJECTS
SC_V6_ILP32_OFF32
SC_V6_ILP32_OFFBIG
SC_V6_LP64_OFF64
SC_V6_LPBIG_OFFBIG
SC_V7_ILP32_OFF32
SC_V7_ILP32_OFFBIG
SC_V7_LP64_OFF64
SC_V7_LPBIG_OFFBIG
SC_XOPEN_STREAMS
SC_XOPEN_UUCP
nineteendo commented 2 months ago

Let's track this further in #118441.

nineteendo commented 2 months ago

How about secret symlinks? Should these be allowed in non-strict mode?

wannes@Stefans-iMac dirs % sudo ls -l secret-symlink
l---------  1 wannes  staff  44 Jun 30  2023 secret-symlink -> /Users/wannes/path-picker/link-test/dirs/dir
wannes@Stefans-iMac dirs % grealpath -m secret-symlink
/Users/wannes/path-picker/link-test/dirs/secret-symlink
>>> import posixpath
>>> posixpath.realpath("secret-symlink")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<frozen posixpath>", line 435, in realpath
  File "<frozen posixpath>", line 495, in _joinrealpath
PermissionError: [Errno 13] Permission denied: 'secret-symlink'
barneygale commented 2 months ago

Looks like a bug to me!

nineteendo commented 2 months ago

OK, see #118447.

nineteendo commented 1 month ago

It has been a month, is there anyone who would also like to review the pull request?