Open barneygale opened 2 months ago
Here's a list of all invalid paths for posix.stat()
. All of these should raise an error for posixpath.realpath()
in strict mode:
newfile.txt
-> FileNotFoundError (OK)foo.txt/bar.txt
-> NotADirectoryError (NOK)"a" * 256
-> OSError (OK)"a/" * 512
-> OSError (OK)"s/" * 33
-> OSError (NOK)Should I make an issue for the symlinks?
Should I make an issue for the symlinks?
It's deliberate behaviour - os.path.realpath()
doesn't place a limit on the number of symlink traversals, unlike Linux or Mac.
Does it differ from realpath -e
?
It's deliberate behaviour -
os.path.realpath()
doesn't place a limit on the number of symlink traversals, unlike Linux or Mac.
You mean the built-in function in macOS?
wannes@Stefans-iMac dirs % realpath s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s
realpath: s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s: Too many levels of symbolic links
Maybe a max_symlinks
argument would make more sense? With None
for the system default & -1
for no limit.
I have just tested the os.path.realpath()
function on my different python3
versions and have gotten mostly the same issue.
If one is still on python3.9
:
The strict
argument is not an available argument in Python version 3.9.x.
If one is on python3.10
, python3.11
, python3.12
, or python3.13
:
The strict
argument is recognized, but the code does not raise a NotADirectoryError
.
ntpath.realpath()
also raises an error, I'll try to check with coreutils as well.
>>> import os
>>> os.path.realpath('s/' * 64, strict=True)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<frozen ntpath>", line 722, in realpath
OSError: [WinError 1921] The name of the file cannot be resolved by the system: 'C:\\Users\\wanne\\path-picker\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s\\s'
Hmm, only with '--logical':
wannes@Stefans-iMac dirs % grealpath -e s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s
/Users/wannes/path-picker/link-test/dirs
wannes@Stefans-iMac dirs % grealpath -L s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s
grealpath: s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s: Too many levels of symbolic links
On POSIX systems, wouldn't it be reasonable for posixpath.realpath()
in strict mode to try to match the behavior of POSIX realpath()
? This fails with ELOOP
if too many levels of symlinks are encountered. On Windows, WinAPI CreateFileW()
fails with ERROR_CANT_RESOLVE_FILENAME
(1921) in this case, and ntpath.realpath()
in strict mode raises a corresponding OSError
. That seems correct to me.
Linux:
import os, ctypes
pg = ctypes.CDLL(None, use_errno=True)
buf = (ctypes.c_char * 4096)()
pg.realpath.restype = ctypes.c_char_p
pg.realpath.argtypes = (ctypes.c_char_p, ctypes.c_char_p)
for i in range(41):
os.symlink(f's{i+1}', f's{i}')
open('s41', 'w').close()
>>> os.path.realpath('s0', strict=True)
'/home/user/Temp/test/symlinks/s41'
>>> pg.realpath(b's0', buf)
>>> ctypes.get_errno() == errno.ELOOP
True
>>> os.strerror(errno.ELOOP)
'Too many levels of symbolic links'
Would that be instead of, or in addition to, the existing symlink loop detection? The current code uses a seen
dict that can grow to an arbitrary size. I'm guessing OSs use a counter because it's faster and uses less memory? (And presumably more resilient against DoS via long-but-not-looping chains of symlinks?)
The glibc implementation of realpath()
increments a counter each time that readlink()
is called. It fails with ELOOP
if the counter exceeds __eloop_threshold()
. On Linux the latter returns a minimum default value MIN_ELOOP_THRESHOLD
(40) because sysconf(_SC_SYMLOOP_MAX)
returns -1, and _POSIX_SYMLOOP_MAX
is just 8. This is explained in a long comment, which I'll copy here for reference:
/* POSIX specifies SYMLOOP_MAX as the "Maximum number of symbolic
links that can be reliably traversed in the resolution of a
pathname in the absence of a loop." This makes it a minimum that
we should certainly accept. But it leaves open the possibility
that more might sometimes work--just not "reliably".
For example, Linux implements a complex policy whereby there is a
small limit on the number of direct symlink traversals (a symlink
to a symlink to a symlink), but larger limit on the total number of
symlink traversals overall. Hence the SYMLOOP_MAX number should be
the small one, but the limit library functions enforce on users
should be the larger one.
So, we use the larger of the reported SYMLOOP_MAX (if any) and our
own constant MIN_ELOOP_THRESHOLD, below. This constant should be
large enough that it never rules out a file name and directory tree
that the underlying system (i.e. calls to 'open' et al) would
resolve successfully. It should be small enough that actual loops
are detected without a huge number of iterations. */
On Windows, nt._getfinalpathname()
calls CreateFileW()
to get a handle to pass to GetFinalPathNameByHandleW()
, so it's subject to the normal way the system opens the target path. In the kernel, the Object Manager increments a counter each time a reparse is encountered (i.e. the parse routine of a named object returns with the status code STATUS_REPARSE
) in the outer parsing loop, including for the Object Manager's own "SymbolicLink" objects (e.g. used for DOS device names such as "C:", "NUL", etc). Currently the limit is 64 reparse attempts, after which the Object Manager gives up and returns STATUS_OBJECT_NAME_NOT_FOUND
. If the path being parsed was on an I/O Manager "Device" object (e.g. "\Device\HarddiskVolume2\Mount\SpamDrive"), than there's an I/O Open Packet that indicates whether a filesystem reparse point (e.g. a junction mountpoint or symlink) was being traversed. In this case, the I/O Manager substitutes the more precise status code STATUS_REPARSE_POINT_NOT_RESOLVED
. The Windows API translates this to the Windows error code ERROR_CANT_RESOLVE_FILENAME
(1921). Note that the limit of 64 reparse attempts is for the overall path parsing. Thus, for example, if you have a sequence of 32 directory symlinks "s0" to "s31", culminating at directory "s32", plus a sequence of 32 directory symlinks "s32\t0" to "s32\t31" culminating at directory "s32\t32", then you can resolve "s0" as "s32", and you can resolve "s32\t0" as "s32\t32", but "s0\t0" fails with ERROR_CANT_RESOLVE_FILENAME
.
Python's os.sysconf_names
dict is missing some of the names that are required by Issue 7 of POSIX, including "SC_SYMLOOP_MAX" (added in Issue 6). Here's all of the missing names:
SC_2_PBS
SC_2_PBS_ACCOUNTING
SC_2_PBS_CHECKPOINT
SC_2_PBS_LOCATE
SC_2_PBS_MESSAGE
SC_2_PBS_TRACK
SC_ADVISORY_INFO
SC_BARRIERS
SC_CLOCK_SELECTION
SC_CPUTIME
SC_HOST_NAME_MAX
SC_IPV6
SC_MONOTONIC_CLOCK
SC_RAW_SOCKETS
SC_READER_WRITER_LOCKS
SC_REGEXP
SC_SHELL
SC_SPAWN
SC_SPIN_LOCKS
SC_SPORADIC_SERVER
SC_SS_REPL_MAX
SC_SYMLOOP_MAX
SC_THREAD_CPUTIME
SC_THREAD_ROBUST_PRIO_INHERIT
SC_THREAD_ROBUST_PRIO_PROTECT
SC_THREAD_SPORADIC_SERVER
SC_TIMEOUTS
SC_TRACE
SC_TRACE_EVENT_FILTER
SC_TRACE_EVENT_NAME_MAX
SC_TRACE_INHERIT
SC_TRACE_LOG
SC_TRACE_NAME_MAX
SC_TRACE_SYS_MAX
SC_TRACE_USER_EVENT_MAX
SC_TYPED_MEMORY_OBJECTS
SC_V6_ILP32_OFF32
SC_V6_ILP32_OFFBIG
SC_V6_LP64_OFF64
SC_V6_LPBIG_OFFBIG
SC_V7_ILP32_OFF32
SC_V7_ILP32_OFFBIG
SC_V7_LP64_OFF64
SC_V7_LPBIG_OFFBIG
SC_XOPEN_STREAMS
SC_XOPEN_UUCP
Let's track this further in #118441.
How about secret symlinks? Should these be allowed in non-strict mode?
wannes@Stefans-iMac dirs % sudo ls -l secret-symlink
l--------- 1 wannes staff 44 Jun 30 2023 secret-symlink -> /Users/wannes/path-picker/link-test/dirs/dir
wannes@Stefans-iMac dirs % grealpath -m secret-symlink
/Users/wannes/path-picker/link-test/dirs/secret-symlink
>>> import posixpath
>>> posixpath.realpath("secret-symlink")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<frozen posixpath>", line 435, in realpath
File "<frozen posixpath>", line 495, in _joinrealpath
PermissionError: [Errno 13] Permission denied: 'secret-symlink'
Looks like a bug to me!
OK, see #118447.
It has been a month, is there anyone who would also like to review the pull request?
Bug report
Bug description:
GNU coreutils
realpath -e
errors out when given a path to a file (not a directory) with a trailing slash:... but Python's
os.path.realpath(..., strict=True)
doesn't raise any error:CPython versions tested on:
3.12, 3.13, CPython main branch
Operating systems tested on:
Linux
Linked PRs