Open GordonAitchJay opened 1 year ago
OneDrive uses placeholder reparse points, which by default are disguised as regular files and directories if the process executable is outside of the "%SystemRoot%" tree. Placeholder reparse points are thus exposed to "cmd.exe" since it's inside "%SystemRoot%". You can check this via dir /al (i.e. list only reparse points). PowerShell 7, on the other hand, is installed in "%ProgramFiles%", so by default placeholders would be disguised for it. However, it opts into exposing placeholders by calling RtlSetProcessPlaceholderCompatibilityMode(PHCM_EXPOSE_PLACEHOLDERS).
CPython has opted to use the default setting that disguises placeholder reparse points. It could be that the filesystem filter driver that handles OneDrive reparse points is failing in some way when placeholders are disguised. To rule out placeholder disguising as the cause of the different behavior, you could ask the SO user to try running the following code before calling os.listdir() on the OneDrive "BigFolder" directory.
import ctypes
ntdll = ctypes.WinDLL('ntdll')
PHCM_EXPOSE_PLACEHOLDERS = 2
ntdll.RtlSetProcessPlaceholderCompatibilityMode.argtypes = (ctypes.c_char,)
ntdll.RtlSetProcessPlaceholderCompatibilityMode(PHCM_EXPOSE_PLACEHOLDERS)
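For testing, it may be convenient to limit the mode change to a single block of code. The following is only a sketch, not part of the suggestion above: the helper name exposed_placeholders is made up, and it assumes that the function returns the previous mode (or a negative value on error) and that a previous mode of 1 or 2 can be passed back to restore it.

```python
import contextlib
import ctypes
import sys

PHCM_EXPOSE_PLACEHOLDERS = 2  # same value as in the snippet above


@contextlib.contextmanager
def exposed_placeholders():
    """Expose placeholder reparse points inside the with-block (hypothetical helper)."""
    if sys.platform != 'win32':
        yield None  # nothing to do on other platforms
        return
    ntdll = ctypes.WinDLL('ntdll')
    # The function requires Windows 10; older systems don't export it.
    func = getattr(ntdll, 'RtlSetProcessPlaceholderCompatibilityMode', None)
    if func is None:
        yield None
        return
    func.argtypes = (ctypes.c_byte,)
    func.restype = ctypes.c_byte  # signed, so error values stay negative
    previous = func(PHCM_EXPOSE_PLACEHOLDERS)
    try:
        yield previous
    finally:
        # Assumption: only a positive previous mode is valid to set again.
        if previous > 0:
            func(previous)
```

Usage would look like "with exposed_placeholders(): names = os.listdir(path)". If the listing succeeds inside the block but fails outside of it, that points at placeholder disguising.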
FindFirstFileW() / FindNextFileW() repeatedly calls NtQueryDirectoryFileEx(): FileBothDirectoryInformation. The "Both" in the name of this info class refers to returning both the normal name and the short name (if any) of each directory entry. It also returns the size of the extended attributes (EaSize) set on a file or directory, if any. The latter field gets reused to return the reparse tag if the entry refers to a reparse point, since a reparse point can't have extended attributes.
Apparently my old ctypes code that calls NtQueryDirectoryFile(): FileDirectoryInformation works to list "BigFolder". Note that the FileDirectoryInformation info class doesn't include short names or the EA size / reparse tag. If it turns out that the issue is with disguised placeholders, it may be that the bug is limited to either the NtQueryDirectoryFileEx() system call or the FileBothDirectoryInformation information class. That could be something to explore further.
I am far from the Windows internals expertise required to parse all of the above. @GordonAitchJay and @eryksun : would you have a layman's summary on what this means for usability of python and OneDrive, any possible workarounds, and what the ETA might look like?
I just started running into this issue this week, also with R. I wrote it off as a fluke until I ran into it with python as well. I can do os.listdir(f'{base_dir}/..') and see that my target directory is listed, but os.listdir(base_dir) yields:
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
Input In [39], in <cell line: 1>()
----> 1 os.listdir(base_dir)
OSError: [WinError 87] The parameter is incorrect: 'C:/Users/username/OneDrive - Company/path/to/target_dir'
@eryksun Thank you for your insight. It's very interesting!
What are the benefits of placeholder reparse points being disguised as regular files and directories if the process executable is outside of the "%SystemRoot%" tree? It's clearly a deliberate decision.
@jwhendy As far as I know, you're only the second Python user to have encountered this problem. I can't replicate it. Which directories do you have this problem with? It doesn't appear to be all directories managed by OneDrive, at least not all the time.
Instead of using os.listdir, you can use ctypes to call the lower level NtQueryDirectoryFile function. @eryksun originally posted an implementation in an answer to a question on StackOverflow. You would only need to make a few minor changes to make it a drop-in replacement for os.listdir.
Please follow @eryksun's suggestion above which will prevent placeholder reparse points from being disguised as regular files and directories. Before calling os.listdir(base_dir), run this:
import ctypes
ntdll = ctypes.WinDLL('ntdll')
PHCM_EXPOSE_PLACEHOLDERS = 2
ntdll.RtlSetProcessPlaceholderCompatibilityMode.argtypes = (ctypes.c_char,)
ntdll.RtlSetProcessPlaceholderCompatibilityMode(PHCM_EXPOSE_PLACEHOLDERS)
Does os.listdir(base_dir) now return the full list of files?
@eryksun Assuming the call to RtlSetProcessPlaceholderCompatibilityMode works as a workaround, should CPython instead opt into exposing placeholders? What are the downsides? Though it seems silly to me that all software that uses FindFirstFile/FindNextFile/FindClose to list files (seemingly the canonical way) and happens to be outside of the "%SystemRoot%" tree must be updated so it calls RtlSetProcessPlaceholderCompatibilityMode beforehand to avoid this issue.
I don't think there will be an ETA for a fix, since this similar issue raised by @eryksun was rejected because RtlSetProcessPlaceholderCompatibilityMode requires Windows 10.
I suppose it would be possible to change the implementation of os.listdir, since NtQueryDirectoryFile, NtQueryInformationFile, and CreateFileW are all available on Windows XP, but I don't think that will actually happen. glob.glob would need to be changed, too.
> What are the benefits of placeholder reparse points being disguised as regular files and directories if the process executable is outside of the "%SystemRoot%" tree? It's clearly a deliberate decision.
This is explained in the documentation. Some programs mistakenly handle all reparse points as if they're symbolic links, instead of checking the name-surrogate bit [*] of the reparse tag using the macro IsReparseTagNameSurrogate() or checking for a symlink exactly. Python's os.stat() made this mistake prior to Python 3.7. To work around this, placeholder reparse points were implemented to be disguised by default for non-system processes.
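As a rough illustration of the difference between the two checks, here is a minimal sketch in Python. The tag values are the ones defined in the Windows SDK headers. Using IO_REPARSE_TAG_CLOUD_6 as the example placeholder tag is an assumption for illustration; the exact tag that OneDrive sets isn't stated in this thread.

```python
# Reparse tag values as defined in the Windows SDK headers.
IO_REPARSE_TAG_MOUNT_POINT = 0xA0000003  # junction
IO_REPARSE_TAG_SYMLINK = 0xA000000C      # symbolic link
IO_REPARSE_TAG_CLOUD_6 = 0x9000601A      # a cloud-files placeholder tag (example only)

NAME_SURROGATE_BIT = 0x20000000


def is_name_surrogate(tag):
    """Python equivalent of the IsReparseTagNameSurrogate() macro."""
    return bool(tag & NAME_SURROGATE_BIT)


def is_symlink(tag):
    """Exact check for a symbolic link."""
    return tag == IO_REPARSE_TAG_SYMLINK


for name, tag in [('symlink', IO_REPARSE_TAG_SYMLINK),
                  ('junction', IO_REPARSE_TAG_MOUNT_POINT),
                  ('cloud placeholder', IO_REPARSE_TAG_CLOUD_6)]:
    print(f'{name}: surrogate={is_name_surrogate(tag)}, symlink={is_symlink(tag)}')
```

A program that treats every reparse point as a link would mishandle the third entry, which is neither a symlink nor a name surrogate.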
> Assuming the call to RtlSetProcessPlaceholderCompatibilityMode works as a workaround, should CPython instead opt into exposing placeholders? What are the downsides?
A downside to exposing placeholder reparse points is that os.scandir() entries may not have updated basic stat data from the target of the placeholder, i.e. timestamps, file attributes, and file size. The entry may also have attributes that it otherwise doesn't have when placeholders are disguised, such as FILE_ATTRIBUTE_REPARSE_POINT (0x400), FILE_ATTRIBUTE_SPARSE_FILE (0x200), FILE_ATTRIBUTE_OFFLINE (0x1000), FILE_ATTRIBUTE_RECALL_ON_OPEN (0x40000), FILE_ATTRIBUTE_RECALL_ON_DATA_ACCESS (0x400000), FILE_ATTRIBUTE_PINNED (0x80000), or FILE_ATTRIBUTE_UNPINNED (0x100000).
> I suppose it would be possible to change the implementation of os.listdir, since NtQueryDirectoryFile, NtQueryInformationFile, and CreateFileW are all available on Windows XP
It's unlikely that Python's standard library will ever call NTAPI system calls directly, such as NtQueryDirectoryFile().
The os module could switch to using GetFileInformationByHandleEx() with one of the directory information classes, such as FileIdBothDirectoryInfo. I haven't reproduced this issue locally, but I can implement a demo of listdir() based on GetFileInformationByHandleEx() using ctypes.
I've actually wanted to switch to using FileIdBothDirectoryInfo for a long time, to support the 64-bit FileId and the ChangeTime in the stat() method of os.scandir() entries. (Note that if an entry is a reparse point, the reparse tag is returned in the EaSize field. A reparse point cannot have extended attributes. This is documented in [MS-FSCC] for NTAPI FileIdBothDirectoryInformation.)
[*] Here is some optional background information on the two types of name-surrogate reparse points that are commonly used in the NTFS and ReFS filesystems, and how Python supports them. It's off topic, but I think it's important to understanding the overall system of reparse points, which is a complex subject that's limited to just Windows.
IO_REPARSE_TAG_SYMLINK reparse points are Unix-like symbolic links, which can target any path on a local or remote filesystem. Python categorizes them as the Unix file type S_IFLNK.
If a symlink targets a directory, the reparse point must be set on a directory, and thus it has the attribute FILE_ATTRIBUTE_DIRECTORY. If a symlink targets a file, the reparse point must not be set on a directory.
When a symlink is accessed remotely by an SMB client, the server sends the symlink reparse data to the client, and the client is responsible for resolving the symlink and opening the target path.
There are four symlink evaluation policies that determine whether a symlink is allowed to be traversed: local to local (L2L), local to remote (L2R), remote to local (R2L), and remote to remote (R2R).
Only the first two symlink evaluation policies are enabled by default. R2L symlinks are most likely encountered by accident when a tree containing local symlinks is shared with remote clients. I would never enable traversing R2L links.
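In Python, the directory-versus-file distinction for symlinks described above surfaces as the target_is_directory parameter of os.symlink(). When the target already exists, Python picks the matching type on Windows automatically, so the flag mainly matters when the target will be created later. A minimal sketch follows; the helper name is made up, and on Windows creating symlinks also requires the symlink privilege or Developer Mode.

```python
import os


def symlink_to_directory(target, link):
    """Create a directory symlink, even if the target doesn't exist yet.

    On Windows this sets the reparse point on a directory, as required
    for a symlink that targets a directory. On Unix the flag is ignored.
    """
    os.symlink(target, link, target_is_directory=True)
```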
IO_REPARSE_TAG_MOUNT_POINT reparse points are Unix-like "bind" mount points (i.e. junctions), which can target any directory in a filesystem that is mounted on a local volume device. There's no corresponding Unix file type, so Python categorizes junctions as the Unix file type S_IFDIR (directory). Note that, unlike Unix, the primary mount point for a filesystem that resides on a volume device is the root path in the namespace of the volume itself (e.g. "\Device\HarddiskVolume2\"; note the trailing backslash). More or less equivalently, it can also be the root path on an alias for the volume device name, such as a DOS drive-letter name (e.g. "\??\C:\") or a volume GUID name (e.g. "\??\Volume{12345678-1234-1234-1234-123456781234}\").
A junction can be registered as a canonical volume mount point, as used by WinAPI GetFinalPathNameByHandleW() with the flag VOLUME_NAME_DOS. The target path must be the root path of a volume GUID name. This is how WinAPI SetVolumeMountPointW() is implemented, as well as "mountvol.exe".
A junction can be traversed in a remote path. It's the responsibility of the remote server to resolve and traverse the junction. Thus the target path of a junction must be on a volume device that's local to the server.
To better match the behavior of mount points on Unix, junctions are handled specially when traversed in the kernel. Path parsing continues at the target of the junction, but, unlike reparsing a symlink, the target path does not replace the opened path. One or more ".." components in a subsequent relative symlink can thus traverse the opened path to a parent directory of the junction instead of the parent directory of the junction's target. This also allows consistent handling of junctions and symlinks in remote paths, since the client is responsible for resolving a relative symlink against the opened path, which may include a junction that was resolved on the server.
When passed follow_symlinks=False, os.stat() opens symlinks, junctions, and any other name-surrogate type of reparse point. os.symlink() creates symlinks; there's no support for creating junctions. os.readlink() returns the target path of symlinks and junctions. os.unlink() deletes symlinks and junctions instead of deleting their targets, as does shutil.rmtree(), but shutil.copytree() copies and traverses a junction as a regular directory. os.path.islink() is true only for symlinks. In 3.12, os.path.isjunction() was added to test for junctions.
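To make the summary above concrete, here is a small sketch of how a script might tell these cases apart using only the standard library. The function name classify is made up for illustration. It relies on os.path.isjunction() when available (3.12+) and otherwise reports junctions as directories, which matches how Python categorizes them.

```python
import os
import stat


def classify(path):
    """Return 'symlink', 'junction', 'directory', or 'file' for a path."""
    st = os.lstat(path)  # does not follow symlinks
    if stat.S_ISLNK(st.st_mode):
        return 'symlink'
    # os.path.isjunction() was added in Python 3.12.
    isjunction = getattr(os.path, 'isjunction', None)
    if isjunction is not None and isjunction(path):
        return 'junction'
    if stat.S_ISDIR(st.st_mode):
        return 'directory'
    return 'file'
```

The junction test has to come before the directory test, since a junction has the file type S_IFDIR.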
Here's a first draft ctypes-based prototype of scandir() and listdir() functions that call GetFileInformationByHandleEx() to query the directory information class FileIdBothDirectoryInfo. If the filesystem doesn't support FileIdBothDirectoryInfo, the implementation falls back on the older information class FileFullDirectoryInfo (e.g. currently the fallback is needed to list the named-pipe filesystem, "\\.\pipe\"). The path can be str, bytes, or a path-like object. It can also be a file descriptor for an open directory. A directory can be opened with os.open() by using the flag O_OBTAIN_DIR (0x2000).
If you can reproduce the reported problem with OneDrive and os.listdir(), please test these two functions, and let me know if they work.
import os
import stat
import msvcrt
import collections
import ctypes
from ctypes import wintypes

kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)

ERROR_INVALID_FUNCTION = 1
ERROR_NO_MORE_FILES = 18
ERROR_NOT_SUPPORTED = 50
ERROR_INVALID_PARAMETER = 87
ERROR_MORE_DATA = 234
ERROR_DIRECTORY = 267

INVALID_HANDLE_VALUE = wintypes.HANDLE(-1).value
FILE_TYPE_DISK = 1
FILE_READ_DATA = 1
FILE_SHARE_READ = 1
OPEN_EXISTING = 3
FILE_FLAG_BACKUP_SEMANTICS = 0x02000000
O_OBTAIN_DIR = 0x2000 # os.open() flag that opens with backup semantics

FileBasicInfo = 0
FileIdBothDirectoryInfo = 10
FileIdBothDirectoryRestartInfo = 11
FileFullDirectoryInfo = 14
FileFullDirectoryRestartInfo = 15

FILE_INFO_BY_HANDLE_CLASS = wintypes.ULONG
LPSECURITY_ATTRIBUTES = wintypes.LPVOID

kernel32.CreateFileW.restype = wintypes.HANDLE
kernel32.CreateFileW.argtypes = (
    wintypes.LPCWSTR,       # _In_     lpFileName
    wintypes.DWORD,         # _In_     dwDesiredAccess
    wintypes.DWORD,         # _In_     dwShareMode
    LPSECURITY_ATTRIBUTES,  # _In_opt_ lpSecurityAttributes
    wintypes.DWORD,         # _In_     dwCreationDisposition
    wintypes.DWORD,         # _In_     dwFlagsAndAttributes
    wintypes.HANDLE)        # _In_opt_ hTemplateFile

kernel32.GetFileInformationByHandleEx.argtypes = (
    wintypes.HANDLE,            # hFile
    FILE_INFO_BY_HANDLE_CLASS,  # FileInformationClass
    wintypes.LPVOID,            # lpFileInformation
    wintypes.DWORD)             # dwBufferSize

stat_result = collections.namedtuple('stat_result',
    ('st_mode', 'st_ino', 'st_dev', 'st_nlink', 'st_uid', 'st_gid', 'st_size',
     'st_atime', 'st_mtime', 'st_ctime', 'st_btime', 'st_atime_ns',
     'st_mtime_ns', 'st_ctime_ns', 'st_btime_ns', 'st_change_time',
     'st_change_time_ns', 'st_file_attributes', 'st_reparse_tag'))

class FILE_BASIC_INFO(ctypes.Structure):
    _fields_ = (('CreationTime', wintypes.LARGE_INTEGER),
                ('LastAccessTime', wintypes.LARGE_INTEGER),
                ('LastWriteTime', wintypes.LARGE_INTEGER),
                ('ChangeTime', wintypes.LARGE_INTEGER),
                ('FileAttributes', wintypes.DWORD))

class FILE_BASE_DIR_INFO(ctypes.Structure):
    __slots__ = ()

    @property
    def FileName(self):
        length = self._FileNameLength
        if not length:
            return ''
        addr = ctypes.addressof(self) + type(self)._FileName.offset
        size = length // ctypes.sizeof(wintypes.WCHAR)
        return (wintypes.WCHAR * size).from_address(addr).value

    @property
    def EaSize(self):
        # Since a reparse point cannot have extended attributes, the EaSize
        # field is reused to store the reparse tag if the entry is a reparse
        # point. This behavior is documented in [MS-FSCC].
        # https://learn.microsoft.com/openspecs/windows_protocols/ms-fscc/e8d926d1-3a22-4654-be9c-58317a85540b
        if not (self.FileAttributes & stat.FILE_ATTRIBUTE_REPARSE_POINT):
            return self._EaSize
        return 0

    @property
    def ReparseTag(self):
        # See the comment about EaSize.
        if self.FileAttributes & stat.FILE_ATTRIBUTE_REPARSE_POINT:
            return self._EaSize
        return 0

class FILE_FULL_DIR_INFO(FILE_BASE_DIR_INFO):
    __slots__ = ()
    _fields_ = (('_NextEntryOffset', wintypes.DWORD),
                ('_FileIndex', wintypes.DWORD),
                ('CreationTime', wintypes.LARGE_INTEGER),
                ('LastAccessTime', wintypes.LARGE_INTEGER),
                ('LastWriteTime', wintypes.LARGE_INTEGER),
                ('ChangeTime', wintypes.LARGE_INTEGER),
                ('EndOfFile', wintypes.LARGE_INTEGER),
                ('AllocationSize', wintypes.LARGE_INTEGER),
                ('FileAttributes', wintypes.DWORD),
                ('_FileNameLength', wintypes.DWORD),
                ('_EaSize', wintypes.DWORD),
                ('_FileName', wintypes.WCHAR * 1))

class FILE_ID_BOTH_DIR_INFO(FILE_BASE_DIR_INFO):
    __slots__ = ()
    _fields_ = (('_NextEntryOffset', wintypes.DWORD),
                ('_FileIndex', wintypes.DWORD),
                ('CreationTime', wintypes.LARGE_INTEGER),
                ('LastAccessTime', wintypes.LARGE_INTEGER),
                ('LastWriteTime', wintypes.LARGE_INTEGER),
                ('ChangeTime', wintypes.LARGE_INTEGER),
                ('EndOfFile', wintypes.LARGE_INTEGER),
                ('AllocationSize', wintypes.LARGE_INTEGER),
                ('FileAttributes', wintypes.DWORD),
                ('_FileNameLength', wintypes.DWORD),
                ('_EaSize', wintypes.DWORD),
                ('_ShortNameLength', wintypes.BYTE),
                ('_ShortName', wintypes.WCHAR * 12),
                ('FileId', wintypes.LARGE_INTEGER),
                ('_FileName', wintypes.WCHAR * 1))

class DirEntry:
    __slots__ = ('_dirpath', '_info')

    def __init__(self, dirpath, info):
        self._dirpath = dirpath
        self._info = info

    def __repr__(self):
        return '<{} {!r}>'.format(self.__class__.__name__, self.name)

    @classmethod
    def _listbuf(cls, buf, info_class, dirpath):
        result = []
        if info_class == FileIdBothDirectoryInfo:
            info_struct = FILE_ID_BOTH_DIR_INFO
        elif info_class == FileFullDirectoryInfo:
            info_struct = FILE_FULL_DIR_INFO
        else:
            raise ValueError('unsupported information class')
        base_size = ctypes.sizeof(info_struct) - ctypes.sizeof(wintypes.WCHAR)
        offset = 0
        while True:
            tmp = info_struct.from_buffer(buf, offset)
            if tmp._FileNameLength and tmp.FileName not in ('.', '..'):
                info = info_struct()
                size = base_size + tmp._FileNameLength
                ctypes.resize(info, size)
                ctypes.memmove(ctypes.byref(info), ctypes.byref(tmp), size)
                entry = cls(dirpath, info)
                result.append(entry)
            if tmp._NextEntryOffset:
                offset += tmp._NextEntryOffset
            else:
                break
        return result

    def _is_name_surrogate(self):
        return bool(self._info.ReparseTag & 0x20000000)

    def _is_reparse_point(self):
        return bool(self._info.FileAttributes &
                    stat.FILE_ATTRIBUTE_REPARSE_POINT)

    @property
    def name(self):
        if isinstance(self._dirpath, bytes):
            return os.fsencode(self._info.FileName)
        return self._info.FileName

    @property
    def path(self):
        return os.path.join(self._dirpath, self.name)

    def stat(self, follow_symlinks=True):
        def nt_time_as_posix_ns(t):
            if t == 0:
                return 0
            # NT has an epoch of 1601, and its time unit is 100 ns.
            return (t - 116444736000000000) * 100

        if (self._is_reparse_point() and
                (follow_symlinks or not self._is_name_surrogate())):
            return os.stat(self.path)

        if self._info.ReparseTag == stat.IO_REPARSE_TAG_SYMLINK:
            mode = stat.S_IFLNK
        elif self._info.FileAttributes & stat.FILE_ATTRIBUTE_DIRECTORY:
            mode = stat.S_IFDIR
        else:
            pipe_paths = ('\\\\.\\pipe', '\\\\?\\pipe')
            drive = os.path.splitdrive(os.fsdecode(self._dirpath))[0]
            if drive and os.path.normcase(drive) in pipe_paths:
                mode = stat.S_IFIFO
            else:
                mode = stat.S_IFREG

        file_id = getattr(self._info, 'FileId', 0)
        atime_ns = nt_time_as_posix_ns(self._info.LastAccessTime)
        mtime_ns = nt_time_as_posix_ns(self._info.LastWriteTime)
        # BUGBUG: POSIX st_ctime should be the metadata change time, and
        # st_btime should be the creation (birth) time. But Python
        # follows the Windows C runtime implementation, which back in the
        # days of MS-DOS in the 1980s, before there was even a POSIX
        # standard, chose to redefine Unix st_ctime as the creation time.
        # They should have added a new field for the creation time, and
        # they should have ignored st_ctime until they had a filesystem
        # that supported it, i.e. NTFS on Windows NT in 1993.
        ctime_ns = nt_time_as_posix_ns(self._info.CreationTime)
        btime_ns = nt_time_as_posix_ns(self._info.CreationTime)
        change_time_ns = nt_time_as_posix_ns(self._info.ChangeTime)

        return stat_result(
            st_mode=mode,
            st_ino=file_id,
            st_dev=0,
            st_nlink=0,
            st_uid=0,
            st_gid=0,
            st_size=self._info.EndOfFile,
            st_atime=atime_ns // 10**9,
            st_mtime=mtime_ns // 10**9,
            st_ctime=ctime_ns // 10**9,
            st_btime=btime_ns // 10**9,
            st_atime_ns=atime_ns,
            st_mtime_ns=mtime_ns,
            st_ctime_ns=ctime_ns,
            st_btime_ns=btime_ns,
            st_change_time=change_time_ns // 10**9,
            st_change_time_ns=change_time_ns,
            st_file_attributes=self._info.FileAttributes,
            st_reparse_tag=self._info.ReparseTag)

    def inode(self):
        if (not hasattr(self._info, 'FileId') or
                (self._is_reparse_point() and not self._is_name_surrogate())):
            return os.stat(self.path).st_ino
        return self._info.FileId

    def is_dir(self, follow_symlinks=True):
        if self._is_reparse_point():
            if follow_symlinks or not self._is_name_surrogate():
                return os.path.isdir(self.path)
            if self._info.ReparseTag == stat.IO_REPARSE_TAG_SYMLINK:
                return False
        if self._info.FileAttributes & stat.FILE_ATTRIBUTE_DIRECTORY:
            return True
        return False

    def is_file(self, follow_symlinks=True):
        if self._is_reparse_point():
            if follow_symlinks or not self._is_name_surrogate():
                return os.path.isfile(self.path)
            if self._info.ReparseTag == stat.IO_REPARSE_TAG_SYMLINK:
                return False
        if self._info.FileAttributes & stat.FILE_ATTRIBUTE_DIRECTORY:
            return False
        pipe_paths = ('\\\\.\\pipe', '\\\\?\\pipe')
        drive = os.path.splitdrive(os.fsdecode(self._dirpath))[0]
        if drive and os.path.normcase(drive) in pipe_paths:
            return False
        return True

    def is_symlink(self):
        return self._info.ReparseTag == stat.IO_REPARSE_TAG_SYMLINK

    def is_junction(self):
        return self._info.ReparseTag == stat.IO_REPARSE_TAG_MOUNT_POINT

def scandir(path=None):
    """Return an iterator of DirEntry objects for given path."""
    if path is None:
        path = os.getcwd()

    def isdir():
        info = FILE_BASIC_INFO()
        if kernel32.GetFileInformationByHandleEx(
                hFile, FileBasicInfo, ctypes.byref(info),
                ctypes.sizeof(info)):
            return info.FileAttributes & stat.FILE_ATTRIBUTE_DIRECTORY
        return False

    def readdir():
        nonlocal info_class
        if kernel32.GetFileInformationByHandleEx(
                hFile, info_class, buf, ctypes.sizeof(buf)):
            return True
        error = ctypes.get_last_error()
        if error == ERROR_NO_MORE_FILES:
            return False
        elif (info_class == FileIdBothDirectoryRestartInfo and
              error in (ERROR_INVALID_FUNCTION,
                        ERROR_NOT_SUPPORTED,
                        ERROR_INVALID_PARAMETER)):
            info_class = FileFullDirectoryRestartInfo
            return readdir()
        elif error == ERROR_MORE_DATA:
            ctypes.resize(buf, ctypes.sizeof(buf) * 2)
            return readdir()
        raise ctypes.WinError(error)

    def ScandirIterator():
        try:
            while True:
                yield from DirEntry._listbuf(buf, info_class, dirpath)
                if not readdir():
                    break
        finally:
            if close:
                os.close(fd)

    close = False
    try:
        if isinstance(path, int):
            fd = path
            hFile = msvcrt.get_osfhandle(fd)
            if kernel32.GetFileType(hFile) != FILE_TYPE_DISK:
                raise ValueError('if path is a file descriptor, it must '
                                 'refer to a file on a volume device')
            dirpath = ''
        else:
            path = os.fspath(path)
            hFile = kernel32.CreateFileW(
                os.fsdecode(path), FILE_READ_DATA, FILE_SHARE_READ,
                None, OPEN_EXISTING, FILE_FLAG_BACKUP_SEMANTICS, None)
            if hFile == INVALID_HANDLE_VALUE:
                raise ctypes.WinError(ctypes.get_last_error())
            try:
                fd = msvcrt.open_osfhandle(hFile, os.O_RDONLY)
            except:
                kernel32.CloseHandle(hFile)
                raise
            close = True
            dirpath = path
        if not isdir():
            raise ctypes.WinError(ERROR_DIRECTORY)
        buf = (ctypes.c_char * 65536)()
        info_class = FileIdBothDirectoryRestartInfo
        readdir()
        if info_class == FileIdBothDirectoryRestartInfo:
            info_class = FileIdBothDirectoryInfo
        elif info_class == FileFullDirectoryRestartInfo:
            info_class = FileFullDirectoryInfo
    except:
        if close:
            os.close(fd)
        raise
    return ScandirIterator()

def listdir(path=None):
    """Return a list containing the names of the files in the directory."""
    return [e.name for e in scandir(path)]
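The timestamp conversion used in DirEntry.stat() above can be checked in isolation. The following sketch repeats the nt_time_as_posix_ns() helper and cross-checks its constant against the datetime module. The names UNIX_EPOCH_AS_NT and nt_time_as_datetime are made up for this illustration.

```python
from datetime import datetime, timedelta, timezone

NT_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)
# Number of 100 ns intervals between 1601-01-01 and 1970-01-01.
UNIX_EPOCH_AS_NT = 116444736000000000


def nt_time_as_posix_ns(t):
    """Convert an NT timestamp (100 ns units since 1601) to POSIX nanoseconds."""
    if t == 0:
        return 0  # 0 is kept as 0, as in the prototype above
    return (t - UNIX_EPOCH_AS_NT) * 100


def nt_time_as_datetime(t):
    """Convert an NT timestamp to an aware datetime (microsecond precision)."""
    return NT_EPOCH + timedelta(microseconds=t // 10)


# The constant corresponds exactly to the Unix epoch.
assert nt_time_as_datetime(UNIX_EPOCH_AS_NT) == datetime(
    1970, 1, 1, tzinfo=timezone.utc)
# One second after the Unix epoch is 10**7 NT ticks later.
assert nt_time_as_posix_ns(UNIX_EPOCH_AS_NT + 10**7) == 10**9
```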
I also had this problem multiple times. It usually, but not always, happens when there is a large number of files (1k+) within a folder inside OneDrive. First of all, I am able to reproduce the problem of os.listdir() now:
It's the same problem as @GordonAitchJay found: dir in Windows cmd works well, but os.listdir() fails.
@eryksun your functions work!!
I also have another related problem. That is, inside a big OneDrive folder, sometimes I get an error when starting python.
I don't know cpython at all but am happy to test if you need someone who can reproduce the error. Many thanks!
This is an issue experienced by a user on StackOverflow, so please excuse the lack of details and MRE. I'm hoping a Windows internals expert and/or a OneDrive dev can shed light on the situation.
Why does os.walk() (Python) ignore a OneDrive directory depending on the number of files in it?
The user has a directory which is a sync/shortcut of a SharePoint folder containing 897 files (all files can be opened, they are downloaded, not on-demand). When calling os.listdir with this directory, an exception is raised: OSError: [WinError 87] The parameter is incorrect. However, if 2 files are deleted, it returns all the files (besides the 2 which were deleted). If the directory is copied somewhere outside the purview of OneDrive, os.listdir returns all 897 files.
Calling win32file.FindFilesW behaves the same as os.listdir. With 897 files it raises an exception: error: (87, 'FindNextFileW', 'The parameter is incorrect.'). After deleting 2 files, it returns all the files.
When calling win32file.FindFilesIterator when the directory has all 897 files, 443 files are yielded before the error occurs. glob.glob() is the same but doesn't yield "." or ".." (as expected). Strangely, if only 1 file is deleted, win32file.FindFilesIterator yields only 25 files!
If the directory is copied to the local OneDrive root directory, os.listdir initially works (when OneDrive had just started uploading the files). However, after a couple of minutes, once a number of the files have been uploaded, os.listdir results in OSError: [WinError 87] The parameter is incorrect again. Even before all files have synced, win32file.FindFilesIterator yields only 443 files again.
Explorer always shows the full list of files, and so does cmd's dir, and powershell's ls and gci.
Calling NtQueryDirectoryFile directly with ctypes always shows the full list of files.
I'm fairly sceptical that CPython is at fault here, but I find it utterly bizarre that cmd's dir, and powershell's ls and gci work, which all call FindNextFileW, yet when CPython calls the same function it predictably returns prematurely.