python / cpython

The Python programming language
https://www.python.org
Other
63.08k stars 30.21k forks source link

test_os: test_attributes failed on ReFS, Windows 11 Dev Drive #111293

Closed zcxsythenew closed 11 months ago

zcxsythenew commented 12 months ago

Bug report

Bug description:

PowerShell command:

PS D:\cpython> .\PCbuild\build.bat -t CleanAll
PS D:\cpython> .\PCbuild\build.bat -e -d
PS D:\cpython> .\python.bat -m test -j1 

Test output:

0:26:00 load avg: 0.52 [285/469/2] test_os failed (1 failure)
test test_os failed -- Traceback (most recent call last):
  File "D:\cpython\Lib\test\test_os.py", line 4696, in test_attributes
    self.check_entry(entry, 'dir', True, False, False)
  File "D:\cpython\Lib\test\test_os.py", line 4645, in check_entry
    self.assertEqual(entry.inode(),
AssertionError: 0 != 887860239011714428829696

The output indicates that test_attributes failed.

Given that os.link and os.symlink can both success, I summarized the test below (may be slightly different from the original), shown in Python code:

import os

def create_file(file_name, content=b'content'):
    with open(file_name, 'xb', 0) as fp:
        fp.write(content)
    return file_name

if __name__ == '__main__':
    dirname = 'D:\\test-py'
    os.mkdir(dirname)
    filename = create_file('D:\\file.txt')

    os.link(filename, 'D:\\link_file.txt')
    os.symlink(dirname, 'D:\\symlink_dir', True)
    os.symlink(filename, 'D:\\symlink_file.txt')

    for entry in os.scandir('D:\\'):
        print(entry.name)
        print(entry.inode())
        print(os.stat(entry.path, follow_symlinks=False).st_ino)
        print('')

Output:

$RECYCLE.BIN
0
121914531583146426630144

aspnetcore
0
37170189308524746506240

cpython
0
461777344397171205603328

Euler
0
459231693714999287480320

file.txt
18
28334198897217871282194

gobook
0
458438483719829776760832

gobook2
0
458752078369082839138304

gopl.io
0
455523898156183667605504

leveldb
0
445249061707127447355392

link_file.txt
18
28334198897217871282194

oceanbase
0
413631342364789275885568

source
0
121951425071293845733376

symlink_dir
0
1057588731233916013248512

symlink_file.txt
19
28334198897217871282195

System Volume Information
0
0

test-py
0
1057570284489842303696896

The above output indicates that:

Because they are different, the test fails.

Note: This issue will not appear on NTFS. This issue can only be reproduced on ReFS.

CPython versions tested on:

CPython main branch

Operating systems tested on:

Windows

Linked PRs

zcxsythenew commented 12 months ago

It may be clearer to use hex instead of decimal:

D:\source\python-sample\venv\Scripts\python.exe D:\source\python-sample\main.py 
$RECYCLE.BIN
0x0
0x19d10000000000000000

aspnetcore
0x0
0x7df0000000000000000

cpython
0x0
0x61c90000000000000000

Euler
0x0
0x613f0000000000000000

file.txt
0x12
0x6000000000000000012

gobook
0x0
0x61140000000000000000

gobook2
0x0
0x61250000000000000000

gopl.io
0x0
0x60760000000000000000

leveldb
0x0
0x5e490000000000000000

link_file.txt
0x12
0x6000000000000000012

oceanbase
0x0
0x57970000000000000000

source
0x0
0x19d30000000000000000

symlink_dir
0x0
0xdff30000000000000000

symlink_file.txt
0x13
0x6000000000000000012

System Volume Information
0x0
0x0

test-py
0x0
0xdff30000000000000000

It seems that entry.inode() drops the highest bits and only returns lower bits.

There is a comment on Microsoft website about file id and ReFS:

https://learn.microsoft.com/en-us/windows/win32/api/fileapi/ns-fileapi-by_handle_file_information

zcxsythenew commented 12 months ago

The implementation of entry.inode() is incorrect. It drops stat.st_ino_high and only includes stat.st_ino. See posixmodule.c:15136:

static PyObject *
os_DirEntry_inode_impl(DirEntry *self)
/*[clinic end generated code: output=156bb3a72162440e input=3ee7b872ae8649f0]*/
{
#ifdef MS_WINDOWS
    if (!self->got_file_index) {
        ...

        self->win32_file_index = stat.st_ino;
        self->got_file_index = 1;
    }
    static_assert(sizeof(unsigned long long) >= sizeof(self->win32_file_index),
                  "DirEntry.win32_file_index is larger than unsigned long long");
    return PyLong_FromUnsignedLongLong(self->win32_file_index);
#else /* POSIX */
    ...
#endif
}
gvanrossum commented 12 months ago

@zooba What is ReFS?

zcxsythenew commented 12 months ago

@zooba What is ReFS?

ReFS uses 128-bit file ids, while NTFS does not. entry.inode() drops the higher 64 bits, and it is the issue.

zooba commented 12 months ago

@gvanrossum It's a file system. Been around for 10+ years at this point, but only recently became available to non-server installs (and it's the preferred file system for Dev Drive, so we can expect usage to increase).

zcxsythenew commented 12 months ago

I think besides #111294 , it may be possible to add another environment to the Test workflow, maybe named "Windows (x64) with ReFS". Is that possible?

I also notice that another test case fails on ReFS and I'm considering whether to raise a new issue.

zooba commented 12 months ago

We normally set up buildbots for specific scenarios like this. GitHub already has too many checks to be efficient for PRs, and we don't need to validate every possible configuration pre-commit.

Please do raise a new issue. We want to get things right on ReFS (but also be aware that some tests may have an implicit assumption of NTFS that's at fault (and also also be aware that newer Windows Insider builds have the newer stat() API implementation, which may have otherwise undiscovered issues, so your precise build number may be relevant)).

zcxsythenew commented 12 months ago

OK, I will raise it tomorrow. The name of the test case is like "send_file" (I don't quite remember it now) and it may be related to I/O model.