python / cpython

The Python programming language
https://www.python.org

`os.path.getsize` very slow for Windows 11 #124126

Open DanielYang59 opened 1 month ago

DanielYang59 commented 1 month ago

Bug report

Bug description:

Summary

I noticed that os.path.getsize runs much slower (about 38x) on Windows 11 than on Ubuntu 22.04 (WSL2) and macOS Sonoma 14.6.1 (Windows 11 and Ubuntu 22.04 WSL2 were run on the same physical machine and the same SSD, both tested while idle).

Test code

import timeit
import platform

with open("test_file.txt", mode="w", encoding="utf-8", newline="") as file:
    for i in range(10):
        file.write(f"This is line {str(i)}\\n.")

execution_time = timeit.timeit(stmt='os.path.getsize("test_file.txt")', number=1_000_000, setup="import os")

os_info = platform.system()
kernel_info = platform.release()
python_version = platform.python_version()

print(f"Execution time: {execution_time:.6f} seconds")
print(f"Operating System: {os_info}")
print(f"Kernel Version: {kernel_info}")
print(f"Python Version: {python_version}")

Test results

On Windows 11 (Version: 23H2, OS build: 22631.4169):

Execution time: 30.922192 seconds
Operating System: Windows
Kernel Version: 11
Python Version: 3.12.5

On Windows 11 (Dev Drive):

Execution time: 17.214313 seconds
Operating System: Windows
Kernel Version: 11
Python Version: 3.12.5

On Ubuntu 22.04 WSL2:

Execution time: 0.844529 seconds
Operating System: Linux
Kernel Version: 5.15.153.1-microsoft-standard-WSL2
Python Version: 3.12.5

On macOS 14.6:

Execution time: 0.811347 seconds
Operating System: Darwin
Kernel Version: 23.6.0
Python Version: 3.12.5

CPython versions tested on:

3.12

Operating systems tested on:

Windows

picnixz commented 1 month ago

Can you test with the same Python versions, please? Also, could you avoid putting a loop inside the statement being timed and instead use, for instance, number=10000?

DanielYang59 commented 1 month ago

Thanks for the quick response @picnixz, I just updated the test results so they all use exactly Python 3.12.5 :)

Also, could you avoid putting a loop inside the statement being timed and instead use, for instance, number=10000?

That was deliberate (I wanted to rule out the import time of os). Is there any pitfall to having a loop inside the test statement?

picnixz commented 1 month ago

want to rule out the import time of os

You can rule out the import time by using setup='import os'.

is there any pitfall for having a loop inside the test statement

Generally, no, but I'm not sure whether the garbage collector could do something in between, hence the question.


From my experience, Windows is generally slower when doing OS-related operations, so I'm not that shocked. Let's ask an expert on this topic: @zooba

picnixz commented 1 month ago

I just observed that os.path.getsize simply calls os.stat and then reads the corresponding field. So the problem (if any) is the slowness of os.stat.
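
For reference, the wrapper in Lib/genericpath.py is essentially just this (a sketch of the relevant code, not a verbatim copy):

import os

def getsize(filename):
    """Return the size of a file, reported by os.stat()."""
    return os.stat(filename).st_size

So timing os.stat directly on the same file should show roughly the same numbers as the getsize benchmark.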

DanielYang59 commented 1 month ago

You can rule out the import time by using setup='import os'.

Thanks a lot for the input; both the script and the results have been updated.

From my experience, Windows is generally slower when doing OS-related operations, so I'm not that shocked.

I thought Windows was just "slightly" slower, so I was pretty surprised to see such a big gap.

Also, calling getsize a million times is perhaps a rare use case; in my case I was just trying to create a test file of a specific size, and found the following code taking forever on my Windows machine:

line_number = 0
with open(file_path, "w", encoding="utf-8", newline="") as f:
    # Keep appending lines until the on-disk size reaches target_size
    while os.path.getsize(file_path) < target_size:
        f.write(f"This is line number {line_number}\n")
        line_number += 1

In my case, it's much faster to estimate the total number of lines up front and avoid calling getsize after writing each line :)
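
Roughly what I mean (a sketch, not my actual code; target_size, file_path, and the line format are placeholder assumptions):

target_size = 1_000_000  # desired file size in bytes (illustrative value)
file_path = "test_file.txt"

# Estimate how many lines are needed from a sample line's encoded length,
# then write them all without calling os.path.getsize in the loop.
sample = "This is line number 0\n"
n_lines = max(1, target_size // len(sample.encode("utf-8")))

with open(file_path, "w", encoding="utf-8", newline="") as f:
    f.writelines(f"This is line number {i}\n" for i in range(n_lines))

The resulting size is only approximate (longer line numbers add a few bytes), which is fine for a test fixture.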

picnixz commented 1 month ago

I'd be interested in knowing whether it really is os.stat that is 30x slower on Windows or not. For your specific use case, create a bytes object of the number of bytes you want, fill it with whatever you like, and write it in binary mode; you should then have a file of the exact size.
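
Something along these lines (a minimal sketch; the size and file name here are placeholders):

target_size = 1_000_000  # exact number of bytes wanted

# Build a bytes object of exactly target_size bytes and write it in one go,
# in binary mode, so the file ends up with exactly that size.
payload = b"x" * target_size
with open("test_file.bin", "wb") as f:
    f.write(payload)

assert len(payload) == target_size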

DanielYang59 commented 1 month ago

For your specific use case, create a bytes object of the number of bytes you want

Yep, solid idea! I assume it would be much faster to check the size of the bytes object than the file :)

zooba commented 1 month ago

Please try running on a Dev Drive to compare. It's not quite free of the issues that make Windows have slower I/O than Linux, but it's significantly better than using your default OS drive.

I'd also be interested to know exactly which build of Windows you're running. One recent update includes a new API for getting file metadata that is implemented more like Linux (it doesn't require opening the file first, which Windows traditionally does). Python 3.12 should use the new API automatically, and some measurements have shown that it runs 3-4x faster than the old one.

But overall, the slow file system is an OS issue, probably not a Python issue. To see a Python issue, you'll need to do native profiling of Python itself and show that we're somehow going through significantly more of our own code on one OS than another. Simple timings of OS operations are not really comparable in that way.

DanielYang59 commented 1 month ago

Hi @zooba thanks a lot for the input and detailed explanation!

Please try running on a Dev Drive to compare.

It's indeed much faster (17 sec vs 31 sec).

I'd also be interested to know exactly which build of Windows you're running.

It should be Version: 23H2, OS build: 22631.4169.

But overall, the slow file system is an OS issue, probably not a Python issue. To see a Python issue, you'll need to do native profiling of Python itself and show that we're somehow going through significantly more of our own code on one OS than another. Simple timings of OS operations are not really comparable in that way.

Fully understandable, thanks a lot for the input. But as an end user I don't quite know how to properly profile Python, so I opened this issue in case you can do something :)

zooba commented 1 month ago

But as an end user I don't quite know how to properly profile Python, so I opened this issue in case you can do something :)

At least on Windows, the approach is to use Windows Performance Recorder to capture a trace and then Windows Performance Analyzer to attribute the CPU time to either one of Python's native modules (you won't get Python-specific information in there yet, but I'll be releasing a tool soon to help with that) or an OS module.

It's quite a specialized job, I'll be honest! But there are people out there who know how to do it, and may also have the time and interest to see what's up (not me, right now).

It should be Version: 23H2, OS build: 22631.4169.

This doesn't have the new API in it, so you're getting the Dev Drive accelerated time, but not the improved stat calls. I believe Insider builds should have it already.

zooba commented 1 month ago

For reference, I just did python3.12 -m timeit -n 100 -s "import os" "sum(os.path.getsize(s) for s in os.scandir(r'C:\Windows\System32'))" on a 22631 build and an unreleased 26100 build (both with Store install of 3.12.6) and got 178ms vs 57.8ms. So the new API should provide 2-3x speedup on this operation, and that should stack on top of the Dev Drive benefit (though I suspect part of the benefit is from bypassing the same drivers that Dev Drives disable, so it may not be a straight (1.5-2x) x (2-3x) = (3-6x) calculation).