python / cpython

The Python programming language
https://www.python.org

Improve accuracy of `ntpath.normpath()` & `ntpath.abspath()` #119826

Open nineteendo opened 4 months ago

nineteendo commented 4 months ago

Feature or enhancement

Proposal:

The accuracy of ntpath.normpath() & ntpath.abspath() can be improved:

>>> import ntpath
>>> ntpath.normpath('C:.')
'C:.' # instead of 'C:'
>>> ntpath.abspath('C:\x00')
'C:\x00' # instead of 'C:\\Users\\wanne\\\x00'
>>> ntpath.abspath('./con')
'\\\\.\\con' # instead of 'C:\\Users\\wanne\\con'
>>> ntpath.abspath('./C:spam')
'C:\\Users\\wanne\\spam' # instead of 'C:\\Users\\wanne\\C:spam'
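For readers unfamiliar with the last case, here is a minimal sketch of why `'./C:spam'` and `'C:spam'` are different paths. It only uses `ntpath.splitdrive()`; the variable names are illustrative and not part of the proposal.

```python
import ntpath

# Without a prefix, the leading "C:" is parsed as a drive, so the path is
# drive-relative: "spam" relative to the working directory on drive C:.
drive_relative = ntpath.splitdrive('C:spam')

# With an explicit "./" prefix there is no drive at all: "C:spam" is just a
# name relative to the current directory.
explicit_relative = ntpath.splitdrive('./C:spam')

print(drive_relative)     # ('C:', 'spam')
print(explicit_relative)  # ('', './C:spam')
```

If the leading `./` is dropped before the path is joined to the working directory, the second form silently turns into the first, which would explain the `abspath('./C:spam')` result above.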

Has this already been discussed elsewhere?

This is a minor feature, which does not need previous discussion elsewhere

nineteendo commented 4 months ago

See current changes: https://github.com/python/cpython/compare/main...nineteendo:cpython:fix-ntpath.abspath?expand=1

nineteendo commented 4 months ago

cc @barneygale, @eryksun

eryksun commented 4 months ago

I'm mostly concerned with fixing abspath() on Windows, by (1) using a private _path_normpath_ex() function that supports preserving a leading "." component, and (2) fixing the fallback implementation to correctly support drive-relative paths.
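As a rough illustration of point (2), the sketch below resolves a path against an explicitly supplied working directory, joining before normalizing so that an explicit leading "." cannot be lost too early. `sketch_abspath` and its `cwd` parameter are hypothetical names; this is not the actual fallback code, and anchoring an unknown drive at its root is an assumption made for the example.

```python
import ntpath


def sketch_abspath(path, cwd):
    """Hypothetical helper: make `path` absolute relative to `cwd`."""
    drive, rest = ntpath.splitdrive(path)
    cwd_drive, _ = ntpath.splitdrive(cwd)
    if drive and drive.lower() != cwd_drive.lower():
        # Path on another drive: the working directory of that drive is not
        # known here, so anchor at the drive's root (an assumption).
        return ntpath.normpath(drive + '\\' + rest)
    # Same drive or no drive: join first, normalize afterwards. join()
    # handles the drive-relative case ("C:spam") on its own.
    return ntpath.normpath(ntpath.join(cwd, path))


cwd = 'C:\\Users\\wanne'
print(sketch_abspath('./C:spam', cwd))  # C:\Users\wanne\C:spam
print(sketch_abspath('C:spam', cwd))    # C:\Users\wanne\spam
print(sketch_abspath('D:spam', cwd))    # D:\spam
```

The sketch never asks the operating system for anything, so it says nothing about how reserved names or per-drive working directories are resolved on a real Windows system.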

The case of normpath('C:.') is a bug in the C implementation that can be fixed. The pure Python implementation of ntpath.normpath() returns the correct result, "C:".

Note that no changes are required for the documented behavior and call signature of normpath() itself.

@barneygale, pathlib.PureWindowsPath was changed in 3.12+ to preserve an explicit leading "." in the case of relative paths that are ambiguous with drive-relative paths, such as ".\C:spam", but not generally for relative paths, such as ".\con". Would it be possible and reasonable to make pathlib.PureWindowsPath always preserve an explicit initial ".", or maybe only if there's a single subsequent component?

barneygale commented 4 months ago

> Would it be possible and reasonable to make pathlib.PureWindowsPath always preserve an explicit initial ".", or maybe if there's only one subsequent component?

I think this is too likely to break users' code if Path('foo') and Path('./foo') no longer hash/compare equal. There may be cases where users are relying on pathlib to remove that leading ./.
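A small sketch of the behaviour users may be relying on. It uses `PureWindowsPath` so it runs on any platform; the variable names are illustrative only.

```python
from pathlib import PureWindowsPath

plain = PureWindowsPath('foo')
dotted = PureWindowsPath('./foo')

# The leading "./" is dropped while parsing, so both objects compare and
# hash equal and render identically.
same_value = plain == dotted
same_hash = hash(plain) == hash(dotted)

print(same_value)   # True
print(same_hash)    # True
print(str(dotted))  # foo
```

Code that uses such paths as dictionary keys or set members would start treating the two spellings as different entries if the leading "." were always preserved.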

The dropping of leading ./ and trailing / is called out in the pathlib docs from 3.13: https://docs.python.org/3.13/library/pathlib.html#comparison-to-the-os-and-os-path-modules

I wish I could fix it :( but I can't see a route that won't cause unreasonable breakage. I wish we'd caught this while pathlib was still provisional.

nineteendo commented 4 months ago

I split up the pull request to make it easier to review. Feel free to take a look if you have time.