Open bbilly1 opened 1 week ago
How are you defining "file extension"?
Common denominator I see is: "The file extension defines what kind of file it is."
Some definitions I have found:
Common denominator I see is: "The file extension defines what kind of file it is."
That's only true on Windows. On other operating systems, file extensions are an indicator and nothing more. I can rename an .mp3 file to .jpg on Linux and play it just fine.
Microsoft's definition ("three- or four-character extension") excludes some valid extensions like .a, .so and .patch, but fails to exclude file extensions with spaces such as ". Smith resume for review".
The point I'm getting at is that there is no standard definition of a file extension!
Since #82805 was solved, pathlib's suffix splitting works exactly like os.path.splitext(). A non-empty suffix starts with a dot and contains at most one dot, and a non-empty suffix must be preceded by a stem that contains at least one non-dot character.
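The rule above can be checked against both APIs with a short sketch. The sample names are illustrative assumptions, and names ending in a dot are left out on purpose because older Python releases may handle them differently.

```python
import os.path
from pathlib import PurePosixPath

# Illustrative names (assumptions, not taken from the report).
SAMPLES = [
    "archive.tar.gz",
    "README",
    ".bashrc",
    "..hidden.txt",
    "Mr. Smith resume for review",
]


def compare_suffixes(names):
    """Return (name, pathlib suffix, os.path.splitext suffix) triples."""
    return [(n, PurePosixPath(n).suffix, os.path.splitext(n)[1]) for n in names]


for name, pathlib_suffix, splitext_suffix in compare_suffixes(SAMPLES):
    print(f"{name!r}: pathlib={pathlib_suffix!r} splitext={splitext_suffix!r}")
```

The last sample is the case under discussion: everything from the dot onward, spaces included, comes back as the suffix from both functions.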
I get what you are saying. But even on Linux, that will depend on the implementation. E.g. xdg-open will happily ignore the file extension. But common GUI file browsers will not (tested on Thunar).
As there is no standard definition, I'd suggest either defining it, or avoiding the term altogether and using the implementation as the definition, e.g. the last segment separated by a dot.
Even though it's not authoritatively defined, I'd argue the above with_suffix example is unexpected behavior.
Microsoft's definition ("three- or four-character extension") excludes some valid extensions like .a, .so and .patch, but fails to exclude file extensions with spaces such as ". Smith resume for review".
The Windows shell API currently supports permanently associating a programmatic identifier (ProgID) with any file extension that does not include white space characters and that has a length from 1 to 198 characters (not including the dot). Thus the API supports ".a", ".so", and ".patch" as normal file extensions. If a file has no extension, or if the extension is longer than 198 characters or contains white space characters, then the API displays an open-with dialog that allows opening the file with an application just once instead of setting a permanent association.
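As a rough illustration, the rule described above can be restated as a small predicate. This is only a sketch of the stated limits (1 to 198 characters after the dot, no white space); the function name is hypothetical and it does not call any Windows API.

```python
# Upper bound on extension length (excluding the dot), as stated above.
MAX_EXTENSION_LENGTH = 198


def supports_permanent_association(extension: str) -> bool:
    """Return True if `extension` (including its leading dot) fits the described rule.

    Hypothetical helper: it mirrors the prose description only.
    """
    if not extension.startswith("."):
        return False
    body = extension[1:]  # the part after the dot
    if not 1 <= len(body) <= MAX_EXTENSION_LENGTH:
        return False
    return not any(ch.isspace() for ch in body)


for ext in (".a", ".so", ".patch", ". Smith resume for review"):
    print(f"{ext!r}: {supports_permanent_association(ext)}")
```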
Hum, that does give some legitimacy to the idea of forbidding whitespace in file extensions in pathlib.
I think ideal behavior would be, that if you are trying to create a suffix with whitespace, throw an error. If you are trying to access a suffix, if there is whitespace, it's not a suffix but just a regular part of the filename.
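A minimal sketch of that proposal, written as standalone helpers on top of the current pathlib rather than as a change to it. The names strict_suffix and strict_with_suffix are hypothetical, and appending the new suffix when the existing "suffix" contains whitespace is one possible reading of the idea.

```python
from pathlib import PurePosixPath


def _has_whitespace(text):
    return any(ch.isspace() for ch in text)


def strict_suffix(path):
    """Return the suffix, or "" when pathlib's suffix contains whitespace."""
    suffix = PurePosixPath(path).suffix
    return "" if _has_whitespace(suffix) else suffix


def strict_with_suffix(path, suffix):
    """Like with_suffix(), but whitespace is never accepted as part of a suffix."""
    if _has_whitespace(suffix):
        raise ValueError(f"Invalid suffix {suffix!r}: contains whitespace")
    if suffix and not suffix.startswith("."):
        raise ValueError(f"Invalid suffix {suffix!r}: must start with a dot")
    path = PurePosixPath(path)
    if strict_suffix(path) == path.suffix:
        # pathlib's notion of the suffix is acceptable, so defer to it.
        return path.with_suffix(suffix)
    # The trailing segment contains whitespace: treat it as part of the
    # filename and append the new suffix instead of replacing anything.
    return path.with_name(path.name + suffix)


print(strict_with_suffix("Mr. Smith resume for review", ".pdf"))
print(strict_with_suffix("report.txt", ".pdf"))
```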
Bug report
Bug description:
According to the docs here a suffix is defined as:
But pathlib doesn't behave as expected, illustrating that on Python 3.12.4:
This is particularly problematic for methods like with_suffix. According to the docs here that should:
But as established above, that results in:
I've characterized that as a bug as that is not working as described, but could also be a matter of documentation improvement.
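The behavior described can be reproduced with a short sketch. The filename is a hypothetical example in which the dot is not meant as an extension separator.

```python
from pathlib import PurePosixPath

# Hypothetical filename: the dot after "Mr" is part of the name.
path = PurePosixPath("Mr. Smith resume for review")

reported_suffix = path.suffix       # everything from the last dot onward
renamed = path.with_suffix(".pdf")  # replaces that whole segment

print(repr(reported_suffix))
print(renamed)
```

Here the suffix is reported as ". Smith resume for review", so with_suffix(".pdf") produces Mr.pdf and most of the original name is lost.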
I see a few ways:
add_suffix, or an argument to with_suffix, but I know that has been discussed before and was ultimately decided against. Also this is a matter of how suffix is defined, not how it's processed.
CPython versions tested on:
3.12
Operating systems tested on:
Linux, Windows