swiftlang / swift-corelibs-foundation

The Foundation Project, providing core utilities, internationalization, and OS independence
swift.org
Apache License 2.0

File path support for Foundation #5094

Open cmcgee1024 opened 1 month ago

cmcgee1024 commented 1 month ago

It's common for languages to include in their standard library a way to construct file path data structures that can then be queried (e.g. isAbsolute(), fileExt(), baseName(), dirName()) and used for path arithmetic, such as appending paths and/or files to the end, or resolving relative paths against absolute ones. With some care, an app writer can craft something that runs without much modification on a variety of platforms, including Linux, macOS, and Windows.

For example, Python has pathlib, Go has filepath, and Java has Path. These take different stances on certain issues, such as whether paths from a platform other than the current host platform can be used. But for the most part, they are the cornerstone of many libraries and apps written in those languages. They also help the typical app be more platform-independent, again with some care from developers.

In the Swift world, there are at least these independent file path APIs available to developers:

On top of this, various Swift libraries and apps are writing their own Path structs, wrappers, and extensions to these different APIs, which perpetuates the fragmentation even more because of mismatches between libraries: one uses Foundation URL, another uses FilePath, String, or some custom Path. A number of projects are hitting problems when trying to port to Windows. Also, they are re-learning from some of the mistakes that a common API could shield them from.

This enhancement would help unify much of Swift under a single file path API for the benefit of everyone. Hopefully, both porting and integration efforts will become easier.

cmcgee1024 commented 1 month ago

FilePath from System is arguably very close to what's needed, except it's in the wrong module: System, not Foundation, where the platform-neutral functions can live. As mentioned in the description, there are some dependencies on POSIX Errnos that are throwable. There are also some Windows-specific issues with encodings / Codable.

Perhaps this can serve as the starting point for a common API?

parkera commented 1 month ago

Thanks for the detailed bug report @cmcgee1024

milseman commented 1 month ago

It's common for languages to have as part of their standard library a way to construct file path data structures that can then be used to perform certain queries (e.g. isAbsolute(), fileExt(), baseName(), dirName()) on them, and perform certain path arithmetic operations, such as appending paths and/or files to the end, or resolving relative paths against absolute ones.

This is what System's FilePath is for. Its API for syntactic manipulation was designed to be a cross-platform, Swifty superset of what is found in Python, C#, Rust, C++, etc.

FilePath has a ComponentView which is a RangeReplaceableCollection of path components that provides algebraic semantics for manipulating the components of a path. FilePath formally separates the root from this algebraic collection, which is necessary for sensible inserts, etc.
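A toy model can make the point about separating the root concrete. The sketch below is hypothetical: the ToyPath type and all of its members are invented for illustration, it handles only Unix syntax, and it is not how System implements FilePath (which stores a native, nul-terminated buffer). Because the root lives outside the components array, inserting at index 0 cannot push a component in front of "/", and pushing an absolute path simply replaces the receiver.

```swift
/// Hypothetical, Unix-only path model: an optional root kept apart from a
/// mutable array of components (illustration only, not System's FilePath).
struct ToyPath: Equatable, CustomStringConvertible {
    var root: String?          // "/" for absolute paths, nil for relative ones
    var components: [String]   // never holds the root or empty components

    init(_ string: String) {
        root = string.hasPrefix("/") ? "/" : nil
        // Splitting on "/" drops empty pieces, so "a//b" has two components.
        components = string.split(separator: "/").map(String.init)
    }

    var isAbsolute: Bool { root != nil }

    /// Resolves `other` against `self`: an absolute `other` replaces `self`,
    /// while a relative one has its components appended.
    func pushing(_ other: ToyPath) -> ToyPath {
        if other.isAbsolute { return other }
        var result = self
        result.components.append(contentsOf: other.components)
        return result
    }

    var description: String {
        (root ?? "") + components.joined(separator: "/")
    }
}

var path = ToyPath("/local/bin")
path.components.insert("usr", at: 0)  // edits components only; the root stays first
print(path)                           // prints "/usr/local/bin"
```

If the root were just the first element of the collection, the same insert would produce "usr" followed by "/", which is why a formal root/components split gives range-replacing operations sensible semantics.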

See https://gist.github.com/milseman/294bd494d6911c65b80fccff5873b295, which includes rationale and can be an easier way to view the API as a whole than browsing the documentation.

Tied to the current platform, which may or may not be POSIX

FilePath represents a native path for the target, so it is a Windows path when targeting Windows and a Unix path when targeting Unix. There is a place for such a type, e.g. an argument to a syscall would take a native path.

We have the implementation machinery to support cross-platform paths, so it's just an API design question. I think it would be good to add explicit UnixFilePath and WindowsFilePath types with failable conversion to/from the target platform's FilePath.
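One possible shape for that design, sketched under stated assumptions: the names UnixFilePath and WindowsFilePath come from this comment, but every member below is hypothetical. Both types store raw code units, and the conversions only translate separators and check encodings; roots, drive letters, and UNC prefixes are ignored. A conversion returns nil whenever the source path cannot be represented faithfully in the other syntax, and a NativeFilePath alias picks whichever type matches the target.

```swift
/// Hypothetical Unix-syntax path: raw bytes with "/" as the separator.
struct UnixFilePath: Equatable {
    var bytes: [UInt8]
}

/// Hypothetical Windows-syntax path: raw UTF-16 code units with "\" as the separator.
struct WindowsFilePath: Equatable {
    var codeUnits: [UInt16]
}

#if os(Windows)
typealias NativeFilePath = WindowsFilePath  // what a syscall wrapper would take on this target
#else
typealias NativeFilePath = UnixFilePath
#endif

extension WindowsFilePath {
    /// Fails if the bytes are not valid UTF-8, or if a component contains "\",
    /// which Windows would misread as a separator.
    init?(_ unix: UnixFilePath) {
        guard !unix.bytes.contains(UInt8(ascii: "\\")) else { return nil }
        let text = String(decoding: unix.bytes, as: UTF8.self)
        // Decoding repairs invalid input, so unchanged bytes mean the input was valid.
        guard Array(text.utf8) == unix.bytes else { return nil }
        let slash: UInt16 = 0x2F, backslash: UInt16 = 0x5C
        codeUnits = text.utf16.map { $0 == slash ? backslash : $0 }
    }
}

extension UnixFilePath {
    /// Fails if the code units contain an unpaired surrogate, which UTF-8 cannot encode.
    init?(_ windows: WindowsFilePath) {
        let text = String(decoding: windows.codeUnits, as: UTF16.self)
        guard Array(text.utf16) == windows.codeUnits else { return nil }
        // Windows accepts both "\" and "/" as separators; map both to "/".
        let slash = UInt8(ascii: "/"), backslash = UInt8(ascii: "\\")
        bytes = text.utf8.map { $0 == backslash ? slash : $0 }
    }
}
```

Returning nil instead of substituting U+FFFD keeps every successful conversion lossless, which matters if the result is later handed to a syscall.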

Difficult to code in a platform-neutral way (see the Errnos)

FilePath's API is syntactic, meaning there are no syscalls and no Errnos. The operations you mentioned in the intro paragraph are syntactic operations, and they behave in a platform-neutral way.
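To illustrate what "syntactic" means, the hypothetical helper below normalizes a Unix-style path string purely by rearranging its components. It never touches the file system, so it cannot fail or produce an Errno. System offers this kind of operation on FilePath as lexicallyNormalized(); the helper here is an independent sketch, not that implementation.

```swift
/// Hypothetical helper: purely syntactic normalization of a Unix-style path.
/// No syscalls are made, so symbolic links are never consulted.
func lexicallyNormalized(_ path: String) -> String {
    let isAbsolute = path.hasPrefix("/")
    var stack: [Substring] = []
    for component in path.split(separator: "/") {
        switch component {
        case ".":
            continue  // "." names the current directory; drop it
        case "..":
            if let last = stack.last, last != ".." {
                stack.removeLast()  // ".." cancels the preceding component
            } else if !isAbsolute {
                stack.append(component)  // a leading ".." in a relative path must be kept
            }
            // In an absolute path, ".." at the root stays at the root.
        default:
            stack.append(component)
        }
    }
    let body = stack.joined(separator: "/")
    if isAbsolute { return "/" + body }
    return body.isEmpty ? "." : body
}

print(lexicallyNormalized("/usr/./lib/../bin"))  // prints "/usr/bin"
```

The flip side of staying syntactic is that removing "x/.." lexically can disagree with the kernel when x is a symbolic link; answering that correctly requires querying the file system.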

A separate question is what kinds of API we should have for interacting with the file system, and what that would look like for a low-level platform-specific layer and a higher level platform-agnostic layer.

Issues around encoding

What are your issues?

Interpreting the content of a file path can be file-system specific. Windows paths are UCS-2 and allow unpaired surrogates, Linux paths are a bag of bytes, and Darwin paths are typically UTF-8 (often canonicalized to NFD).
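The standard library alone is enough to see these differences once raw code units are turned into a String. The sample values below are made up for illustration: a UTF-16 buffer with an unpaired surrogate (as Windows allows), a byte buffer that is not valid UTF-8 (as Linux allows), and the same file name in NFC and NFD form (as can happen on Darwin).

```swift
// Windows-style: UTF-16 code units containing an unpaired high surrogate (0xD800).
let windowsUnits: [UInt16] = [0x0061, 0xD800, 0x0062]
let windowsText = String(decoding: windowsUnits, as: UTF16.self)  // "a\u{FFFD}b"

// Linux-style: arbitrary bytes; 0xFF never appears in valid UTF-8.
let linuxBytes: [UInt8] = [0x66, 0x6F, 0xFF]
let linuxText = String(decoding: linuxBytes, as: UTF8.self)  // "fo\u{FFFD}"

// Darwin-style: "é" precomposed (NFC, U+00E9) versus decomposed (NFD, "e" + U+0301).
let nfc = "caf\u{E9}"
let nfd = "cafe\u{301}"
print(nfc == nfd)                          // prints "true": String uses canonical equivalence
print(Array(nfc.utf8) == Array(nfd.utf8))  // prints "false": the stored bytes differ
```

Because String compares by canonical equivalence while a path's storage compares code unit by code unit, two paths that print identically can still be different paths, which is one argument for a path type that keeps the raw units.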

FilePath on Linux/Darwin is a nul-terminated bag of UInt8, and on Windows it is a nul-terminated bag of UInt16. When converting to a String (e.g. for printing), it replaces invalidly encoded contents with U+FFFD, as defined in the documentation: https://developer.apple.com/documentation/system/filepath/description Similarly, String(decoding: FilePath) performs the error correction (just like String(decoding: bytes, as: UTF8.self) does).

That is, FilePath will not enforce modern Unicode on the path, it only does that when converting to a Unicode String, which is explicitly failable (via the validating: initializer) or error-correcting (via the decoding: initializer). This is how String's initializers over arbitrary data work.
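The same two-initializer pattern can be sketched for any raw path type. RawPath below is a hypothetical, Unix-only stand-in (bytes plus a nul terminator), and the two String initializers only mimic the decoding/validating split described above; they are not System's implementation.

```swift
/// Hypothetical stand-in for a native Unix path: raw bytes plus a nul terminator.
struct RawPath {
    private(set) var storage: [UInt8]  // always ends with exactly one 0 terminator

    init(bytes: [UInt8]) {
        precondition(!bytes.contains(0), "paths cannot contain interior NULs")
        storage = bytes + [0]
    }

    /// The path's bytes without the terminator.
    var bytes: ArraySlice<UInt8> { storage.dropLast() }
}

extension String {
    /// Error-correcting: invalid UTF-8 becomes U+FFFD, so this never fails.
    init(decoding path: RawPath) {
        self.init(decoding: path.bytes, as: UTF8.self)
    }

    /// Failable: returns nil unless the bytes are valid UTF-8.
    init?(validating path: RawPath) {
        let decoded = String(decoding: path.bytes, as: UTF8.self)
        // Decoding only changes the bytes when it had to repair them.
        guard decoded.utf8.elementsEqual(path.bytes) else { return nil }
        self = decoded
    }
}

let bad = RawPath(bytes: [0x61, 0xFF])   // "a" followed by an invalid byte
let repaired = String(decoding: bad)     // "a\u{FFFD}": repaired, never fails
let validated = String(validating: bad)  // nil: invalid UTF-8 is rejected
```

The path itself never changes; only the conversion to String chooses between repairing and rejecting.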

Also, they are re-learning from some of the mistakes that a common API could shield them... FilePath from System is arguably very close to what's needed except it's in the wrong module, System, not Foundation where the platform neutral functions can live.

FilePath is that common API that was very carefully designed to avoid these mistakes and take the best of all the languages surveyed.

We can talk about having Foundation re-export FilePath for its syntactic API (whether or not it re-exports syscalls in general; note that it already re-exports Darwin/Glibc).

milseman commented 1 month ago

IMO the ideal place for the syntactic operations of FilePath (and its Windows/Unix variants) would be the stdlib. It's within the stdlib's mandate and would make it the common API. (Also, there would be no need for Foundation to re-export it and no need to pull in all of System just for it.)

al45tair commented 1 month ago

Issues around encoding

I think this might be a reference to the Codable conformance, which is honestly a bit of a disaster area. (The System.FilePath type cannot safely be Codable, because the encoding differs on a per-platform basis.)

Putative WindowsFilePath and POSIXFilePath types, of course, could be.
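A minimal sketch of why such a type could be Codable, assuming a hypothetical POSIXFilePath that stores its raw bytes: the encoded form is the byte sequence itself, so it means the same thing whichever host encodes or decodes it, and bytes that are not valid UTF-8 survive the round trip instead of being replaced with U+FFFD.

```swift
import Foundation

/// Hypothetical POSIX-syntax path. Because it always holds UInt8 code units,
/// its encoded form is identical no matter which platform encodes or decodes it.
struct POSIXFilePath: Codable, Equatable {
    var bytes: [UInt8]  // raw path bytes, not required to be valid UTF-8
}

/// Encodes to JSON and back; the synthesized Codable conformance writes the
/// bytes as a JSON array of numbers.
func roundTrip(_ path: POSIXFilePath) throws -> POSIXFilePath {
    let data = try JSONEncoder().encode(path)
    return try JSONDecoder().decode(POSIXFilePath.self, from: data)
}

let path = POSIXFilePath(bytes: Array("/tmp/".utf8) + [0xFF])  // ends in an invalid UTF-8 byte
let restored = try! roundTrip(path)
print(restored == path)  // prints "true"
```

A JSON array of numbers is verbose, and a hand-written encode(to:) could store base64 instead; the essential choice is encoding code units of a known syntax rather than a decoded String.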