rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
98.99k stars 12.79k forks source link

Correcting Path::components on non-"Unix" platforms #52331

Open jackpot51 opened 6 years ago

jackpot51 commented 6 years ago

Path::components is incorrect on Redox. I would like to develop the solution here: https://github.com/rust-lang/rust/pull/51537.

The following is a description of the problem.

Suppose you have the following path:

file:/home/user/foo/bar.txt

You split the path into components using Path::components

Path::new("file:/home/user/foo/bar.txt").components().collect::<Vec<()>>()

In Linux, this would be equivalent to the following:

vec![
    Component::Normal("file:"),
    Component::Normal("home"),
    Component::Normal("user"),
    Component::Normal("foo"),
    Component::Normal("bar.txt"),
]

Joining the components with the current directory would give you a path such as this:

./file:/home/user/foo/bar.txt

In Redox, we want to be able to get from the original path to components back to the original path without any modifications. Here are examples of this usage of Path::components:

https://github.com/uutils/coreutils/search?q=components

In Redox, we have the following options for the file: component:

  1. Component::Normal("file:")
  2. Component::Normal("file:") followed by Component::RootDir
  3. Component::Prefix(Prefix::Verbatim("file:"))
  4. Component::Prefix(Prefix::Scheme("file:"))

Option 1

Component::Normal("file:")

The path mentioned above would end up being the following after being rebuilt from its components:

./file:/home/user/foo/bar.txt

This is the old way of doing things. It not only makes Path::components useless on Redox. Canonicalizing a path will always add a scheme like file: to the beginning, so it is likely that path handling will be incorrect. Absolute paths would always be interpreted as relative.

:x: This option is unusable for Redox.

Option 2

Component::Normal("file:") followed by Component::RootDir

This would preserve the original meaning of the path, such that it could be rebuilt from its components as follows:

file:/home/user/foo/bar.txt

However, this may require a large amount of work to handle, as it seems likely that code exists that only checks the first component for being Component::RootDir or Component::Prefix in order to identify an absolute path.

The documentation for Prefix provides one such example, which has likely inspired similar usage:

https://doc.rust-lang.org/std/path/enum.Prefix.html#examples

:x: This option would likely break the expectations of many consumers of the Prefix enum.

Option 3

Component::Prefix(Prefix::Verbatim("file:"))

This option preserves the original meaning of the path, a rebuilt path would be this:

file:/home/user/foo/bar.txt

This, however, is overloading a variant meant to be used on Windows, for a path looking like this:

\\?\home\user\foo\bar.txt

This means different code will be needed when running on Redox to correctly parse paths to components and turn components into paths.

:heavy_check_mark: This does leave the enum untouched, while allowing for the correct parsing of paths on Redox. The only downside is a possible lack of clarity due to overloading the meaning of the Prefix::Verbatim variant.

Option 4

Component::Prefix(Prefix::Scheme("file:"))

This option also preserves the original meaning of the path, a rebuilt path would be this:

file:/home/user/foo/bar.txt

This is the most clear option, having separate code to handle specifically the Redox scheme abstraction.

This has the downside of changing a stable enum, Prefix. There is, however, the possibility of having the extra enum variant be #[cfg(target_os = "redox")], so as to preserve the Prefix enum on other platforms.

:heavy_check_mark: This option could be used to have correct path parsing without affecting the stability of the Prefix enum on non-Redox platforms, if a cfg attribute is used.

Conclusion

Potentially the Prefix enum would be done different after a major version bump, perhaps using extension traits that are platform specific. I would think that there would be an opaque Prefix struct, and something like .into_os_prefix() would provide you with an os-specific enum to match against.

For the time being, options 3 and 4 seem to be possible, with some caveats, and would allow code using stable Rust to quickly do-the-right-thing on Redox.

kennytm commented 6 years ago

I would think that there would be an opaque Prefix struct, and something like .into_os_prefix() would provide you with an os-specific enum to match against.

The Prefix variant of the Component enum contains the opaque struct PrefixComponent, not the enum Prefix.

This means a backward-compatible option 5 is available:

impl<'a> PrefixComponent<'a> {
    /// On Windows, returns the parsed prefix data.
    ///
    /// On other platforms, the returned value is unspecified.
    #[cfg_attr(
        target_os = "redox",
        rustc_deprecated(
            since = "1.29.0", 
            reason = "The prefix component is meaningless on Redox. Use `.scheme()` instead",
        ),
    )]
    pub fn kind(&self) -> Prefix<'a> { ... }
}

mod redox {
    pub trait PrefixComponentExt<'a> {
        // Obtains the scheme of the Redox path (URL).
        fn scheme(&self) -> &'a OsStr;
    }

    impl<'a> PrefixComponentExt<'a> for PrefixComponent<'a> {
        fn scheme(&self) -> &'a OsStr { self.as_os_str() }
    }
}
jackpot51 commented 5 years ago

What should the return value of PrefixComponent::kind be on Redox?

4lDO2 commented 4 years ago

Regarding option 5, shouldn't kind() ideally be deprecated on all platforms, with an extension trait for Windows instead? Even though there's probably a lot of software depending on the kind method, isn't it more optimal for it to be platform specific like everything else that's filesystem related.std::os::fs::symlink for example, which arguably is one of the more common unix syscalls, was made platform specific while the older soft_link function was deprecated.

4lDO2 commented 4 years ago

@jackpot51 Maybe DeviceNS? Not that there is any enum variant that fits the purpose correctly at all on Redox, but based on my limited experience with file systems on Windows, AFAIK windows uses "devices" (IoCreateDevice) for filesystems, which would probably apply more than the alternatives, for a Redox scheme.

That said, the return value would not make any sense at all, but at least it would make parsing and serialization to and from Prefix consistent, even though it won't be the same format as real schemes.

ian-h-chamberlain commented 2 years ago

@rustbot label +I-needs-decision

I hope this is the right way to ask for some input from the libs-api team. This issue has stagnated for a while without a clear path forward, and it would be nice if we had a way to support multiple platforms that use a path prefix scheme like this other than just Windows.

joshtriplett commented 2 years ago

One proposed design:

joshtriplett commented 2 years ago

Also, we're going to need someone willing to do the design and implementation work here.

jackpot51 commented 2 years ago

@joshtriplett I will implement whichever design has the highest chance of being upstreamed.

jackpot51 commented 2 years ago

@joshtriplett can you please help to get upstream consensus on which option from the original post would be best and I will implement it?

joshtriplett commented 2 years ago

@jackpot51 I don't think attempting to use the existing Prefix will work; we can't add new variants, and none of the existing variants really fit (and will break expectations of software written for Windows).

Given that, that leaves us with:

Windows folks: should we pay the transition cost to allow for new path prefixes, or is that exceedingly unlikely given that the existing prefixes include extensible variants? @rustbot ping windows

@rust-lang/libs-api: Does the above plan sound reasonable? @rfcbot merge

(This is orthogonal to questions about how we should handle non-native path types in the future.)

rfcbot commented 2 years ago

Team member @joshtriplett has proposed to merge this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

rustbot commented 2 years ago

Hey Windows Group! This bug has been identified as a good "Windows candidate". In case it's useful, here are some instructions for tackling these sorts of bugs. Maybe take a look? Thanks! <3

cc @arlosi @ChrisDenton @danielframpton @gdr-at-ms @kennykerr @luqmana @lzybkr @nico-abram @retep998 @rylev @sivadeilra @wesleywiser

jackpot51 commented 2 years ago

@jackpot51 I don't think attempting to use the existing Prefix will work; we can't add new variants, and none of the existing variants really fit (and will break expectations of software written for Windows).

Given that, that leaves us with:

* Make `kind` panic on non-Windows platforms.

* Add a `redox_kind` method to `PrefixComponent`, returning `Option<RedoxPrefix>`, which should be a non_exhaustive enum.

* Add a `windows_kind` method as well. That should either return `Option<WindowsPrefix>` as another non_exhaustive enum, or `Option<Prefix>` for compatibility with the existing type if we're more confident we'll never want another variant on Windows. (Advice from the Windows team would be helpful here. I would _guess_ we want the latter, since the existing prefixes include extensible variants.)

* On some timeline, mark `kind` as `deprecated` or `deprecated_in_future`.

Windows folks: should we pay the transition cost to allow for new path prefixes, or is that exceedingly unlikely given that the existing prefixes include extensible variants? @rustbot ping windows

@rust-lang/libs-api: Does the above plan sound reasonable? @rfcbot merge

(This is orthogonal to questions about how we should handle non-native path types in the future.)

If deprecating kind is on the table, maybe move this to an os-specific trait like PathExt?

ian-h-chamberlain commented 2 years ago

If deprecating kind is on the table, maybe move this to an os-specific trait like PathExt?

I would like to express support for this as an option, because I think it's conceivable there would be use cases for other OSes expanding on the API in the future (for example, the 3DS horizon OS is another place where we'd like something similar). It doesn't seem ideal in my opinion to add a new {os}_kind() method for every new OS that wants something like this, but an extension trait could allow specific targets to add such functionality as needed?

jackpot51 commented 11 months ago

I will be trying to solve this using the recommendations above. It has been a while, but we are still running into issues with path parsing on Redox and solving this issue would fix most of them.

jackpot51 commented 10 months ago

We've decided to provide, and use by default, a path format that works without needing prefixes. scheme_name:path/to/file can now be represented as /scheme/scheme_name/path/to/file.

kennytm commented 10 months ago

apparently even with Redox switching to the Unix path format, this is still a problem for 3DS Horizon OS (https://github.com/rust-lang/rust/issues/52331#issuecomment-1207309324)?

jackpot51 commented 10 months ago

That is correct.

Ayush1325 commented 10 months ago

This is also a problem for UEFI.

Ayush1325 commented 1 month ago

Oh, I was working on creating a new issue, but I guess since this has been reopened, I can put my proposal here.

UEFI Prefix Types

Shell Prefix

If UEFI shell is present, it creates mappings for all partitions/devices that implement SIMPLE_FILESYSTEM_PROTOCOL. It follows the following naming <MTD><HI><CSD> (eg: FS0:).

Device Path

EFI_DEVICE_PATH_PROTOCOL is the normal UEFI way to represent devicepaths. It also has unique conversion to and from UTF-16 strings. The text representation of path looks as follows: PciRoot(0x0)/Pci(0x1,0x1)/Ata(Secondary,Slave,0x0)/\run.efi

Proposal

I wanted to propose a 2-step plan to improve the handling of such paths, and creating the respective PRs.

Step 1

Create a private GenericPrefix opaque struct. This will be used to replace all the private APIs that currently rely on Prefix such as determining absolute/relative paths, etc. Initially, I was planning to use PrefixComponent for this. However, since PartialEq, Ord and Hash implementations on PrefixComponents are stable and rely on Prefix internally, any changes to that will probably cause problems.

This step will allow isolating Prefix and PrefixComponent to the std surface. Additionally, it will also allow most path operations to work for platforms like UEFI with no API-breaking changes.

Step 2

Start adding new public APIs for new prefix capabilities and deprecating Prefix. This is much more delicate the Step 1 and will require discussion when the time comes. Hopefully, step 1 will make the design of any such API much clearer.

Does this seem good? Or maybe there is a better way to go about it?

m-ou-se commented 1 week ago

@Ayush1325 We discussed this in a recent libs-api meeting. We're not sure what the right path forward is with Path::components(), but we're willing to try out your proposal as an unstable feature to see where we end up. You're welcome to open a tracking issue for it and send PRs to make your changes. (As long as they don't affect stable Rust, of course.)

Separately from that, I have some thoughts about another solution, which I will post in a moment.

m-ou-se commented 1 week ago

I have been wondering for quite a while if a (better) solution could be to fully deprecate Path::components(). A year or two ago, I did a search through GitHub Code Search for all uses of Path::components() in the wild, and noticed that (at the time), effectively 100% of the usages were all ad-hoc (sometimes buggy) implementations of common operations like finding the common prefix of two paths, iterating over parent directories (to find e.g. a Cargo.toml file), path canonicalization without file system access (removing .. and . components), etc.

It gave me the impression that by adding a few of those methods to Path (which we should probably do regardless), there might not be much of a reason to keep maintaining Path::components(), allowing us to deprecate it, and/or to worry less about its behaviour on new platforms.

(After all, if you think about it, Path::components() is quite a weird API. It's an iterator that produces elements of which a certain variant can only ever appear as the first item, even though the signature/types would allow for something invalid like a Prefix to appear in the middle. If designed from scratch today, I wonder if we'd go for (Option<Prefix>, Iterator<Target=&str>) rather than Iterator<Target=Component>, which seems more fitting to me.)

Ayush1325 commented 1 week ago

@m-ou-se I agree, it should be deprecated. The problem is that Path internally also uses Component internally to check absolute paths, which in turn affects how join and other path manipulation APIs work. So those also need to be updated in a way that does not depend on Component but does not break the current behavior.

retep998 commented 1 week ago

A year or two ago, I did a search through GitHub Code Search for all uses of Path::components() in the wild, and noticed that (at the time), effectively 100% of the usages were all ad-hoc (sometimes buggy) implementations of common operations like finding the common prefix of two paths, iterating over parent directories (to find e.g. a Cargo.toml file), path canonicalization without file system access (removing .. and . components), etc.

We absolutely should come up with a solid list of these usages, and figure out how to design a path API that is designed to actually handle these use cases solidly, instead of just providing a bunch of lower level access and expecting people to solve basic problems every time.