lucasmeijer / NiceIO

For when you've had to use System.IO one time too many
MIT License
128 stars 19 forks source link

You can't safely collapse /foo/../bar to /bar without inspecting the filesystem #1

Open lasse-unity3d opened 9 years ago

lasse-unity3d commented 9 years ago

Symlinks might to directories might render the collapsed path different from the original one.

Consider:

lasse@deviance:/tmp/test$ find . | xargs ls -ld
drwxr-xr-x  5 lasse  wheel  170 Mar 30 14:04 .
lrwxr-xr-x  1 lasse  wheel   10 Mar 30 14:04 ./dir -> else/where
drwxr-xr-x  4 lasse  wheel  136 Mar 30 14:05 ./else
-rw-r--r--  1 lasse  wheel    0 Mar 30 14:05 ./else/file
drwxr-xr-x  2 lasse  wheel   68 Mar 30 14:03 ./else/where
-rw-r--r--  1 lasse  wheel    0 Mar 30 14:01 ./file
lasse@deviance:/tmp/test$ ls -li dir/../file
33273279 -rw-r--r--  1 lasse  wheel  0 Mar 30 14:05 dir/../file
lasse@deviance:/tmp/test$ ls -li file
33273240 -rw-r--r--  1 lasse  wheel  0 Mar 30 14:01 file
lasse@deviance:/tmp/test$ 
lucasmeijer commented 9 years ago

hey lasse, interesting point. I'll chew on that for a bit, I'm currently leaning towards "I dont care about that scenario", but then again I must say I also don't strongly care about having to collapse at all cost.

shana commented 9 years ago

I don't think you can get by ignoring the effects of symlinks in resolving paths if you really want a cross platform wrapper :wink:

lucasmeijer commented 9 years ago

what do you think this should return in lasse's case?

new Path("/tmp/test/dir").Parent();
shana commented 9 years ago

In case of a Parent() call (which is - navigate to the parent of this directory), there's no symlink resolving at all, you're just building paths. The command line would give you /tmp/test in that case, since navigation never resolves symlinks.

It's the case where you actively list contents from a built path that gets more complicated. There are valid use cases for building trees of symlinks that have different parents but are aggregated in a common root, and it's very handy to be able to get at the real parents of the symlinks (and not the fake one where you're aggregating them). Imagine fetching/updating data from different subdirectories and running scripts from the parents of those subdirectories (which are actually in different places and update in different ways depending on where they are).

Path building/navigation without resolving symlinks is fine, that's what the command line does, it's the most common use case and the one that makes most sense most of the time. If the path is built by using Parent() calls, then the path resulting from that navigation isn't resolved while navigating, only later.

The command line resolves things differently depending on whether you're navigating or listing contents. Navigation never resolves symlinks, listing (or acting upon a result of navigation, as it were) always resolves. Since you don't know what the user is going to do with the path they've built (if they do new Path("/tmp/test/dir/.."), are they navigating or do they need to list files there?), it's hard to know whether to resolve the path or not without a clue from the user (or a conscious choice by the API that's clearly documented).

If the path being listed is a string with .. in it, that's where the API should make a conscious choice (and say so) about how it resolves. If you collapse and then resolve (the equivalent of doing cd test/dir/.., which returns test), the user will never be able to access the real parents of symlinks along the way. If you resolve while collapsing (the equivalent of doing ls test/dir/... which returns test/else), you might confuse the hell out of the user. You could choose never to resolve before collapsing, but give the user a choice to resolve the path so they don't have that limitation if they need it (either by a flag or a Resolve() call, perhaps?)

I really don't care that much about how the API deals with symlinks, but they do exist and scripts/OS calls (and people's assumptions) will break if you pass symlinked paths at them, so ignoring them is hard :stuck_out_tongue:

lasse-unity3d commented 9 years ago

Agree with most of Andreia's comments. However: Saying that the command line never resolves symlinks is ranging from inaccurate to false. Bash has options to force symlink resolution on every cd and pwd. They default to false but there are plenty of people that use them. Think sysadmins with root on shared systems who can't afford to get "tricked" by symlinks. Physical and logical paths are different beasts and you mustn't force one over the other. I'd say implement a Realpath() method that resolves all symlinks and /../'s by actually looking in the filesystem. ( http://search.cpan.org/~dagolden/Path-Tiny-0.068/lib/Path/Tiny.pm#realpath )

Parent() is another frequent pain point. Attempt at getting it right here: http://search.cpan.org/~kwilliams/Path-Class-0.35/lib/Path/Class/Dir.pm

Path::Class claims parent() is logical only which is a good start but falls to specify behavior for corner cases like Path("/foo/bar/..").Parent(). I also intensely want Path("/").Parent() to return something falsy. surprise me what you choose here.

Consider the difference between: somePath.Parent().Realpath() and: somePath.Realpath().Parent() in the presence of symlinks and other oddities. Neither should surprise

On Mar 30, 2015 5:35 PM, "Andreia Gaita" notifications@github.com wrote:

In case of a Parent() call (which is - navigate to the parent of this directory), there's no symlink resolving at all, you're just building paths. The command line would give you /tmp/test in that case, since navigation never resolves symlinks.

It's the case where you actively list contents from a built path that gets more complicated. There are valid use cases for building trees of symlinks that have different parents but are aggregated in a common root, and it's very handy to be able to get at the real parents of the symlinks (and not the fake one where you're aggregating them). Imagine fetching/updating data from different subdirectories and running scripts from the parents of those subdirectories (which are actually in different places and update in different ways depending on where they are).

Path building/navigation without resolving symlinks is fine, that's what the command line does, it's the most common use case and the one that makes most sense most of the time. If the path is built by using Parent() calls, then the path resulting from that navigation isn't resolved while navigating, only later.

The command line resolves things differently depending on whether you're navigating or listing contents. Navigation never resolves symlinks, listing (or acting upon a result of navigation, as it were) always resolves. Since you don't know what the user is going to do with the path they've built (if they do new Path("/tmp/test/dir/.."), are they navigating or do they need to list files there?), it's hard to know whether to resolve the path or not without a clue from the user (or a conscious choice by the API that's clearly documented).

If the path being listed is a string with .. in it, that's where the API should make a conscious choice (and say so) about how it resolves. If you collapse and then resolve (the equivalent of doing cd test/dir/.., which returns test), the user will never be able to access the real parents of symlinks along the way. If you resolve while collapsing (the equivalent of doing ls test/dir/... which returns test/else), you might confuse the hell out of the user. You could choose never to resolve before collapsing, but give the user a choice to resolve the path so they don't have that limitation if they need it (either by a flag or a Resolve() call, perhaps?)

I really don't care that much about how the API deals with symlinks, but they do exist and scripts/OS calls (and people's assumptions) will break if you pass symlinked paths at them, so ignoring them is hard

— Reply to this email directly or view it on GitHub.

shana commented 9 years ago

Agree with most of Andreia's comments. However: Saying that the command line never resolves symlinks is ranging from inaccurate to false. Bash has options to force symlink resolution on every cd and pwd. They default to false but there are plenty of people that use them. Think sysadmins with root on shared systems who can't afford to get "tricked" by symlinks. Physical and logical paths are different beasts and you mustn't force one over the other.

It never does in the default case, so that means that the developers of the command line api chose that behaviour as their default api behaviour, and gave you flags to change it. The choice of which is the default behaviour and which flags to provide to change that behaviour depend highly on the normal use case of the api in question. The command line is the command line, with a certain target audience, which might or might not be the same for the NiceIO api. NiceIO can choose to completely ignore symlinks, it can choose to never ignore them, it can choose to provide flags to change the behaviour - I think that it doesn't have to provide the same behaviour and flexibility as the command line, since it's not a command line, but I would like it to be aware of symlinks and provide some flexibility in that regard. That would be nice and useful.

lasse-unity3d commented 9 years ago

Oops! Sent to soon...

Anyway Path::Class mentioned previously distinguishes between (logical) cleanup() and (physical) resolve(). I would do something similar. You have to make it explicit and I'd say constructor should do neither. I'd expect the Path constructor to preserve the string passed to it pretty much verbatim.

That, however, leads to questions like whether Path("/foo/") == Path("/foo") should be true. If you document the potential need for Cleanup() or Resolve() on either side of the comparison this might not be an issue though. It's because of things like this that I have a general dislike for overloading.

Also, things like /../../foo is a valid physical path so you should allow it too. On Apr 2, 2015 12:15 AM, "Lasse Makholm" lasse@unity3d.com wrote:

Agree with most of Andreia's comments. However: Saying that the command line never resolves symlinks is ranging from inaccurate to false. Bash has options to force symlink resolution on every cd and pwd. They default to false but there are plenty of people that use them. Think sysadmins with root on shared systems who can't afford to get "tricked" by symlinks. Physical and logical paths are different beasts and you mustn't force one over the other. I'd say implement a Realpath() method that resolves all symlinks and /../'s by actually looking in the filesystem. ( http://search.cpan.org/~dagolden/Path-Tiny-0.068/lib/Path/Tiny.pm#realpath )

Parent() is another frequent pain point. Attempt at getting it right here: http://search.cpan.org/~kwilliams/Path-Class-0.35/lib/Path/Class/Dir.pm

Path::Class claims parent() is logical only which is a good start but falls to specify behavior for corner cases like Path("/foo/bar/..").Parent(). I also intensely want Path("/").Parent() to return something falsy. surprise me what you choose here.

Consider the difference between: somePath.Parent().Realpath() and: somePath.Realpath().Parent() in the presence of symlinks and other oddities. Neither should surprise

On Mar 30, 2015 5:35 PM, "Andreia Gaita" notifications@github.com wrote:

In case of a Parent() call (which is - navigate to the parent of this directory), there's no symlink resolving at all, you're just building paths. The command line would give you /tmp/test in that case, since navigation never resolves symlinks.

It's the case where you actively list contents from a built path that gets more complicated. There are valid use cases for building trees of symlinks that have different parents but are aggregated in a common root, and it's very handy to be able to get at the real parents of the symlinks (and not the fake one where you're aggregating them). Imagine fetching/updating data from different subdirectories and running scripts from the parents of those subdirectories (which are actually in different places and update in different ways depending on where they are).

Path building/navigation without resolving symlinks is fine, that's what the command line does, it's the most common use case and the one that makes most sense most of the time. If the path is built by using Parent() calls, then the path resulting from that navigation isn't resolved while navigating, only later.

The command line resolves things differently depending on whether you're navigating or listing contents. Navigation never resolves symlinks, listing (or acting upon a result of navigation, as it were) always resolves. Since you don't know what the user is going to do with the path they've built (if they do new Path("/tmp/test/dir/.."), are they navigating or do they need to list files there?), it's hard to know whether to resolve the path or not without a clue from the user (or a conscious choice by the API that's clearly documented).

If the path being listed is a string with .. in it, that's where the API should make a conscious choice (and say so) about how it resolves. If you collapse and then resolve (the equivalent of doing cd test/dir/.., which returns test), the user will never be able to access the real parents of symlinks along the way. If you resolve while collapsing (the equivalent of doing ls test/dir/... which returns test/else), you might confuse the hell out of the user. You could choose never to resolve before collapsing, but give the user a choice to resolve the path so they don't have that limitation if they need it (either by a flag or a Resolve() call, perhaps?)

I really don't care that much about how the API deals with symlinks, but they do exist and scripts/OS calls (and people's assumptions) will break if you pass symlinked paths at them, so ignoring them is hard

— Reply to this email directly or view it on GitHub.