Closed evgmik closed 3 years ago
I guess the symwalk
package can be copied and adjusted to support:
I guess the
symwalk
package can be copied and adjusted to support:* A maximum depth of traversal.
What we already have does it. I run some test and I see no obvious bugs in it.
* Not visiting the same node (identified by inode/device_id) more than once.
I would disagree on this. I think this is the feature of symlinks: multiple links can point to the same inode
. If we remove it, it would be very similar as not having symlinks at all (except the cases when they point outside of the staging root).
My use case: having a template dir (with makefiles, styles, and common configs). I have quite a lot of link to this template dir within a working tree. Think 'lectute template' to which weekly lectures point.
Also, did we agree on an option to follow links and/or to ignore errors?
This is in my TODO list. I was looking for a general approval of the symlink handling, before adding extra options.
An embarrassing story: I spent 5 hours figuring out how to append
a slice returned by a function. The go
error reporting was very unclear about it.
...
did the work, but it took me long time.
I guess the
symwalk
package can be copied and adjusted to support:* A maximum depth of traversal.
What we already have does it. I run some test and I see no obvious bugs in it.
* Not visiting the same node (identified by inode/device_id) more than once.
I would disagree on this. I think this is the feature of symlinks: multiple links can point to the same
inode
. If we remove it, it would be very similar as not having symlinks at all (except the cases when they point outside of the staging root).My use case: having a template dir (with makefiles, styles, and common configs). I have quite a lot of link to this template dir within a working tree. Think 'lectute template' to which weekly lectures point.
Hmm, you have a point here. I was more worried about such pathological (hah, "path") cases:
λ werkbank sub → cd /tmp && mkdir sub && cd sub
λ werkbank sub → dd if=/dev/urandom of=file.big bs=1M count=100
λ werkbank sub → ln -s . link
λ werkbank sub → ls
file.big link
λ werkbank sub → cd link
λ werkbank link → cd link
λ werkbank link → cd link
λ werkbank link → cd link
λ werkbank link → cd link
λ werkbank link → cd link
λ werkbank link → cd link
λ werkbank link → cd link
λ werkbank link → pwd
/tmp/sub/link/link/link/link/link/link/link/link
...in that case the current code would add an infinitely nested structure, each level with file.big
in it. You could say that this is normally not true, but I consider such things a security issue because you could easily DDoS a machine with brig if you somehow manage to create a symbolic link and make it stage something.
So both solutions lack something:
I really don't like the concept of symbolic links. Without supporting some sort of "link" in brig we can't really cover both cases and that is really out of scope. Maybe one way would be to "burn" a directory-symlink, so we can traverse it only once and won't the next time. Since I can't come up with a good solution here you have my go to proceed. Might play around at some point, so please leave a TODO with the problem described above.
An embarrassing story: I spent 5 hours figuring out how to append a slice returned by a function. The go error reporting was very unclear about it.
...
did the work, but it took me long time.
Sorry to hear that :grin: The ...
thing isn't very intuitive also. I think I wrote a for loop around it the first time.
λ werkbank sub → cd /tmp && mkdir sub && cd sub λ werkbank sub → dd if=/dev/urandom of=file.big bs=1M count=100 λ werkbank sub → ln -s . link λ werkbank sub → ls file.big link /tmp/sub/link/link/link/link/link/link/link/link
...in that case the current code would add an infinitely nested structure, each level with
file.big
in it. You could say that this is normally not true, but I consider such things a security issue because you could easily DDoS a machine with brig if you somehow manage to create a symbolic link and make it stage something.
Just small correction: we already have recursion depth protection with the current code. The limit is set to 255 in the current walk
code, see
6a1e67f line 73. What would be nice is to reference file.big
to the same backend hash. Unfortunately, we lost the automatic way for this when we switched to random encryption key from the content generated one.
But since I collect long list of files toBeStaged
I can see which of them dereferenced to the same local file. And just copy the node
info. This will save quite a lot of space at the backend for some scenarios, it also could speed up staging for heavily linked cases.
We might even attempt it with directories, but this is scary. I am not sure our linker will survive it.
So both solutions lack something:
* Ignoring cycles limits valid use cases. * Not ignoring cycles opens room for malicious input.
I really don't like the concept of symbolic links. Without supporting some sort of "link" in brig we can't really cover both cases and that is really out of scope. Maybe one way would be to "burn" a directory-symlink, so we can traverse it only once and won't the next time. Since I can't come up with a good solution here you have my go to proceed. Might play around at some point, so please leave a TODO with the problem described above.
Sure thing I will leave the TODO.
I am still working on proper link handling.
still working on it. In particular, on deduplication: if several repo paths points to the same content via symlinks during staging, then we should just copy repo node to new repopaths. This is how I triggered bug #105
How are you implementing the linking? Are you using fs.Copy()
or do some low-level trickery? Or something different entirely more on client-side?
How are you implementing the linking? Are you using
fs.Copy()
or do some low-level trickery? Or something different entirely more on client-side?
I am going to use fs.Copy()
. It seems to be sufficient except it does not copy to non existing path (at least on a first glance).
Status update. Looks like I completed reflinking of the symlinked files. Next is proper treatment of the symlinked directories, then implementation for options.
Still working on hardlinks and dirs
Do we really want to implement hardlinks reflinking?
To track hardlinks we have to use OS dependent package x/sys. I would argue that this is an overkill and will be hard to maintain. The price we pay is some data duplication (I start to regret our decision on a random key choice for every file, instead of content based.)
I personally never used hardlinks in real life. But I need a weighted @sahib opinion on it.
One more thing: I decided to skip copying symlinked directories. The current algorithm walks such directories and copies its content, instead of fs.Copy(src_dir, dest_dir). But I think this is not performance crucial. The reason: it is hard to implement properly, since I need to track which dir is already staged to do its copy, so the code grows for a questionable speed up.
Other than this, I will be working on skip symlinks and continue on the minor staging errors switches/options.
Do we really want to implement hardlinks reflinking?
It's not an easy discussion. Here's my take.
I care mostly for consistency. If we support symbolic links we should also support hardlinks. To be fair, using fs.Copy
is more an optimization more or less invisible to the user, but still. Good software should support all normal operation that a user encounters or, if that seems to be out of reach for whatever reason, limit itself to a sensible subset. The latter is the way that Go choose, by ignoring everything in filepath.Walk()
that's not a regular file. So, let's ask ourselves if we're consistent:
The second point is inconsistent, but I would be okay to let it slip since it's a non-essential optimization and it's likely that most users don't even care or know. So I'm kinda okay with the current state of not explicitly supporting for the following reasons:
I personally never used hardlinks in real life. But I need a weighted @sahib opinion on it.
I use them quite often. For example parts of my music library are hardlinks and I use them to create "playlists" that can be easily copied to elsewhere without fear of tools that do not handle symbolic links right. But also many tools like ostree, git and many more are based on them. Personally, I see them more often than symbolic links.
To track hardlinks we have to use OS dependent package x/sys.
Well, yes. It adds a bit of complexity, but not much. One could just have a "key" type that is either dev/inode on unix or path on windows.
Other than this, I will be working on skip symlinks and continue on the minor staging errors switches/options.
:+1:
I was convinced that hardlinks are a nice feature (should I expect anything different from rmlint
author ? :)
I have added hardlinks reflinking with 9f93d01. It is tested only on linux, and on other systems should fall back to just staging a hardlinked file independently (not sure it supported by Windows at all).
Technically, it should work on any unix like system, not just linux. But I do not see the way to make a build flag similar to
// +build linux
if I replace linux
with unix
the system detects both inodeString
declarations. But it should be a minor thing which could be tested if we get a hold of a Mac
Nice! Implementation looks okay to me. Will have deeper look once the options are also implemented.
[...] should I expect anything different from rmlint author?
Damn, my past catches up. :smile: But yes, not scanning hardlinks twice was an issue there.
Technically, it should work on any unix like system, not just linux. But I do not see the way to make a build flag similar to [...]
If I understood you right you could have the following configuration (untested, but did something similar once):
$ ls inode_*
inode_unix.go inode_other.go
$ head -1 inode_unix.go
// +build !windows
$ head -1 inode_other.go
// +build windows
Under the assumption that pretty much everything beside windows is a unix or at least has inodes. Even if that does not apply I think it's better to keep it like that so users can complain when compiling for this platform.
If I understood you right you could have the following configuration (untested, but did something similar once):
$ ls inode_* inode_unix.go inode_other.go $ head -1 inode_unix.go // +build !windows $ head -1 inode_other.go // +build windows
Under the assumption that pretty much everything beside windows is a unix or at least has inodes. Even if that does not apply I think it's better to keep it like that so users can complain when compiling for this platform.
Excellent suggestion, implemented with 38704e1
Now, back to the staging options...
I have added options:
--no-dereference, -P
--continue-on-error, -c
@sahib I think this patch set is ready for the final review
Small personal note: I have to sort out some personal stuff and and will be slow to respond/develop in the next 1-2 weeks.
Small personal note: I have to sort out some personal stuff and and will be slow to respond/develop in the next 1-2 weeks.
No worries. I will start working on 'copy' related bugs #105 and #106
Fixes #101
Also fixes exotic case when now to be staged files are detected (i.e. we asked to stage symlinked directory), in this case there were endless wait for
pbars