systemd / casync

Content-Addressable Data Synchronization Tool
1.49k stars 117 forks source link

mtree: prepend "./" to all entries other than "." #267

Open cyphar opened 1 year ago

cyphar commented 1 year ago

casync attempts to output all entries as a Full entry but for top-level entries, there is no "/" in the escaped pathname, resulting in the entry being treated as a Relative entry.

As per the specification1, this results in any subsequent Relative entries (such as top-level files) being treated as children of the directory which causes manifests produced by casync to fail.

The simplest solution (and the one that "nmtree -C" does) is to prefix every path with "./". However, nmtree does not like "./." (internally it requires every entry referenced in an Full entry's path to already have been referenced in the spec) and so we have to special-case the "." path.

Ref: https://github.com/vbatts/go-mtree/issues/146 Fixes #167 Fixes #168 Signed-off-by: Aleksa Sarai cyphar@cyphar.com

cyphar commented 1 year ago

Another option would be to prepend every path with ./, which would ensure every path (even files) are parsed as Full entries. This PR only forces directories to be treated as Full entries (but this is sufficient to solve the issue).

cyphar commented 1 year ago

As an example of the problem, in the casync source dir, the following entries are (per the spec) treated as Relative even though casync wants them to be treated as Full:

% casync mtree . | grep type=dir | grep -v '[^ ]*/[^ ]* '
. type=dir mode=0755 uid=1000 gid=100 uname=cyphar gname=users flags=nohidden,nosystem,noarchive,nosappend,nosimmutable time=1685872243.866035931
.git type=dir mode=0755 uid=1000 gid=100 uname=cyphar gname=users flags=nohidden,nosystem,noarchive,nosappend,nosimmutable time=1685872405.954860213
build type=dir mode=0755 uid=1000 gid=100 uname=cyphar gname=users flags=nohidden,nosystem,noarchive,nosappend,nosimmutable time=1685872243.866035931
coccinelle type=dir mode=0755 uid=1000 gid=100 uname=cyphar gname=users flags=nohidden,nosystem,noarchive,nosappend,nosimmutable time=1685871728.307415356
doc type=dir mode=0755 uid=1000 gid=100 uname=cyphar gname=users flags=nohidden,nosystem,noarchive,nosappend,nosimmutable time=1685871728.307415356
shell-completion type=dir mode=0755 uid=1000 gid=100 uname=cyphar gname=users flags=nohidden,nosystem,noarchive,nosappend,nosimmutable time=1685871728.307415356
src type=dir mode=0755 uid=1000 gid=100 uname=cyphar gname=users flags=nohidden,nosystem,noarchive,nosappend,nosimmutable time=1685871976.040674579
test type=dir mode=0755 uid=1000 gid=100 uname=cyphar gname=users flags=nohidden,nosystem,noarchive,nosappend,nosimmutable time=1685871728.323415437
test-files type=dir mode=0755 uid=1000 gid=100 uname=cyphar gname=users flags=nohidden,nosystem,noarchive,nosappend,nosimmutable time=1685871728.319415416
tools type=dir mode=0755 uid=1000 gid=100 uname=cyphar gname=users flags=nohidden,nosystem,noarchive,nosappend,nosimmutable time=1685871728.323415437

And any other non-Full path (which is any other top-level entry) will be treated as a child of one of these directories.

cyphar commented 1 year ago

Looking at nmtree (the original BSD version appears to not support Full types at all), nmtree -C outputs everything with ./ prefixed other than the top-level . entry. I've switched this PR to doing that.

Now the output of casync mtree parses correctly, however there are two remaining issues:

  1. There is a bug in nmtree when it comes to hashing sha256 specifically, so the basic round-trip test of casync mtree --with=unix --digest=sha256 | nmtree fails. See archiecobbs/nmtree#4.
  2. casync mtree produces second-resolution timestamps but nmtree and gomtree both expect time to be the resolution provided by the filesystem. With gomtree this can be worked around by changing the time keyword to tar_time, but with nmtree you just have to remove it.

That being said, if you adjust the output slightly, now casync correctly works with gomtree and nmtree:

% casync mtree --with=unix --digest=sha256 | sed 's/time=/tar_time=/g' | gomtree
% echo $?
0
% casync mtree --with=unix | sed -E 's/ (time=[0-9]+\.[0-9]+|\w+digest=\w+)\b/ /g' | nmtree
% echo $?
0

Though I think that it would make sense to output the filesystem resolution timestamp if available since that is how every other mtree tool works.

cyphar commented 1 year ago

@poettering Sorry for pinging you directly. This fixes the issue we discussed when we met a few weeks ago (you were right that the format is valid, but casync's output is not what you would want when there's only one path component). The semaphore test failure is some issue with the PPA used by the tests.

cyphar commented 11 months ago

/cc @keszybz @bluca

evelikov commented 7 months ago

OOC: Should there be a new test to flag this issue?