syncthing / syncthing

Open Source Continuous File Synchronization
https://syncthing.net/
Mozilla Public License 2.0

Use inode index for fast rename/additional link detection #5689

Open markmcb opened 5 years ago

markmcb commented 5 years ago

What happened

3 systems: A and B are co-located, C is remote, all are Linux. Folder F1 (with ~3TB of data) is renamed to F2 on A. After rename, A takes about 4 hours to fully re-scan. B and C take even longer to catch up. Folder is "send only" on A, and "receive only" on B and C.

What did you expect to happen instead of what did happen

Much faster sync, given that only a single folder was renamed. Instead it seems to need to traverse and consider all files. I could see this maybe being necessary on remotes, but I'd think the local change would be very quick.

To be clear, nothing is broken. Just slower than I'd expect.

What operating system, operating system version and version of Syncthing you are running

I've seen much older issues (~2014) posted regarding renaming, but they all seemed closed long ago. I thought I'd post this. If this is a design decision, no worries. I've only been on ST a few weeks and this is the only oddity I've noticed.

calmh commented 5 years ago

If we are running with filesystem notifications, and we are willing to trust those to 100% (which is not a given, imho), we could probably catch directory renames. In all other cases all that we see are some files going away and some new ones appearing. There is then no choice but to hash those files to figure out what they are.

The other rename tickets are mostly about what to do on the other side. There we do pair together a delete of one file with a creation of another one with the same block hashes and interpret it as a rename (but there are some racy requirements on the announcements for us to be able to do the pairing).
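For illustration only, here is a rough Go sketch of that receiving-side pairing idea: the `update` type and its fields are hypothetical stand-ins (not Syncthing's protocol.FileInfo), and the deletion's block hashes are assumed to come from the previously stored version of the file.

```go
// Hypothetical sketch of pairing a delete with an add that has identical
// block hashes, interpreting the pair as a rename so the file can be moved
// locally instead of re-downloaded. Types are illustrative stand-ins.
package inodescan

import "bytes"

type update struct {
	Name    string
	Deleted bool
	Blocks  [][]byte // ordered block hashes; for a delete, taken from the stored version
}

func sameBlocks(a, b [][]byte) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if !bytes.Equal(a[i], b[i]) {
			return false
		}
	}
	return true
}

// pairRenames matches each deletion against an addition with the same block
// hashes; the caller could then rename the existing file instead of fetching it.
func pairRenames(updates []update) map[string]string { // old name -> new name
	renames := make(map[string]string)
	for _, del := range updates {
		if !del.Deleted || len(del.Blocks) == 0 {
			continue
		}
		for _, add := range updates {
			if add.Deleted || len(add.Blocks) == 0 {
				continue
			}
			if sameBlocks(del.Blocks, add.Blocks) {
				renames[del.Name] = add.Name
				break
			}
		}
	}
	return renames
}
```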

Ferroin commented 5 years ago

There is then no choice but to hash those files to figure out what they are.

Only if we don't want to add a bit more data to the database.

More specifically, for any sane filesystem on a UNIX system, if the inode number, ctime, and size of the file are the same as a file already in the database, it's reasonably safe to assume it's the same file. The only exceptions to this are when you are crossing filesystem boundaries (different filesystems, different inode number spaces), and a handful of odd filesystems that don't have stable inode numbers (and that Syncthing is not likely to be running on top of).

If Syncthing were to track that info (ideally it should be optional because of the aforementioned exceptions), it could in theory skip the hashing step for most files in the case of a directory rename within a single shared folder.
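As a rough illustration of the metadata being proposed (not Syncthing code), here is a Linux-only Go sketch that reads device, inode, ctime, and size via Lstat and compares them; the type and function names are made up for this example.

```go
// Hypothetical sketch: reading the metadata the comment above relies on.
// Linux-only, since it type-asserts to syscall.Stat_t.
package inodescan

import (
	"fmt"
	"os"
	"syscall"
)

// fileIdentity captures the proposed matching fields: device, inode, ctime, size.
type fileIdentity struct {
	Dev   uint64
	Inode uint64
	Ctime int64 // nanoseconds since epoch
	Size  int64
}

func identityOf(path string) (fileIdentity, error) {
	info, err := os.Lstat(path)
	if err != nil {
		return fileIdentity{}, err
	}
	st, ok := info.Sys().(*syscall.Stat_t)
	if !ok {
		return fileIdentity{}, fmt.Errorf("no stat_t available for %s", path)
	}
	return fileIdentity{
		Dev:   uint64(st.Dev),
		Inode: st.Ino,
		Ctime: st.Ctim.Nano(),
		Size:  st.Size,
	}, nil
}

// sameFile reports whether a newly seen path plausibly refers to a file we
// already know about, under the assumptions discussed above (same filesystem,
// stable inode numbers).
func sameFile(known, seen fileIdentity) bool {
	return known.Dev == seen.Dev &&
		known.Inode == seen.Inode &&
		known.Ctime == seen.Ctime &&
		known.Size == seen.Size
}
```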

calmh commented 5 years ago

That's a good point. We could keep a (device, inode) -> FileInfo index on the side and do lookups there first for new files. If there is a match we do the usual metadata comparison to determine if the file is the same. If it is we can just use the data as is. (Windows has "file IDs" which seem equivalent-ish.)
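A minimal sketch of what such a side index might look like, using an in-memory map for clarity (the real thing would presumably live in the database); `FileInfo` here is a stand-in struct, not Syncthing's protocol.FileInfo.

```go
// Illustrative only: a (device, inode) keyed side index consulted before
// hashing a newly discovered file.
package inodescan

type devInode struct {
	Dev   uint64
	Inode uint64
}

type FileInfo struct {
	Name   string
	Size   int64
	Ctime  int64
	Blocks [][]byte // block hashes we would otherwise have to recompute
}

type inodeIndex struct {
	byInode map[devInode]FileInfo
}

// Lookup returns the known FileInfo for a new file's (device, inode) pair, if
// the stored metadata still matches; otherwise the caller falls back to
// hashing the file as usual.
func (ix *inodeIndex) Lookup(key devInode, size, ctime int64) (FileInfo, bool) {
	fi, ok := ix.byInode[key]
	if !ok || fi.Size != size || fi.Ctime != ctime {
		return FileInfo{}, false
	}
	return fi, true
}

// Add records or refreshes an entry whenever a FileInfo is stored for the
// local device.
func (ix *inodeIndex) Add(key devInode, fi FileInfo) {
	if ix.byInode == nil {
		ix.byInode = make(map[devInode]FileInfo)
	}
	ix.byInode[key] = fi
}
```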

AudriusButkevicius commented 5 years ago

Not sure how this would help, as you'd still end up with an add and a delete, which might happen millions of file entries apart. Let's say the delete happens first: we clear the inode index entry, then detect a new entry with the same inode, and do what?

You still need some sort of "hold shit in memory and spill to disk" thing while scanning to find these effectively during the scan. Whether it's content based or inode based is pretty much irrelevant. You can already find "I think A moved to B" based on size and mtime within the same scan; having the inode index would perhaps give more certainty that A moved to B.

Regarding effective behaviour on the other side, we already worked out a way to do that by putting the addition and removal close together, but the problem is that you need to do a full scan before you can send anything, either by holding stuff in memory and then spilling to disk, or by doing a multi-pass scan (essentially always sending deletes last, which in the case of "renames" will require double the space).

Ferroin commented 5 years ago

@AudriusButkevicius We aren't scanning efficiently in the case of a moved directory right now with what Syncthing is already doing (I've checked this myself this morning: testing a scenario like the one the OP mentioned results in Syncthing rehashing everything in the renamed directory). Using metadata like this would allow Syncthing to skip rehashing files that really are the same file and say with near certainty that they moved (barring weirdness like sshfs, which doesn't have stable inode numbers).

Inode plus ctime plus size is also a whole lot more reliable than mtime plus size in cases where the filesystem watcher isn't being used (mtime is trivial to set from userspace, ctime is not, and the inode number is impossible to set), so this would make running without the filesystem watcher a bit more reliable for things like single-file renames.

This could also be used with a bit of creativity to actually speed things up in some cases. Inode numbers are (again, barring weirdness like sshfs) unique for the lifetime of a file. If we find one that matches a file already in the database, in almost all cases that file either got renamed or got deleted. In that case, bumping the matched file to the top of the queue of files to scan, so that it gets scanned next, could conclusively determine which is the case; that would let Syncthing immediately send an update indicating a rename if that's what happened (thus eliminating part of the need to wait for the scan to finish completely before sending anything).
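A hedged sketch of how that early decision could be made once an inode match is found; the helper name is hypothetical, and for brevity it checks the previously known path directly rather than literally reordering the scan queue as described above.

```go
// Hypothetical helper: newPath's inode matches a file already in the database
// at knownPath. If knownPath is gone, the most likely explanation is a rename
// and the stored block hashes can be reused; otherwise (extra hard link or
// inode reuse) fall back to hashing newPath as an ordinary new file.
package inodescan

import "os"

func classifyInodeHit(newPath, knownPath string) string {
	if _, err := os.Lstat(knownPath); os.IsNotExist(err) {
		return "rename" // announce the rename early, skip rehashing
	}
	return "hash" // scan newPath normally
}
```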

calmh commented 5 years ago

Currently for any given scan we handle additions before deletions so that should be fine. FS watcher also helps with this. So it seems to me it would work in most cases. It's also one of those things that doesn't really matter if it doesn't work in all possible corner cases as it's just an optimization for the common case.

AudriusButkevicius commented 5 years ago

Yeah, it would work, so would storing what is deleted vs added, I am not disputing that. I am just saying it still needs a good ol' database, as this thing might be huge for large trees.

Ferroin commented 5 years ago

I believe file size is already being stored. ctime would be the same format as mtime in terms of storage requirements, and an inode number is just a 64-bit unsigned integer. Yes, it will use more space, but it's probably not going to be particularly bad (the database isn't exactly space efficient as-is for large trees).

calmh commented 5 years ago

My first idea was to add an index that keys inode to filename, updated whenever we insert a fileinfo for the local device ID. Maybe add a non-wire-transmitted inode field to the fileinfo to simplify doing this directly at the database layer. Lookup would be two-step: inode to name to fileinfo. It would depend on doing the lookup before a delete happens, but that's the limitation anyway. No idea how to clear out this index; garbage collection is a possibility.
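A minimal sketch of that two-step lookup, assuming in-memory maps as stand-ins for the database layer; all type and method names are invented for the example.

```go
// Illustrative sketch of the two-step lookup described above (inode -> name,
// then name -> record), with a crude cleanup pass standing in for the
// garbage collection idea. Not Syncthing's database code.
package inodescan

type fileRecord struct {
	Name  string
	Size  int64
	Inode uint64 // the non-wire-transmitted field mentioned above
}

type twoStepIndex struct {
	nameByInode map[uint64]string
	recByName   map[string]fileRecord
}

// insert is called whenever a record is stored for the local device ID.
func (ix *twoStepIndex) insert(rec fileRecord) {
	if ix.nameByInode == nil {
		ix.nameByInode = make(map[uint64]string)
		ix.recByName = make(map[string]fileRecord)
	}
	ix.nameByInode[rec.Inode] = rec.Name
	ix.recByName[rec.Name] = rec
}

// lookup resolves inode -> name -> record; it only helps if it runs before
// the delete for the old name is processed, which is the stated limitation.
func (ix *twoStepIndex) lookup(inode uint64) (fileRecord, bool) {
	name, ok := ix.nameByInode[inode]
	if !ok {
		return fileRecord{}, false
	}
	rec, ok := ix.recByName[name]
	return rec, ok
}

// gc drops inode entries whose names no longer resolve to a record.
func (ix *twoStepIndex) gc() {
	for inode, name := range ix.nameByInode {
		if _, ok := ix.recByName[name]; !ok {
			delete(ix.nameByInode, inode)
		}
	}
}
```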

infectormp commented 5 years ago

Maybe inotify can help in this situation? See https://github.com/syncthing/syncthing/issues/5755