stephenh / mirror

A tool for real-time, two-way sync for remote (e.g. desktop/laptop) development
Apache License 2.0

Reverting to older files or state is troublesome #26

Closed fezzzza closed 5 years ago

fezzzza commented 5 years ago

I have found 3 use cases that seem to highlight this issue:

1) I accidentally moved a file from my dev folder instead of copying it. The remote file was deleted on the server - so far this is correct. Then when I tried to move the file back to the original folder, I guess mirror's logic is "that delete operation on the remote outdates the mtime of the old file so I'll delete it on the client".

2018-10-31 15:22:45 INFO  dev/adminBugs.php isLocalNewer
2018-10-31 15:22:45 INFO    l: modTime: 1540672411588 delete: true local: true
2018-10-31 15:22:45 INFO    r: modTime: 1540672410588 data: "initialSyncMarker" local: true
2018-10-31 15:22:45 INFO  Sending (delete) dev/adminBugs.php
---
2018-10-31 15:23:05 INFO  Queueing: path: "dev/adminBugs.php" modTime: 1540672410588 local: true
2018-10-31 15:23:05 INFO  Queueing: path: "dev" modTime: 1540999385424 local: true directory: true executable: true
2018-10-31 15:23:05 INFO  dev/adminBugs.php isRemoteNewer
2018-10-31 15:23:05 INFO    l: modTime: 1540672410588 local: true
2018-10-31 15:23:05 INFO    r: modTime: 1540672411588 delete: true local: true
2018-10-31 15:23:05 INFO  Remote delete dev/adminBugs.php
2018-10-31 15:23:05 INFO  Queueing: path: "dev/adminBugs.php" delete: true local: true

The workaround is to touch the file before copying it back.

2) I want to revert to an older version of a file so I overwrite it with a backup, but the file change is not propagated to the server:

2018-10-31 15:08:29 INFO  Queueing: path: "edge/adminBugs.php" modTime: 1540672410588 local: true
2018-10-31 15:08:29 INFO  Queueing: path: "edge" modTime: 1540998509051 local: true directory: true executable: true
2018-10-31 15:08:29 INFO  Queueing: path: "edge/.goutputstream-V8SQRZ" delete: true local: true

Again, touching the file cures this but I would prefer to maintain mtime on my files wherever possible.

3) While dropping in a large directory to my dev folder on the client, I ran out of space on the remote and mirror wrote a bunch of 0-length files. Deleting the 0-length files on the remote then deleted the counterparts on the client. Clearing out enough space on the server and then re-copying that directory on the client resulted in the files being deleted from the client because the delete operation is newer than the mtime. The workaround was to stop mirror both ends, copy the files manually both sides and restart mirror both sides.

I would guess there's probably some information in the ext3/ext4 journal that can be relied upon to be sure which is the latest operation, because it's not necessarily the latest mtime that we want to preserve.
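
For anyone who wants to script the touch workaround from cases 1 and 2, here is a minimal Python sketch. It is not part of mirror. The helper name `restore_with_fresh_mtime` is hypothetical, and it assumes that touching the backup copy (and so changing its mtime) is acceptable. The touch happens before the copy, so the file never appears in the watched folder with its old mtime.

```python
import os
import shutil


def restore_with_fresh_mtime(backup_path, target_path):
    """Touch the backup, then copy it into the watched folder.

    Hypothetical helper, not part of mirror. The order matters: the file
    should already carry a fresh mtime when it lands in the synced folder.
    """
    os.utime(backup_path, None)             # like `touch`: set atime/mtime to now
    shutil.copy2(backup_path, target_path)  # copy2 tries to preserve that mtime
    return os.stat(target_path).st_mtime
```

The obvious cost is the one already mentioned: the original mtime is lost on both the backup and the restored copy.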

stephenh commented 5 years ago

that delete operation on the remote outdates the mtime of the old file

Yes, that is what happens: mirror creates a fake mtime for the delete so it can tell which version, local or remote, "wins".

I think this is necessary, at least given how mirror is currently designed, but there is a slight possibility that, after deciding which version has won (and the delete wins), mirror could be better about cleaning up the marked-deleted path, e.g. by unsetting the fake mtime.

I'll poke around at that when I get a chance.
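
To make the "delete wins" outcome concrete, here is a rough Python sketch of a newest-mtime comparison where a delete carries a fake mtime. This is not mirror's actual code. The `PathState` type and `resolve` function are hypothetical names, and the only real data are the two modTime values copied from the 15:23:05 log lines in case 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathState:
    """One side's view of a path (hypothetical, loosely modeled on the log fields)."""
    mod_time: int          # milliseconds since the epoch
    delete: bool = False   # True means a delete marker carrying a fake mod_time


def resolve(local: PathState, remote: PathState) -> str:
    """Return which side wins under a plain newest-mtime rule."""
    if local.mod_time > remote.mod_time:
        return "local"
    if remote.mod_time > local.mod_time:
        return "remote"
    return "same"


# Values taken from the 15:23:05 log lines in case 1.
restored_file = PathState(mod_time=1540672410588)
delete_marker = PathState(mod_time=1540672411588, delete=True)
outcome = resolve(restored_file, delete_marker)  # the delete marker is newer
```

Because `mv` keeps the file's original mtime, the restored file is older than the delete marker, so the delete wins again and the file is removed.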

fezzzza commented 5 years ago

In the meantime, so that I can understand and work around it: is that fake mtime dropped when mirror is restarted, or does it persist in some way? If I restart mirror, can I take it that the state is reset? Do I need to restart mirror just on the client, or on the server as well?

stephenh commented 5 years ago

Yep, it's only kept in memory, so restarts will reset it.

stephenh commented 5 years ago

Both the server and client have their own copy of it in memory, so would need to restart both.

fezzzza commented 5 years ago

mirror could be better about cleaning up the marked-deleted path, e.g. by unsetting the fake mtime.

Well, that would take care of one use case, but be careful with that. What if there was another spoke on a third device - say a laptop that won't connect for a couple of days, but is expecting the latest versions (but not necessarily the latest mtimes) to be synced up when it connects? That's the scenario I was hoping to use mirror for, so I can dev in the field and not have to worry about, "Did I remember to git push at home yesterday?"

stephenh commented 5 years ago

What if there was another spoke on a third device - say a laptop that won't connect for a couple of days, but is expecting the latest versions

Ha, right. Although that is also what happens if you were to stop mirror and then reconnect... It forgets the fake mtime and would treat the missing-for-a-few-days laptop's file as "well, this must be good".

Which, in general, is a safe default: if either the server or the client has a file, go ahead and keep it, on the assumption that the user re-deleting something is less painful than un-deleting something they wanted to keep.

Dunno, will think about it. Ideas appreciated, of course.

fezzzza commented 5 years ago

Just thinking it through: I would love my server to say "here's a replay of all the transactions since you last connected so we know you have the right files". How confident would that sound? It could take that replay from the ext4 journal (how far back does that go?) or maintain its own journal (with a user-defined expiry time/history length) for nominated folders that the server keeps track of. Just a thought - hope I haven't ruined your plans! Keep up the good work! It's at least usable in this state now that I know what to look out for!
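
A rough Python sketch of the "server maintains its own journal" variant, just to illustrate the idea. Nothing here exists in mirror: `Journal`, `JournalEntry`, `record`, `expire` and `replay_since` are all hypothetical names. It assumes the server stamps each change with its own sequence number, and that a client remembers the last sequence number it saw.

```python
import time
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class JournalEntry:
    seq: int            # server-assigned, strictly increasing
    path: str
    op: str             # "write" or "delete"
    recorded_at: float  # server clock, used only for expiry


class Journal:
    """Hypothetical server-side journal with a user-defined history length."""

    def __init__(self, max_age_seconds: float):
        self.max_age_seconds = max_age_seconds
        self._entries: List[JournalEntry] = []
        self._next_seq = 1

    def record(self, path: str, op: str, now: Optional[float] = None) -> int:
        now = time.time() if now is None else now
        entry = JournalEntry(self._next_seq, path, op, now)
        self._entries.append(entry)
        self._next_seq += 1
        return entry.seq

    def expire(self, now: Optional[float] = None) -> None:
        """Drop entries older than the configured history length."""
        now = time.time() if now is None else now
        cutoff = now - self.max_age_seconds
        self._entries = [e for e in self._entries if e.recorded_at >= cutoff]

    def replay_since(self, last_seen_seq: int) -> Optional[List[JournalEntry]]:
        """Entries after last_seen_seq, or None if that history has expired."""
        oldest = self._entries[0].seq if self._entries else self._next_seq
        if last_seen_seq + 1 < oldest:
            return None  # gap in history: the client needs a full resync
        return [e for e in self._entries if e.seq > last_seen_seq]
```

A laptop that reconnects after a few days would call `replay_since` with its last seen sequence number. This only orders changes the server itself observed; edits made offline on the laptop are not covered.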

fezzzza commented 5 years ago

Ha, right. Although that is also what happens if you were to stop mirror and then reconnect... It forgets the fake mtime and would treat the missing-for-a-few-days laptop's file as "well, this must be good".

Is that right? Didn't you point out that the fake mtime is also maintained on the server, though, and a stop-start both sides would be required to flush it?

Both the server and client have their own copy of it in memory, so would need to restart both.

Furthermore - what happens if a third device maintains a copy of the fake mtime, while the first 2 devices are stop-started?

stephenh commented 5 years ago

stop-start both sides would be required to flush it

Right, sorry, I meant "stop both sides".

third device maintains a copy of the fake mtime

Also right, "stop all sides". :-)

here's a replay of all the transactions since you last connected so we know you have the right files

That is not a bad idea conceptually. Pragmatically, the biggest challenge is clock drift: even with mtime, if the server and client clocks get out of sync, a "but I just wrote that!" on a slightly-behind client can get stomped by a slightly-ahead server (or vice versa). (mirror has an extremely naive clock drift detection, run only on initial connection, because I had a few issues reported by users, and experienced some myself, that ended up being due to drift.)

So, with just a clock by itself, it is hard/"impossible" to get a true combined ordering of "this then that happened" across several remote machines (which is what the journaling approach would need to do, e.g. interleave "server thought this happened" with "client a thought that happened" with "client b thought something else happened" and come up with a sane result).

Granted, there are ways to do this, e.g. Lamport timestamps (probably, I've not actually used them before) and other distributed computing primitives, but that gets into "building a real distributed system" instead of "just compare some mtimes and pick one". :-)
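
For reference, a Lamport clock itself is tiny. The Python sketch below shows the textbook rule only (increment on a local event, take max-plus-one on receive); it is not something mirror implements, and the `LamportClock` class and the server/client scenario are purely illustrative.

```python
class LamportClock:
    """Textbook Lamport logical clock: a counter, not a wall clock."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        """Advance for a local event (e.g. a file write or delete)."""
        self.time += 1
        return self.time

    def send(self) -> int:
        """Timestamp to attach to an outgoing message."""
        return self.tick()

    def receive(self, received: int) -> int:
        """Merge a timestamp from an incoming message."""
        self.time = max(self.time, received) + 1
        return self.time


# Illustrative scenario: the server deletes a path, the client hears about
# it, and the client then restores the file.
server = LamportClock()
client = LamportClock()
t_delete = server.send()
t_seen = client.receive(t_delete)
t_restore = client.tick()
```

Here the restore is ordered after the delete regardless of wall clocks or file mtimes, because the client had already received the delete. Two changes made without any message in between are still concurrent, and would need some extra tie-break rule.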

I might be able to detect the condition locally, i.e. "this path in memory has an mtime of fake+1 and, oh wait, the file system event told us it just came back", and then bump its original mtime (which mv didn't bump) to be fake+1+1. :-) Granted, that's messing with your mtimes, but hopefully only slightly, e.g. by a second or two, just enough for it to be newer.
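
Roughly what that local check could look like, as a Python sketch rather than actual mirror code. `PathTable`, `mark_deleted`, `on_file_appeared` and `BUMP_MS` are hypothetical names. The 1000 ms step is an assumption, picked only because the delete's modTime in the case 1 logs is exactly 1000 ms past the file's.

```python
from typing import Dict, Tuple

BUMP_MS = 1000  # assumption: one second, matching the gap seen in the logs


class PathTable:
    """Hypothetical in-memory table of path -> (mod_time_ms, is_delete)."""

    def __init__(self) -> None:
        self._known: Dict[str, Tuple[int, bool]] = {}

    def mark_deleted(self, path: str, fake_mod_time_ms: int) -> None:
        """Remember a delete together with its fake mtime."""
        self._known[path] = (fake_mod_time_ms, True)

    def on_file_appeared(self, path: str, fs_mod_time_ms: int) -> int:
        """Return the mtime to use for a file reported by a file system event.

        If the path was marked deleted and the file that came back is not
        newer than the fake mtime, bump it just past the delete.
        """
        known = self._known.get(path)
        if known is not None and known[1] and fs_mod_time_ms <= known[0]:
            effective = known[0] + BUMP_MS
        else:
            effective = fs_mod_time_ms
        self._known[path] = (effective, False)
        return effective
```

With the case 1 values, a delete marked at 1540672411588 followed by the file coming back with 1540672410588 yields 1540672412588, so the restored file is newer. The sketch only computes the value; writing it back to disk is the "messing with your mtimes" part.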