google / skicka

Command-line utility for working with Google Drive. Join the mailing list at https://groups.google.com/forum/#!forum/skicka-users.
Apache License 2.0

delete duplicates on upload #69

Status: Open. mmp opened this issue 9 years ago.

mmp commented 9 years ago

When we're uploading and there are duplicate files/directories at the target, I think we should just delete the duplicates and proceed. In the upload scenario, the user is fairly clearly saying "clobber what's on Drive with the local files", so this seems fine.

For duplicate files, we should be careful about which one we delete--e.g. if one of the duplicates is the same size as the local file but the others aren't, we should preserve the one that's the same size in the hopes that it matches and no upload is needed.
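The keep-the-same-size rule above could be sketched as follows. This is a minimal illustration, not skicka's code: `RemoteFile` and `planDuplicateCleanup` are hypothetical names, and a real implementation would work with the Drive API's file metadata. The function keeps the first duplicate whose size matches the local file (falling back to the first entry) and returns the others for deletion.

```go
package main

import "fmt"

// RemoteFile is a hypothetical, simplified view of a file on Drive.
type RemoteFile struct {
	ID   string
	Size int64
}

// planDuplicateCleanup (hypothetical) chooses one duplicate to keep and
// returns the rest for deletion. It prefers a file whose size equals
// localSize, since that one may already match and make the upload
// unnecessary; otherwise it keeps the first entry. dups must be non-empty.
func planDuplicateCleanup(dups []RemoteFile, localSize int64) (keep RemoteFile, remove []RemoteFile) {
	keepIdx := 0
	for i, f := range dups {
		if f.Size == localSize {
			keepIdx = i
			break
		}
	}
	for i, f := range dups {
		if i != keepIdx {
			remove = append(remove, f)
		}
	}
	return dups[keepIdx], remove
}

func main() {
	dups := []RemoteFile{{"a", 10}, {"b", 42}, {"c", 7}}
	keep, remove := planDuplicateCleanup(dups, 42)
	fmt.Println("keep:", keep.ID)
	for _, f := range remove {
		fmt.Println("delete:", f.ID)
	}
}
```

Keeping the size-matched copy only raises the odds of skipping the upload; the normal unchanged-file check would still need to confirm the contents match.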

danmarg commented 9 years ago

How does this work today? I see some code for duplicate file detection to avoid uploading the same file if it's already on the server, but this appears not to work for me (uploading the same directory twice results in two copies uploaded). What am I missing?

mmp commented 9 years ago

We may be talking about two different types of duplicates--to be clear, under normal operation, if one runs `skicka upload /local/path /drive/path` twice in succession, then the second run should be a no-op, with nothing uploaded. If you are seeing it reupload the directory (as per the stats printed at the end), then there is definitely a bug. A run of `skicka -verbose upload ...` should explain what's going on.

(Actually, as I re-read, maybe you're asking about doing `skicka upload big_file.mp4 /foo.mp4` and then `skicka upload big_file.mp4 /bar.mp4` and avoiding the second upload? There's not actually support for that currently. It would be possible to add, but would it be useful?)

This issue was about the fun fact that Drive allows the user to have two files with the same name in the same folder, and also allows having multiple folders with the same name. Currently upload and download punt when they run into this case, though at least for upload, we could reasonably nuke the duplicates on Drive and upload away, under the theory that the user was asking for that anyway...
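Detecting this case amounts to grouping a folder's children by name and flagging any name with more than one entry, whether files or folders. A minimal sketch, where `Child` and `findDuplicateNames` are hypothetical names standing in for a Drive folder listing:

```go
package main

import (
	"fmt"
	"sort"
)

// Child is a hypothetical entry in a Drive folder listing.
type Child struct {
	ID    string
	Title string
	IsDir bool
}

// findDuplicateNames (hypothetical) groups children by title and keeps only
// titles that occur more than once. Drive allows such collisions; a local
// filesystem does not, which is why upload and download must handle them.
func findDuplicateNames(children []Child) map[string][]Child {
	byTitle := make(map[string][]Child)
	for _, c := range children {
		byTitle[c.Title] = append(byTitle[c.Title], c)
	}
	for title, group := range byTitle {
		if len(group) < 2 {
			delete(byTitle, title) // deleting during range is safe in Go
		}
	}
	return byTitle
}

func main() {
	kids := []Child{
		{"1", "photos", true}, {"2", "photos", true},
		{"3", "notes.txt", false}, {"4", "notes.txt", false},
		{"5", "todo.txt", false},
	}
	dups := findDuplicateNames(kids)
	titles := make([]string, 0, len(dups))
	for t := range dups {
		titles = append(titles, t)
	}
	sort.Strings(titles) // map order is random; sort for stable output
	for _, t := range titles {
		fmt.Printf("%s: %d entries\n", t, len(dups[t]))
	}
}
```

Each group returned here is what an upload could then resolve by keeping one entry and deleting the others.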

Whew, so I guess that's three possible things that duplicates could refer to!

danmarg commented 9 years ago

I was asking about the first case. But I can't repro now, so... shrug.

(Sent by email in reply to Matt Pharr's comment of Thu, May 7, 2015 at 12:24 AM: https://github.com/google/skicka/issues/69#issuecomment-99629149.)
