rclone / rclone

"rsync for cloud storage" - Google Drive, S3, Dropbox, Backblaze B2, One Drive, Swift, Hubic, Wasabi, Google Cloud Storage, Azure Blob, Azure Files, Yandex Files
https://rclone.org
MIT License

New Options to Dedupe treatment on copy/sync/move commands #3683

Open naeloob opened 4 years ago

naeloob commented 4 years ago

Problem: I have to copy/move/sync from a GDrive source that contains duplicates, but I don't have permission to modify that source, so I cannot run dedupe on it.

The log file shows "Duplicate object found in source - ignoring", so the duplicates are detected. I think a parameter like --NotIgnoreDuplicate could apply the same logic as dedupe, but instead of removing the duplicates it would just copy the good one. If the duplicated files are not identical, it would copy all of them, renamed. That way the parameter stays non-interactive.
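The proposed behaviour can be sketched roughly as below. This is only an illustration, not rclone code: the `srcFile` type, the `planCopies` function and the `name-N.ext` renaming scheme are all assumptions made for the example, and it does not guard against a renamed file colliding with an existing name.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// srcFile is a hypothetical stand-in for a source object (not an rclone type).
type srcFile struct {
	Name string
	Hash string
}

// planCopies returns the destination names that would be written.
// Same name and same hash: copy only once.
// Same name but different hash: copy every variant under a renamed path.
func planCopies(files []srcFile) []string {
	seen := map[string]map[string]bool{} // name -> set of hashes already planned
	var dests []string
	for _, f := range files {
		hashes, ok := seen[f.Name]
		if !ok {
			// First object with this name keeps its name.
			seen[f.Name] = map[string]bool{f.Hash: true}
			dests = append(dests, f.Name)
			continue
		}
		if hashes[f.Hash] {
			continue // identical duplicate, nothing more to copy
		}
		hashes[f.Hash] = true
		// Assumed naming scheme: insert "-N" before the extension.
		ext := path.Ext(f.Name)
		base := strings.TrimSuffix(f.Name, ext)
		dests = append(dests, fmt.Sprintf("%s-%d%s", base, len(hashes)-1, ext))
	}
	return dests
}

func main() {
	files := []srcFile{
		{"report.pdf", "h1"},
		{"report.pdf", "h1"}, // identical duplicate
		{"report.pdf", "h2"}, // same name, different content
		{"notes.txt", "h3"},
	}
	fmt.Println(planCopies(files))
}
```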

ncw commented 4 years ago

I see what you mean, so an --allow-duplicates flag...

This will break the syncing unfortunately - rclone pairs source and destination files up by name to see if they have changed.

If there are multiple files with the same name then rclone may pair them up the wrong way, though provided the cloud provider returns the files in a consistent order it will work...

This would be quite easy - do you want to work on it?

You'd need to make a new flag here

https://github.com/rclone/rclone/blob/e81eca405539a18f685ff3cb8c5d7a6f7eea12ca/fs/config/configflags/configflags.go#L106

Add it in here

https://github.com/rclone/rclone/blob/e81eca405539a18f685ff3cb8c5d7a6f7eea12ca/fs/config.go#L105

Then use it here

https://github.com/rclone/rclone/blob/e81eca405539a18f685ff3cb8c5d7a6f7eea12ca/fs/march/march.go#L314

and here

https://github.com/rclone/rclone/blob/e81eca405539a18f685ff3cb8c5d7a6f7eea12ca/fs/march/march.go#L326

Making it say something like dstName == prevName && fs.DirEntryType(dst) == fs.DirEntryType(prev) && !fs.Config.AllowDuplicates
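As a rough, self-contained illustration of that condition (not the actual march.go code): the `entry` type and `filterDuplicates` function below are hypothetical, and `allowDuplicates` stands in for the proposed `fs.Config.AllowDuplicates` setting.

```go
package main

import (
	"fmt"
	"sort"
)

// entry is a hypothetical stand-in for an rclone directory entry.
type entry struct {
	Name  string
	IsDir bool
}

// filterDuplicates sorts entries by name and drops any entry whose name and
// type match the previous entry, unless allowDuplicates is set.
func filterDuplicates(entries []entry, allowDuplicates bool) []entry {
	sorted := append([]entry(nil), entries...) // work on a copy
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].Name < sorted[j].Name })

	var out []entry
	for i, e := range sorted {
		if i > 0 {
			prev := sorted[i-1]
			// Mirrors the suggested check: same name, same type, flag not set.
			if e.Name == prev.Name && e.IsDir == prev.IsDir && !allowDuplicates {
				continue // duplicate found, ignoring it
			}
		}
		out = append(out, e)
	}
	return out
}

func main() {
	entries := []entry{{"a.txt", false}, {"b.txt", false}, {"a.txt", false}}
	fmt.Println(len(filterDuplicates(entries, false))) // 2
	fmt.Println(len(filterDuplicates(entries, true)))  // 3
}
```

With the flag set, both copies of a.txt survive, so whatever pairs source and destination entries afterwards relies on the listing order being consistent, as noted above.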

naeloob commented 4 years ago

Sorry mate, unfortunately I don't have the Go skills needed to do that.

syp1975 commented 4 years ago

Wouldn't it be easier to allow the --dedupe-mode flag (excluding the interactive mode) in the copy/sync/move commands? Some people may want to keep the newest file, while for others it may not matter much and keeping the first file encountered would be enough.
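A minimal sketch of what such a non-interactive choice could look like, assuming only the "first" and "newest" strategies mentioned here. The `candidate` type and `pickDuplicate` function are hypothetical and not part of rclone.

```go
package main

import "fmt"

// candidate is a hypothetical stand-in for one of several same-named source objects.
type candidate struct {
	ID      string
	ModTime int64 // modification time as Unix seconds
}

// pickDuplicate chooses one object from a group of duplicates without prompting.
func pickDuplicate(mode string, dups []candidate) (candidate, error) {
	if len(dups) == 0 {
		return candidate{}, fmt.Errorf("no candidates")
	}
	switch mode {
	case "first":
		// Keep whichever object was listed first.
		return dups[0], nil
	case "newest":
		// Keep the object with the latest modification time.
		best := dups[0]
		for _, c := range dups[1:] {
			if c.ModTime > best.ModTime {
				best = c
			}
		}
		return best, nil
	default:
		return candidate{}, fmt.Errorf("unsupported non-interactive mode %q", mode)
	}
}

func main() {
	dups := []candidate{{"id-a", 100}, {"id-b", 300}, {"id-c", 200}}
	for _, mode := range []string{"first", "newest"} {
		c, err := pickDuplicate(mode, dups)
		fmt.Println(mode, c.ID, err)
	}
}
```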

ivandeex commented 3 years ago

Related to https://github.com/rclone/rclone/issues/4412 (handling duplicate file names on Gdrive)