tarka / xcp

An extended `cp`
GNU General Public License v3.0

Skip same size an existing file #33

Closed mgr9525 closed 1 year ago

tarka commented 1 year ago

Thanks @mgr9525. However, as-is this PR changes every file in the repo by adding MS-DOS (CRLF) line endings, which makes the diff unreadable. Can you explain what the PR does? The title implies it will skip copying files of the same size. In what scenario would this be desirable?

mgr9525 commented 1 year ago

Hi @tarka. In my case, I use SSH to copy a 500G folder containing a large number of files, which takes a long time. If the copy is interrupted, I would like to skip the files that have already been copied when I run it again, to save time. After the copy finishes, I can also run it once more to verify that all files have been copied. This requires that xcp can skip files of the same size.

tarka commented 1 year ago

This would be unreliable, and in the case of the parblock driver would almost certainly result in data loss. The first thing the copy does is allocate an empty (sparse) file, and then copy the data across. This is necessary for the correct copying of sparse files (in normal file copy), and is intrinsic to how the parblock driver works (i.e. the file is created first, and then chunks of the file are copied in parallel). If either of these operations is interrupted then you will be left with a file of the 'correct' size but incomplete data.
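
To illustrate the failure mode, here is a minimal Python sketch. It is not xcp's actual code: the function names (`simulate_interrupted_copy`, `same_size`, `same_content`) are hypothetical, and the "interruption" is simulated by simply stopping after a fixed number of bytes. It preallocates the destination to the source's size, copies only part of the data, and then shows that a size comparison passes while a content comparison fails.

```python
import os
import tempfile


def simulate_interrupted_copy(src, dst, bytes_before_interrupt):
    """Preallocate dst to src's size, then copy only the first bytes.

    Hypothetical stand-in for a copy that is interrupted part-way through.
    """
    size = os.path.getsize(src)
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        # Destination gets its final size up front; unwritten ranges read as zeros.
        fout.truncate(size)
        # Only part of the data arrives before the "interruption".
        fout.write(fin.read(bytes_before_interrupt))


def same_size(a, b):
    """The check a 'skip files of the same size' option would rely on."""
    return os.path.getsize(a) == os.path.getsize(b)


def same_content(a, b):
    """Byte-for-byte comparison (fine for small demo files)."""
    with open(a, "rb") as fa, open(b, "rb") as fb:
        return fa.read() == fb.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src.bin")
        dst = os.path.join(tmp, "dst.bin")
        with open(src, "wb") as f:
            f.write(b"x" * 4096)

        simulate_interrupted_copy(src, dst, 1024)

        print("same size:   ", same_size(src, dst))
        print("same content:", same_content(src, dst))
```

A re-run that skipped `dst.bin` because its size matches would silently keep the incomplete file.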

There are improvements that could be made to xcp to help with this (e.g. temp copy file + atomic move), but these have their own complications. xcp is only really intended to be used for local copy where the 'connection' is reliable. I'm not even sure how you would use xcp over SSH; are you using fuse/SSHFS?
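
For reference, the "temp copy file + atomic move" idea could look roughly like the Python sketch below. This is only an illustration under assumptions, not something xcp implements: `atomic_copy` and the `.partial-` prefix are hypothetical names, the temp file is assumed to live in the destination directory so the rename stays on one filesystem, and the complications mentioned above (as well as metadata and sparse-file handling) are ignored.

```python
import os
import shutil
import tempfile


def atomic_copy(src, dst):
    """Copy src to a temp file beside dst, then rename it into place.

    Hypothetical sketch: dst only ever appears under its final name
    once the full contents have been written.
    """
    dst_dir = os.path.dirname(os.path.abspath(dst))
    # Temp file in the same directory, so the rename does not cross filesystems.
    fd, tmp_path = tempfile.mkstemp(dir=dst_dir, prefix=".partial-")
    try:
        with os.fdopen(fd, "wb") as fout, open(src, "rb") as fin:
            shutil.copyfileobj(fin, fout)
            fout.flush()
            os.fsync(fout.fileno())  # push the data to disk before renaming
        # Replaces dst in one step; atomic on POSIX within one filesystem.
        os.replace(tmp_path, dst)
    except BaseException:
        # On failure, remove the partial temp file and leave dst untouched.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

With this approach an interrupted copy leaves either no destination file or a stray temp file, rather than a destination of the 'correct' size with incomplete data.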

There are other tools that might be better for your use-case. In particular rsync is designed to copy data over unreliable connections with retry and continuation, and it works with SSH.

mgr9525 commented 1 year ago

Thanks for your reply. I will close this.