peak / s5cmd

Parallel S3 and local filesystem execution tool.
MIT License

Source error with sync delete all dest files #695

Open adrienyhuel opened 10 months ago

adrienyhuel commented 10 months ago

Hello,

I'm trying to synchronize the contents of an S3 bucket to a local directory.

If there is an error on the source (like a network timeout when reaching S3, or the bucket doesn't exist), all of the existing files in the local directory can be deleted.

I suspect there is a race condition between listing the source and listing the destination, where whichever listing produces an error first fails the sync process. I think so because if the destination folder contains only a few files, those files are deleted. But if the destination folder contains lots of files (like the /tmp dir), no files are deleted, because the S3 source listing fails before the local destination listing finishes.
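To make the suspected failure mode concrete, here is a minimal Go sketch of the behavior I would expect instead. It is not s5cmd's actual code: `lister`, `staticLister`, `errTimeout` and `planDeletes` are hypothetical names made up for illustration. The idea is that both listings may run concurrently, but nothing is planned for deletion until both have finished, and a failed source listing is never treated as an empty source.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// lister is a hypothetical stand-in for "list all keys of a bucket or directory".
type lister func() ([]string, error)

// errTimeout simulates a network timeout while listing the source.
var errTimeout = errors.New("timeout listing source")

// staticLister returns a lister that always yields the given keys and error.
func staticLister(keys []string, err error) lister {
	return func() ([]string, error) { return keys, err }
}

// planDeletes returns the destination keys that are absent from the source.
// Both listings run concurrently, but the plan is built only after both have
// completed, and only if neither of them failed.
func planDeletes(listSrc, listDst lister) ([]string, error) {
	type result struct {
		keys []string
		err  error
	}
	srcCh := make(chan result, 1)
	dstCh := make(chan result, 1)

	go func() { k, err := listSrc(); srcCh <- result{k, err} }()
	go func() { k, err := listDst(); dstCh <- result{k, err} }()

	// Wait for both listings, whichever finishes first.
	src, dst := <-srcCh, <-dstCh
	if src.err != nil {
		return nil, fmt.Errorf("listing source: %w", src.err)
	}
	if dst.err != nil {
		return nil, fmt.Errorf("listing destination: %w", dst.err)
	}

	inSrc := make(map[string]struct{}, len(src.keys))
	for _, k := range src.keys {
		inSrc[k] = struct{}{}
	}

	var toDelete []string
	for _, k := range dst.keys {
		if _, ok := inSrc[k]; !ok {
			toDelete = append(toDelete, k)
		}
	}
	sort.Strings(toDelete)
	return toDelete, nil
}

func main() {
	dst := staticLister([]string{"a.txt", "b.txt"}, nil)

	// Source listing fails: no deletions are planned at all.
	plan, err := planDeletes(staticLister(nil, errTimeout), dst)
	fmt.Println(plan, err)

	// Source listing succeeds: only files missing from the source are planned.
	plan, err = planDeletes(staticLister([]string{"a.txt"}, nil), dst)
	fmt.Println(plan, err)
}
```

With this shape, the size of the destination directory no longer matters: a small local directory that finishes listing before the source error arrives still produces an empty plan.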

GRMrGecko commented 9 months ago

I'm working this out and will have a pull request in once testing is done.

GRMrGecko commented 9 months ago

Ok, I have two pull requests that are waiting on the maintainers for review.

This one should outright solve the issue for errors: https://github.com/peak/s5cmd/pull/698

This one is more of an idea I thought of as I use the trick on rsync: https://github.com/peak/s5cmd/pull/699