peak / s5cmd

Parallel S3 and local filesystem execution tool.
MIT License

Strange behaviour of sync #720

Open praveenraj2018 opened 4 months ago

praveenraj2018 commented 4 months ago

Hi,

I recently noticed that my backup process, running s5cmd v2.2.2, is skipping some directories that are not matched by any of the exclude patterns.

Below is the s5cmd command that syncs the contents of the parent directory to the S3 bucket:

s5cmd --stat sync --include "*" --exclude "work/*" --exclude "test/*" --exclude "Partial/*" --exclude "*screen*" --exclude "*.pileup" "/storageA/*" s3://storageA/

An example folder and its contents at source:

ls /storageA/Omics/ready/
INFO.md

After the sync, I don't see the above directory in the destination:

s5cmd --stat ls s3://storageA/Omics/ready/
ERROR "ls s3://storageA/Omics/ready/": no object found

But it works if we specify one more directory level in the command, as below. This copies many files that the first command does not.

s5cmd --stat sync --include "*" --exclude "work/*" --exclude "test/*" --exclude "Partial/*" --exclude "*screen*" --exclude "*.pileup" "/storageA/Omics/*" s3://storageA/Omics/

This looks quite strange! What could be the problem with the first sync command?

Any suggestions?

thiell commented 3 months ago

I've also noticed this kind of partial copy with the sync command when transferring many files (200k+) with the latest release, v2.2.2-48f7e59: whole folders are simply not copied to the destination. The s5cmd command seems to stop before all the work is done, with return code 0 and no visible error messages (even in trace or debug mode).
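
Since the command can apparently exit with return code 0 while folders are still missing, one way to check a sync independently is to compare a list of local paths against a list of the keys present in the bucket. The sketch below is only an illustration, not part of s5cmd: the function name `missing_paths` and the file names `local.txt` and `remote.txt` are hypothetical. It assumes both files contain one relative path per line, and how the remote list is produced (for example, by post-processing `s5cmd ls` output) is left open.

```shell
#!/bin/sh
# Hypothetical helper (not part of s5cmd): print every path that appears in
# the local list but not in the remote list.
# Assumption: both arguments are text files with one relative path per line.
missing_paths() {
    local_list=$1
    remote_list=$2
    tmp_local=$(mktemp) || return 1
    tmp_remote=$(mktemp) || { rm -f "$tmp_local"; return 1; }
    # comm needs both inputs sorted with the same collation order.
    LC_ALL=C sort -u "$local_list" > "$tmp_local"
    LC_ALL=C sort -u "$remote_list" > "$tmp_remote"
    # -23 hides lines found only in the remote list and lines found in both,
    # leaving only the lines unique to the local list.
    LC_ALL=C comm -23 "$tmp_local" "$tmp_remote"
    rm -f "$tmp_local" "$tmp_remote"
}

# Possible usage (paths taken from the report above; list files are examples):
#   (cd /storageA && find . -type f | sed 's|^\./||') > local.txt
#   ... write the bucket's keys, relative to s3://storageA/, to remote.txt ...
#   missing_paths local.txt remote.txt
```

The local list built this way also contains files that match the exclude patterns, so those will show up as missing too and need to be filtered out before drawing conclusions.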