peak / s5cmd

Parallel S3 and local filesystem execution tool.
MIT License
2.71k stars 240 forks source link

Fix wildcard sync command if files have special characters #761

Open briceflaceliere opened 1 month ago

briceflaceliere commented 1 month ago

Problem encountered: I am using s5cmd to sync several million files, some of which contain special characters (such as \u00a0, quotes, etc.). However, the sprintf("%q") function used in generateCommand modifies these file names, causing the cp command to no longer find the files correctly.

Solution provided: To resolve this issue, I used the github.com/kballard/go-shellquote library to properly escape URL parameters without converting UTF-8 characters. This process is applied only when necessary, ensuring file names are preserved.

Test changes: I modified the tests accordingly. However, I am not certain of the potential impact on other parts of the project. I will be able to confirm if this resolves the synchronization issues once the migration of our millions of files is completed.

Related issues:

igungor commented 1 month ago

Hey @briceflaceliere, thanks for sending a patch! Appreciated. Using shellquote looks good to me.

Could you add your scenario to cp integration test case to demonstrate the fix please? Thank you.

https://github.com/peak/s5cmd/blob/12d10381b36a27004e1f690992ebb7f6ba97c171/e2e/cp_test.go#L2142-L2146