peak / s5cmd

Parallel S3 and local filesystem execution tool.
MIT License

Flatten list? #596

Status: Closed (szeitlin closed 1 year ago)

szeitlin commented 1 year ago

I see you have --flatten with cp but it seems like it's not supported for ls?

I might be able to do a pull request to add this, it would be really useful for deeply nested directories (not my idea to store the data this way, I just get paid to clean it up).

denizsurmeli commented 1 year ago

Can you share an example folder structure and the command output you want to see?

szeitlin commented 1 year ago

Sure, so when I do list, I just see the top level, e.g.

    s5cmd ls s3://mybucket/
    --one
    --two

but if I do copy I get back the full paths, which is what I want to see when I list, e.g.

    s5cmd cp s3://mybucket/one/*  s3://otherbucket/new/
    cp s3://mybucket/one/folder1/file1 s3://otherbucket/new/one/folder1/file1
    cp s3://mybucket/one/folder2/file2 s3://otherbucket/new/one/folder2/file2

This might be because of the way the S3 protocol works, but it seems like it should be pretty straightforward for s5cmd to recurse and give you back the full path when you do ls? For example, if I do ls with the Oracle SDK, it gives me back the full paths.
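For context, the top-level-only view matches how S3-style listings generally behave: object keys are flat strings, and a listing that uses a delimiter collapses everything past the first delimiter into a single prefix entry. The sketch below simulates that locally in Python, with no S3 calls. The function name `list_keys` and the `two/folder1/file1` key are made up for illustration, and this is not how s5cmd itself is implemented.

```python
def list_keys(keys, prefix="", delimiter=None):
    """Simulate an S3-style listing over an in-memory collection of keys.

    Hypothetical helper for illustration only; it is not part of s5cmd or any SDK.
    """
    results = []
    seen = set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter and delimiter in rest:
            # Collapse everything past the first delimiter into one prefix entry.
            common = prefix + rest.split(delimiter, 1)[0] + delimiter
            if common not in seen:
                seen.add(common)
                results.append(common)
        else:
            # No delimiter grouping applies, so report the full key.
            results.append(key)
    return results


# Sample keys modeled on the example above; "two/folder1/file1" is assumed.
keys = ["one/folder1/file1", "one/folder2/file2", "two/folder1/file1"]

top_level = list_keys(keys, delimiter="/")  # grouped, like the ls output above
full_paths = list_keys(keys)                # ungrouped, full keys

print(top_level)
print(full_paths)
```

Listing without a delimiter is what yields the full paths, so the recursion asked for here is mostly a matter of not grouping keys by prefix.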

denizsurmeli commented 1 year ago

Rather than --flatten, maybe naming the flag --show-fullpath would be better. Also, ls shows the dates and sizes of the objects; do you need them in your case? Or is there a case where you only need the full-path filenames so you can pipe them somewhere else?

szeitlin commented 1 year ago

Agreed, --show-fullpath would be great. And yeah, in this case I don't care about the dates, and sizes are nice-to-have. Just getting the full-path filenames would be enough.

For example, I've got some cases where everything is nested and I want to convert the full path into the new filename, so I don't lose the provenance, and don't risk accidentally overwriting when the files themselves have the same name across all the folders.
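A minimal Python sketch of that kind of rename, assuming the full keys are already available as strings. The helper name `flatten_key` and the `__` separator are hypothetical choices for illustration, not anything s5cmd provides.

```python
def flatten_key(key, sep="__"):
    """Turn a nested key like 'one/folder1/file1' into a single flat filename.

    Hypothetical helper; the separator is an arbitrary assumption.
    """
    # Drop empty segments so leading or doubled slashes do not produce blanks.
    parts = [part for part in key.split("/") if part]
    return sep.join(parts)


# Two files with the same basename in different folders (sample data).
keys = ["one/folder1/file1", "one/folder2/file1"]
flat = [flatten_key(k) for k in keys]

for original, new_name in zip(keys, flat):
    print(original, "->", new_name)
```

Because every folder name is carried into the new filename, files that share a basename across folders no longer collide. One caveat is that folder or file names that already contain the separator could still produce ambiguous results, so the separator should be one that does not appear in the data.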