wallix / awless

A Mighty CLI for AWS
http://awless.io/
Apache License 2.0

Not all s3 objects are returned during list s3objects #200

Closed: res0nat0r closed this issue 6 years ago

res0nat0r commented 6 years ago

Hi,

It seems that only a subset of my S3 objects are being returned by list s3objects. The count doesn't fall on an even paging boundary, so maybe it is something else? All objects in this bucket are archived to Glacier, but only a subset appear to be returned.

I'm happy to do any troubleshooting; just let me know.

Example:

$ awless list s3objects --filter bucket=my-bucket-here -r us-east-2 --no-headers | wc -l
2980
$ aws s3 ls --recursive s3://my-bucket-here | wc -l
49511
simcap commented 6 years ago

@res0nat0r Sorry for delay and thanks for reporting :+1:

Let's get to the bottom of it. A few things to be aware of

With that in mind, and surprisingly, I would expect the larger count of s3objects to be on the awless side. Silly question, but are you sure the counts were not swapped when pasting them into this issue?

res0nat0r commented 6 years ago

Hi @simcap, thanks for the response.

Thanks for the tsv tip. That output makes sense and lines up better with list s3objects not paginating through the response via the continuation token.

Using the below now returns:

$ awless -r us-east-2 list s3objects --format tsv --filter bucket=my-bucket-here | wc -l
1001

So 1001 is 1,000 objects plus one header line, and 1,000 is the default maximum number of keys the S3 API returns in a single response when no continuation token is followed. The bucket name I am filtering on is unique in my namespace, so only that bucket's contents should be returned, and I definitely have far more objects than that: the bucket is my archive and holds ~50k items.

Hope this helps.

simcap commented 6 years ago

OK, thanks @res0nat0r. Tomorrow I will check whether pagination is done properly on our side and push a fix if needed.

simcap commented 6 years ago

The commit above fixes the issue.

Pagination was missing when listing s3objects. The reason: AWS bills each request when listing S3 objects, so early on in awless we did not paginate on this specific API endpoint because it costs users money... and then we forgot about it ;). In retrospect, not a good idea anyway!

@res0nat0r If you can confirm that it now returns the correct count, we can close this issue (I tested it on my side).

(At the moment it is slower than the aws-cli on this particular API endpoint, and only when there is a large number of s3objects. I am going to look into whether we can improve on that.)
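
The cost concern behind the missing pagination scales with the number of list requests: each request returns at most 1,000 keys, so listing a whole bucket takes one request per 1,000 keys, rounded up. A small Go sketch of that arithmetic follows; listRequests is a hypothetical helper, and it assumes the default page size of 1,000 keys.

```go
package main

import "fmt"

// keysPerRequest assumes the S3 default (and maximum) page size for one list call.
const keysPerRequest = 1000

// listRequests returns how many paginated list calls are needed for n objects.
// An empty bucket still takes one request to learn that it is empty.
func listRequests(n int) int {
	if n <= 0 {
		return 1
	}
	return (n + keysPerRequest - 1) / keysPerRequest // ceiling division
}

func main() {
	for _, n := range []int{1000, 49797} {
		fmt.Printf("%d objects -> %d list request(s)\n", n, listRequests(n))
	}
}
```

For the ~50k-object bucket in this thread that works out to 50 requests instead of the single request the unpaginated version made.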

res0nat0r commented 6 years ago

@simcap After running go get -u, it looks good. Thanks!

FYI, listing my ~50k objects also seems comparable in speed to aws s3 ls; timings below.

$ time aws s3 ls --recursive s3://my-bucket-here | wc -l
49797

real    0m18.126s
user    0m9.561s
sys     0m0.753s

$ time awless -r us-east-2 list s3objects --format tsv --filter bucket=my-bucket-here | wc -l
49798

real    0m17.512s
user    0m15.231s
sys     0m0.580s
simcap commented 6 years ago

Awesome! Thanks. I will close this issue.

If you hit any other problems, let us know; I will find the time to fix them before our upcoming v0.1.10 release.