Closed daveagill closed 1 year ago
Hey @daveagill ... it would be helpful to try and narrow down the source of the crash. Two dimensions to consider:
From the column point of view, can you please try:
select name from aws_s3_bucket
Then:
select name, region from aws_s3_bucket
And then keep adding columns progressively?
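A minimal sketch of the progressive-column approach suggested above. The helper names (`progressive_queries`, `run_progressive`) and the `DRY_RUN` switch are made up for illustration, and the pass/fail check assumes `steampipe query` exits non-zero when the query errors. The column list is only an example taken from columns mentioned in this thread.

```shell
# Print one query per prefix of the given column list:
# 1 column, then 2 columns, and so on.
progressive_queries() {
  cols=""
  for c in "$@"; do
    cols="${cols:+$cols, }$c"
    echo "select $cols from aws_s3_bucket"
  done
}

# By default the queries are only printed (dry run).
# Set DRY_RUN=0 to run each one with steampipe and report pass/fail.
# Assumption: steampipe exits non-zero when a query fails.
run_progressive() {
  progressive_queries "$@" | while IFS= read -r q; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "$q"
    elif steampipe query "$q" > /dev/null 2>&1 < /dev/null; then
      echo "OK   $q"
    else
      echo "FAIL $q"
    fi
  done
}

run_progressive name region creation_date versioning_enabled block_public_acls
```

The first query that reports FAIL points at the column (or combination of columns) worth a closer look.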
Hi @e-gineer thanks for your reply.
In trying to pinpoint a column it seems the issue becomes intermittent with a success/fail crossover at around this point:
steampipe query "select _ctx, policy_std, replication, website_configuration, tags_src, tags, akas, creation_date, bucket_policy_is_public, versioning_enabled, versioning_mfa_delete, block_public_acls, block_public_policy, ignore_public_acls, restrict_public_buckets from aws_s3_bucket"
i.e. it's around those last two columns: ignore_public_acls, restrict_public_buckets
If I query for everything up to and including ignore_public_acls then it will generally succeed, but occasionally fail. If I query for everything up to and including restrict_public_buckets then it will generally fail, but occasionally succeed.
At this point it feels like the problem is some kind of non-determinism / race condition / multi-threading bug. Or perhaps the occasional passes/fails are the result of caching effects.
It isn't the fault of those columns in isolation though, for example if I execute this then it will run quite happily:
select restrict_public_buckets from aws_s3_bucket
Any ideas on how to debug further? Maybe some other diagnostics I can pull out? Like a stacktrace or debug mode.
Cheers
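Since the failure is intermittent, one way to put numbers on "generally succeeds" versus "generally fails" is to run the same query several times and count the outcomes. This is only a sketch: `count_runs` is a hypothetical helper, and it assumes the command exits non-zero on failure.

```shell
# Run a command N times and report how often it succeeded.
# Usage: count_runs N command [args...]
# Assumption: the command signals failure with a non-zero exit status.
count_runs() {
  n="$1"; shift
  passed=0; failed=0; i=0
  while [ "$i" -lt "$n" ]; do
    if "$@" > /dev/null 2>&1 < /dev/null; then
      passed=$((passed + 1))
    else
      failed=$((failed + 1))
    fi
    i=$((i + 1))
  done
  echo "passed=$passed failed=$failed"
}

# Example (not run here):
# count_runs 10 steampipe query "select ignore_public_acls, restrict_public_buckets from aws_s3_bucket"
```

Comparing the counts for two column lists gives a less anecdotal picture of where the crossover sits.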
@daveagill Thanks for sharing this info and your interest in helping to debug.
I could not reproduce this with a small set of buckets (around 120); this may be too small a data set.
Curious to reproduce this particular scenario; can you provide a few more inputs, such as
race condition, wondering if you are executing this as a single query or as part of any other queries or a mod? Thank you!
@rajlearner17 I have tried a few more things but no breakthroughs yet.
Here are answers to your questions:
export AWS_PROFILE=NameOfMyProfile
aws sso login
steampipe query "select count(*) from aws_s3_bucket"
That AWS_PROFILE is a standard env-var recognised by the AWS CLI, and steampipe seems happy with it too.
There are 101 buckets according to: select count(*) from aws_s3_bucket
Ah, when I referred to a race condition I was actually referring to the internal workings of the steampipe aws plugin itself. I have observed that query results seem to arrive back in a different order each time. And even with the select * query that fails, it still returns a handful (up to 10 or so) of S3 buckets, but it's not always the same S3 buckets that come back. So it seems to me there must be some async or concurrency going on internally.
As for how I am executing these queries I am just running very simple commands by hand in my terminal and hitting enter: steampipe query "select * from aws_s3_bucket"
My aws.spc file is just the default one, except I've uncommented the ignore_error_codes line because we have a number of S3 buckets that produce AccessDenied for calls like GetBucketLocation.
I can't think of a way to narrow down whether it is the fault of a single S3 bucket. The problem I have is that if I issue a query like select * from aws_s3_bucket where name='something' then I still receive that error anyway, regardless of the name. Which is interesting, but I don't know what that tells us.
I don't suppose there's a verbose mode or a crash dump somewhere that could help?
Thanks @daveagill for the detailed information.
To get more, you can enable logging:
STEAMPIPE_LOG=trace steampipe query "select _ctx, policy_std, replication, website_configuration, tags_src, tags, akas, creation_date, bucket_policy_is_public, versioning_enabled, versioning_mfa_delete, block_public_acls, block_public_policy, ignore_public_acls, restrict_public_buckets from aws_s3_bucket"
The logs will be in ~/.steampipe/logs/....
That may provide some more insight into any crashes.
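Trace logs can be long, so a quick filter may help find the interesting lines. The sketch below is an assumption-heavy starting point: `scan_log` is a hypothetical helper, and the keywords it searches for are a guess rather than a documented Steampipe log format. The file name in the example follows the plugin log naming seen in this issue, and the directory is left as a placeholder.

```shell
# Print lines (with line numbers) that mention a panic, fatal condition or error.
# The keyword list is a guess, not a documented Steampipe log format.
scan_log() {
  grep -n -i -E 'panic|fatal|error' "$1"
}

# Example (not run here; replace the path with the log file you find):
# scan_log path/to/plugin-2023-05-16.log
```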
I'll note that seeing different order of results and data is normal. In this case, Steampipe is paging through the list of S3 buckets. Then, for each bucket, it's doing a series of sub-API calls to get column data. It's a lot of API calls, and they all return at different times.
As the data for each row is complete, Steampipe will stream the finished row back to the result set, which means they arrive in different order each time.
If you use order by though, Postgres will order the rows at the end of the query as expected. (Without order by, the order of the rows is undefined in SQL.)
If you want to understand the nitty gritty, I describe this process in more detail at https://youtu.be/2BNzIU5SFaw?t=640
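To make the ordering point concrete, here is a toy simulation in plain shell. It is not Steampipe code: the bucket names and delays are invented, and background jobs with `sleep` merely stand in for per-bucket API calls that finish at different times.

```shell
# Each "row" becomes available after its own delay, standing in for
# the per-bucket API calls. Names and delays are made up.
fetch_row() {  # $1 = bucket name, $2 = delay in seconds
  sleep "$2"
  echo "$1"
}

# Start all fetches concurrently and emit each row as soon as it is ready.
stream_rows() {
  fetch_row bucket-a 0.3 &
  fetch_row bucket-b 0.1 &
  fetch_row bucket-c 0.2 &
  wait
}

echo "completion order:"
stream_rows
echo "sorted at the end:"
stream_rows | sort
```

The first listing depends on timing, while the sorted one is stable. The real-world equivalent is adding an order by clause, for example: steampipe query "select name, region from aws_s3_bucket order by name"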
Hey @daveagill, We are closing this issue because we have not heard from you. Please feel free to reopen the issue if you want to share or discuss anything.
Describe the bug
This query appears to crash the AWS plugin:
select * from aws_s3_bucket
resulting in this error:
Error: rpc error: code = Unavailable desc = error reading from server: EOF (SQLSTATE HV000)
Steampipe version (steampipe -v)
Steampipe v0.19.5
Plugin version (steampipe plugin list)
+--------------------------------------------+---------+-------------+
| Installed Plugin                           | Version | Connections |
+--------------------------------------------+---------+-------------+
| hub.steampipe.io/plugins/turbot/aws@latest | 0.102.0 | aws         |
+--------------------------------------------+---------+-------------+
To reproduce
1) I have a bunch of AWS S3 buckets:
2) I have uncommented in aws.spc:
ignore_error_codes = ["AccessDenied", "AccessDeniedException", "NotAuthorized", "UnauthorizedOperation", "UnrecognizedClientException", "AuthorizationError"]
3) Issue this query:
select * from aws_s3_bucket
4) Some results will be returned (it varies how many) and the following error shows:
Error: rpc error: code = Unavailable desc = error reading from server: EOF (SQLSTATE HV000)
See logs below.
Expected behavior
Should fully detail all S3 buckets.
Additional context
Other similar queries can run successfully:
select name from aws_s3_bucket works every time.
select count(*) from aws_s3_bucket works and returns 101.
select * from aws_s3_bucket limit 52 works more often than not, but as you increase that limit the failure becomes more likely.
It may be that there is a problematic S3 bucket causing the problem, but I'm not sure offhand how to narrow that down.
logs/plugin-2023-05-16.log
logs/database-2023-05-16.log