turbot / steampipe

Zero-ETL, infinite possibilities. Live query APIs, code & more with SQL. No DB required.
https://steampipe.io
GNU Affero General Public License v3.0

Connection errors in aggregate connections cause empty results #1848

Closed jbertman closed 10 months ago

jbertman commented 2 years ago

Describe the bug

When an aggregate connection contains at least one connection that errors, no results are populated, even though the checks are executed.

Steampipe version (steampipe -v)

v0.13.5

To reproduce

My particular use case that causes connection errors uses the AssumeRole functionality of the AWS plugin. Set up AWS connections and an aggregator connection:

connection "aws_all" {
  type        = "aggregator"
  plugin      = "aws"
  connections = ["aws_*"]
}
connection "aws_account_1" {
  plugin  = "aws"
  profile = "account_1"
  regions = ["*"]
}
connection "aws_account_2" {
  plugin  = "aws"
  profile = "account_2"
  regions = ["*"]
}
connection "aws_account_3" {
  plugin  = "aws"
  profile = "account_3"
  regions = ["*"]
}

Our credentials file (~/.aws/credentials) contains:

[account_1]
credential_source = Environment
role_arn = arn:aws:iam::<account_1_id>:role/OrganizationAccountAccessRole

[account_2]
credential_source = Environment
role_arn = arn:aws:iam::<account_2_id>:role/OrganizationAccountAccessRole

[account_3]
credential_source = Environment
role_arn = arn:aws:iam::<account_3_id>:role/OrganizationAccountAccessRole

This setup lets me use something like aws-vault (which sets environment variables) to assume roles in the other accounts:

aws-vault exec some_master_account --no-session -- steampipe check benchmark.cis_v140 --workspace-chdir steampipe-mod-aws-compliance --search-path-prefix aws_all --export csv,json

When all accounts are "assumable", everything works fine. But when any of them hits an AssumeRole error, I see something like this in the output:

    ERROR: 1 connection failed: 
connection 'aws_account_3': rpc error: code = Unknown desc = AccessDenied: User: arn:aws:iam::<master_account_id>:user/some_user is not authorized to perform: sts:AssumeRole on resource…
    status code: 403, request id: <...> (SQLSTATE HV000)

However, work is clearly being done on other accounts in the background (it takes just as long), so this appears to be a reporting error.

Expected behavior

All successful checks should be reported, even if errors occur on some of the connections.

kaidaguerre commented 2 years ago

Thanks for the report @jbertman

I'll investigate what's going on here

jbertman commented 2 years ago

@kaidaguerre Any issues with repro? If you want to point me in the right direction I can try to debug.

jbertman commented 2 years ago

@kaidaguerre any particular place I can start looking? I see that there are possibly a few related issues such as https://github.com/turbot/steampipe-plugin-aws/issues/878

jbertman commented 2 years ago

I see there's some additional work here: https://github.com/turbot/steampipe/pull/1886

Perhaps this closes the issue; I'll give it a test run.

kaidaguerre commented 2 years ago

@jbertman No, I do not believe that fix will help with this issue. That one changed the behaviour of the check and dashboard commands when any connection failed to load (for example, if the csv plugin was missing required connection config).

Previously, the check and dashboard commands would fail if any connection failed, whereas the query command would run with a warning. Now, check and dashboard are consistent with query, i.e. they also run with a warning if any connections fail.

(feel free to correct me if, happily, I am wrong and this does fix it)

kaidaguerre commented 2 years ago

I haven't dug into this yet, but my guess is that the problem lies in the GroupIterator in the FDW.

This is how we implement aggregate connections: a GroupIterator has a collection of underlying Iterators, one per underlying Steampipe connection.

My initial guess would be the error handling in here is not quite right and that if one connection fails, no results are reported even from successful connections.

The error message ERROR: 1 connection failed: comes from here
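
To make the suspected failure mode concrete, here is a minimal Python sketch of the behaviour the report asks for: an aggregate scan that keeps rows from healthy connections and records errors per connection instead of discarding everything. This is not Steampipe's actual FDW code (which is written in Go); `GroupResult`, `scan_aggregate`, and `denied` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class GroupResult:
    """Rows gathered from all connections plus any per-connection errors."""
    rows: List[dict] = field(default_factory=list)
    errors: Dict[str, str] = field(default_factory=dict)


def scan_aggregate(scanners: Dict[str, Callable[[], Iterable[dict]]]) -> GroupResult:
    """Run every child scan; a failing connection must not hide the others.

    `scanners` maps a connection name to a zero-argument callable that
    returns an iterable of rows (hypothetical stand-in for a child iterator).
    """
    result = GroupResult()
    for name, scan in scanners.items():
        try:
            for row in scan():
                # Tag each row with its source connection.
                result.rows.append({**row, "connection": name})
        except Exception as exc:
            # Record the failure and move on. Rows already collected,
            # including any yielded by this connection before it failed,
            # are kept.
            result.errors[name] = str(exc)
    return result


def denied() -> Iterable[dict]:
    # Simulates a connection whose role assumption is rejected.
    raise PermissionError("AccessDenied: not authorized to perform sts:AssumeRole")


scanners = {
    "aws_account_1": lambda: [{"trail": "a"}],
    "aws_account_2": lambda: [{"trail": "b"}],
    "aws_account_3": denied,
}
result = scan_aggregate(scanners)
```

With this shape, `result.rows` holds the rows from `aws_account_1` and `aws_account_2`, while `result.errors` carries the failure for `aws_account_3`, which a caller could surface as a warning next to the results.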

kaidaguerre commented 2 years ago

I hope to have time to dig into this in the next day or so

jbertman commented 2 years ago

@kaidaguerre thanks for the pointer! It's a bit hard to tell from an initial look, but it seems like the error is at the control level. If I take a particular permission away (so one check succeeds and another fails based on permissions), I get results for the check that succeeds on every connection but not for the other.

So, with the following permissions:

master-account
├── aws-account-1
│   ├── cloudfront (allow)
│   └── cloudtrail (allow)
└── aws-account-2
    ├── cloudfront (allow)
    └── cloudtrail (disallow)

If I run a thrifty check all with creds from master-account through the connection aggregator (with assume-role), I get results for cloudfront (allowed in both accounts) but not for cloudtrail (allowed in only one). This may be helpful for narrowing down the root cause :)

jbertman commented 2 years ago

An addition to the above: I get some results if role assumption succeeds but the resource fetch fails. If role assumption fails for one connection, the whole resource fails (even though the work is still done in the background).

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days.

github-actions[bot] commented 10 months ago

This issue was closed because it has been stalled for 90 days with no activity.