Closed Joeturbot closed 2 years ago
I think this means we will need the queries themselves and the Turbot SDL (which is not yet publically available). The query itself prevents halting problems on the SDL. The SDL provides type inference which the query does not contain.
This is an example query
query ResourceControlPolicyQuery {
resources(paging: "", filter: "resourceTypeId:'tmod:@turbot/aws#/resource/types/aws'") {
items {
data
controls(filter: "state:alarm,error,ok") {
items {
reason
state
turbot {
dependencies {
policyValues {
items {
state
value
type {
modUri
description
categoryUri
input
}
}
}
}
}
}
}
}
metadata {
stats {
total
}
}
}
}
This is sample output from a query:
{
"data": {
"resources": {
"items": [
{
"data": {
"Id": "934217203932",
"Arn": "arn:aws:organizations::676944191433:account/o-c3a5y4wd52/934217203932",
"Name": "000 @ Turbot",
"Tags": [
{
"Key": "Name",
"Value": "Nathan Dev Env"
}
],
"Email": "aws+000@turbothq.com",
"Status": "ACTIVE",
"JoinedMethod": "INVITED",
"JoinedTimestamp": "2014-10-29T16:10:54.000Z"
},
"controls": {
...
}
etc
The goal would be to configure Steampipe to send that query to Turbot and present the result as one (or more) tables?
Correct. My hope is that GraphQL Query + Turbot SDL = Steampipe table with proper type definitions on each column. Altering the queries and relaunching Steampipe would "recompile" the table definition to match results of the query.
Dunno what SDL looks like, or means in this context, but anyway I'm still trying to understand the client- vs server-side filtering issue.
Today in the course of testing AWS auth in Steampipe Cloud I used the CloudFormation method to provision a role for cross-account trust, and have been poking around in turbot.com and steampipe-plugin-turbot to see what's what.
Here's a GraphQL query that immediately finds the role I just created.
query Resource {
resource(id: "arn:aws:iam::899206412154:role/steampipe_cloud") {
data
metadata
trunk {
title
}
turbot {
akas
id
tags
}
}
}
{
"data": {
"resource": {
"data": {
"Arn": "arn:aws:iam::899206412154:role/steampipe_cloud",
"Path": "/",
"Tags": [
{
"Key": "vaec:CKID",
"Value": "abc125"
},
{
"Key": "vaec:VAECID",
"Value": "xyz456"
}
],
"RoleId": "AROA5CXHA655BRJNCS2SR",
"Policies": [],
"RoleName": "steampipe_cloud",
"CreateDate": "2021-12-28T18:14:59.000Z",
"Description": "",
"RoleLastUsed": {},
"AttachedPolicies": [
{
"PolicyArn": "arn:aws:iam::aws:policy/ReadOnlyAccess",
"PolicyName": "ReadOnlyAccess"
}
],
"MaxSessionDuration": 28800,
"AssumeRolePolicyDocument": {
"Version": "2012-10-17",
"Statement": [
{
"Action": "sts:AssumeRole",
"Effect": "Allow",
"Condition": {
"StringEquals": {
"sts:ExternalId": "u_c6fcejhe4mvehj43v7a0:vv8hdzr9"
}
},
"Principal": {
"AWS": "arn:aws:iam::316881668097:root"
}
}
]
}
},
"metadata": {
"aws": {
"accountId": "899206412154",
"partition": "aws"
},
"createTimestamp": "2021-12-28T18:14:59.000Z"
},
"trunk": {
"title": "Turbot > Dunder Mifflin > AWS > 899206412154 > steampipe_cloud"
},
"turbot": {
"akas": [
"arn:aws:iam::899206412154:role/steampipe_cloud"
],
"id": "244412933699404",
"tags": {
"vaec:CKID": "abc125",
"vaec:VAECID": "xyz456"
}
}
}
}
}
Here's an analogous steampipe query that finds it very slowly.
select
*
from
turbot_resource
where
data ->> 'Arn' = 'arn:aws:iam::899206412154:role/steampipe_cloud'
And here's one that finds it immediately.
select
*
from
turbot_resource
where
resource_type_uri = 'tmod:@turbot/aws-iam#/resource/types/role'
and data ->> 'Arn' = 'arn:aws:iam::899206412154:role/steampipe_cloud'
> .inspect turbot_resource
+-------------------+--------------------------+--------------------------------------------------
| column | type | description
+-------------------+--------------------------+--------------------------------------------------
| akas | jsonb | AKA (also known as) identifiers for the resource.
| create_timestamp | timestamp with time zone | When the resource was first discovered by Turbot.
| data | jsonb | Resource data.
| filter | text | Filter used for this resource list.
| id | bigint | Unique identifier of the resource.
| metadata | jsonb | Resource custom metadata.
| parent_id | bigint | ID for the parent of this resource. For the Turbo
| path | jsonb | Hierarchy path with all identifiers of ancestors
| resource_type_id | bigint | ID of the resource type for this resource.
| resource_type_uri | text | URI of the resource type for this resource.
| tags | jsonb | Tags for the resource.
| timestamp | timestamp with time zone | Timestamp when the resource was last modified (cr
| title | text | Title of the resource.
| trunk_title | text | Title with full path of the resource.
| update_timestamp | timestamp with time zone | When the resource was last updated in Turbot.
| version_id | bigint | Unique identifier for this version of the resourc
| workspace | text | Specifies the workspace URL.
+-------------------+--------------------------+--------------------------------------------------
> .output json
[
{
"akas": [
"arn:aws:iam::899206412154:role/steampipe_cloud"
],
"create_timestamp": "2021-12-28T10:15:05-08:00",
"data": {
"Arn": "arn:aws:iam::899206412154:role/steampipe_cloud",
"AssumeRolePolicyDocument": {
"Statement": [
{
"Action": "sts:AssumeRole",
"Condition": {
"StringEquals": {
"sts:ExternalId": "u_c6fcejhe4mvehj43v7a0:vv8hdzr9"
}
},
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::316881668097:root"
}
}
],
"Version": "2012-10-17"
},
"AttachedPolicies": [
{
"PolicyArn": "arn:aws:iam::aws:policy/ReadOnlyAccess",
"PolicyName": "ReadOnlyAccess"
}
],
"CreateDate": "2021-12-28T18:14:59.000Z",
"Description": "",
"MaxSessionDuration": 28800,
"Path": "/",
"Policies": [],
"RoleId": "AROA5CXHA655BRJNCS2SR",
"RoleLastUsed": {},
"RoleName": "steampipe_cloud",
"Tags": [
{
"Key": "vaec:CKID",
"Value": "abc125"
},
{
"Key": "vaec:VAECID",
"Value": "xyz456"
}
]
},
"filter": null,
"id": 244412933699404,
"metadata": {
"aws": {
"accountId": "899206412154",
"partition": "aws"
},
"createTimestamp": "2021-12-28T18:14:59.000Z"
},
"parent_id": 208149752063137,
"path": [
162618828978177,
197293160400056,
197293169049322,
208149752063137,
244412933699404
],
"resource_type_id": 162661932236396,
"resource_type_uri": "tmod:@turbot/aws-iam#/resource/types/role",
"tags": {
"vaec:CKID": "abc125",
"vaec:VAECID": "xyz456"
},
"timestamp": "2021-12-28T10:15:05-08:00",
"title": "steampipe_cloud",
"trunk_title": "Turbot > Dunder Mifflin > AWS > 899206412154 > steampipe_cloud",
"update_timestamp": "2021-12-28T10:41:53-08:00",
"version_id": 244414580090703,
"workspace": "https://demo-turbot.cloud.turbot-dev.com"
}
]
Is this an example of a query that's a no-brainer in graphql but requires a bit more know-how to run efficiently using steampipe-plugin-turbot?
How/why would the schema for a table derived from the graphql output differ from the one defined by the plugin?
Is this an example of a query that's a no-brainer in graphql but requires a bit more know-how to run efficiently using steampipe-plugin-turbot? I think it might be. Here's the underlying GraphQL query for your slow query:
Slow Query
select * from turbot_resource where data ->> 'Arn' = 'arn:aws:iam::899206412154:role/steampipe_cloud'
the underlying GraphQL query
query SteampipeSide { resources(filter: "") { metadata { stats { total } } } }
In Demo, this returned
32878
results.
select
*
from
turbot_resource
where
resource_type_uri = 'tmod:@turbot/aws-iam#/resource/types/role'
and data ->> 'Arn' = 'arn:aws:iam::899206412154:role/steampipe_cloud'
Underlying GraphQL query:
query TurbotSide{
resources(filter:"resourceTypeId:tmod:@turbot/aws-iam#/resource/types/role") {
metadata{
stats{
total
}
}
}
}
In Demo, this returns a mere 867
"rows". This is only 2% of the total resources. No surprise to me that the quick query is well, quick.
How/why would the schema for a table derived from the graphql output differ from the one defined by the plugin?
By "...the one defined by the plugin", I believe you mean the default turbot_*
tables that are already implemented. I'm going to refer to these as the "static tables" hereafter. For simple queries that only query one of resources
, policy_settings
, or controls
, I don't expect the difference between the dynamic and static tables to be that big. However, for a query that hits two or more, then the Steampipe table schema will be very different. The resource_controls_policies query is a great example.
query ResourceControlPolicyQuery {
resources(paging: "", filter: "resourceTypeId:'tmod:@turbot/aws#/resource/types/aws'") {
items {
data
controls(filter: "state:alarm,error,ok") {
items {
reason
state
turbot {
dependencies {
policyValues {
items {
state
value
type {
modUri
description
categoryUri
input
}
}
}
}
}
}
}
}
metadata {
stats {
total
}
}
}
}
}}}}
An incomplete and rough DDL for this query might look like
create table resources_controls_policies (
resource_data json,
control_reason string,
control_state string,
policyvalues_state string,
policyvalues_value string,
policyvalues_state string,
policyvalues_type json,
metadata_stats json
)
A rough (and not executable) Steampipe query to get the same thing using only static tables would look like:
with resources as (
select id as resource_id, data as resource_data
from turbot_resource
),
controls as (
select id as control_id, reason as control_reason, state as control_state, resource_id
from turbot_control
),
policy_values as (
--the policy_values table doesn't exist yet so these values are pure speculation
select id as policy_id, state as policy_state, value as policy_value, control_id
from turbot_policy_values
)
select
r.resource_data,
c.control_state,
p.policy_value
from resources r
left join controls c on r.resource_id = c.resource_id
left join policy_values as p on p.control_id = c.control_id
where filter = 'resourceTypeId:"tmod:@turbot/aws-iam#/resource/types/role"'
Improved steampipe query:
with resources as (
select id as resource_id, data as resource_data
from turbot_resource
where filter = 'resourceTypeId:"tmod:@turbot/aws-iam#/resource/types/role"'
),
controls as (
select id as control_id, reason as control_reason, state as control_state, resource_id
from turbot_control
where filter = 'resourceTypeId:"tmod:@turbot/aws-iam#/resource/types/role" state:alarm'
),
policy_values as (
--the policy_values table doesn't exist yet so these values are pure speculation
select id as policy_id, state as policy_state, value as policy_value, control_id
from turbot_policy_values
)
select
r.resource_data,
c.control_state,
p.policy_value
from resources r
left join controls c on r.resource_id = c.resource_id
left join policy_values as p on p.control_id = c.control_id
This still suffers (some) from the "Transfer the World" problem as Steampipe has to get all the IAM roles and all the controls relevant to IAM roles then the policy values. Although, as I think about it, if someone filtered for all AWS resources, then you still have transfer the world. The only difference would be the relative performance of your local workstation vs the Turbot DB. The Turbot DB is almost always going to be far more powerful/faster than your local workstation at running queries.
Closing. This is a cool idea but outside the scope of development.
Is your feature request related to a problem? Please describe. The existing
turbot_*
tables are insufficient to cover the breadth of the Turbot API. Further, queries like resource_control_policy.graphql are easy to do in raw GraphQL but painful to do in Steampipe.There is also considerable difficulty for users new to Turbot and to Steampipe with grokking where the filtering happens, whether Turbot-side or Steampipe-side. This difficulty can lead to long running queries when a user specifies Steampipe-side filtering then unintentionally pulls over hundreds of thousands of rows from Turbot. I believe/hope that if the user specifies the GraphQL themselves, it will be clear to them where the filtering/joining is happening.
Describe the solution you'd like Specify a path to a file or directory on the local file system that contains GraphQL files. On launch, Steampipe will parse these query files then autogenerate tables and columns to match these queries. The overal operation would be similar to how dynamic tables are created for the
CSV
,Terraform
andGoogleSheets
plugins.Describe alternatives you've considered There aren't any really, other than to write the GraphQL queries into a general purpose programming language.
Additional context The Turbot GraphQL API is incredibly rich. The current approach to statically defined tables removes some of that richness and imposes additional development load on Turbot plugin developers to implement each new table.