shaicoleman opened this issue 6 months ago
Hi @shaicoleman, the `aws_cost_by_service_daily` table provides a simplified view of cost per service in your account (or all linked accounts when run against the organization master), summarized by day, for the last year.
From what I gather, you're looking to obtain cost and usage details over a specific time frame.
Regrettably, the `aws_cost_usage_*` tables currently lack the capability to filter by a specified time period. Enhancing the `aws_cost_usage` table or introducing a new table are potential improvements to support this feature.
I'm currently addressing this issue and will keep you posted once the table design is finalized.
Thank you!
Hello @shaicoleman, I've made some updates in the `issue-2149` branch for the `aws_cost_usage` table:

- You can now specify the `search_start_time` and `search_end_time` columns in the WHERE clause, allowing you to retrieve results within a particular period.
- Results are returned according to the `granularity` specified in the WHERE clause.

Query result:
> select
period_start,
period_end,
dimension_1 as account_id,
dimension_2 as service_name,
net_unblended_cost_amount::numeric::money
from
aws_cost_usage
where
granularity = 'MONTHLY'
and dimension_type_1 = 'LINKED_ACCOUNT'
and dimension_type_2 = 'SERVICE'
and search_start_time = '2023-04-01T05:30:00+05:30'
and search_end_time = '2023-04-05T05:30:00+05:30'
order by
dimension_1,
period_start;
+---------------------------+---------------------------+--------------+----------------------------------------+---------------------------+
| period_start | period_end | account_id | service_name | net_unblended_cost_amount |
+---------------------------+---------------------------+--------------+----------------------------------------+---------------------------+
| 2023-04-01T05:30:00+05:30 | 2023-04-05T05:30:00+05:30 | 33333333333 | Tax | $0.00 |
| 2023-04-01T05:30:00+05:30 | 2023-04-05T05:30:00+05:30 | 33333333333 | Amazon Simple Storage Service | $1.84 |
| 2023-04-01T05:30:00+05:30 | 2023-04-05T05:30:00+05:30 | 33333333333 | Amazon Elastic Compute Cloud - Compute | $2.23 |
| 2023-04-01T05:30:00+05:30 | 2023-04-05T05:30:00+05:30 | 33333333333 | Amazon Glacier | $0.00 |
| 2023-04-01T05:30:00+05:30 | 2023-04-05T05:30:00+05:30 | 33333333333 | AWS Security Hub | $0.01 |
| 2023-04-01T05:30:00+05:30 | 2023-04-05T05:30:00+05:30 | 33333333333 | AWS CloudTrail | $1.30 |
| 2023-04-01T05:30:00+05:30 | 2023-04-05T05:30:00+05:30 | 33333333333 | AWS Cost Explorer | $0.45 |
| 2023-04-01T05:30:00+05:30 | 2023-04-05T05:30:00+05:30 | 33333333333 | AWS Amplify | $0.00 |
| 2023-04-01T05:30:00+05:30 | 2023-04-05T05:30:00+05:30 | 33333333333 | Amazon Simple Queue Service | $0.00 |
| 2023-04-01T05:30:00+05:30 | 2023-04-05T05:30:00+05:30 | 33333333333 | AWS Service Catalog | $0.00 |
| 2023-04-01T05:30:00+05:30 | 2023-04-05T05:30:00+05:30 | 33333333333 | Amazon Inspector | $0.00 |
| 2023-04-01T05:30:00+05:30 | 2023-04-05T05:30:00+05:30 | 33333333333 | Amazon Simple Workflow Service | $0.00 |
| 2023-04-01T05:30:00+05:30 | 2023-04-05T05:30:00+05:30 | 33333333333 | AWS Step Functions | $0.00 |
| 2023-04-01T05:30:00+05:30 | 2023-04-05T05:30:00+05:30 | 33333333333 | Amazon GuardDuty | $0.02 |
| 2023-04-01T05:30:00+05:30 | 2023-04-05T05:30:00+05:30 | 33333333333 | Amazon Simple Notification Service | $0.00 |
| 2023-04-01T05:30:00+05:30 | 2023-04-05T05:30:00+05:30 | 33333333333 | AWS Glue | $0.00 |
| 2023-04-01T05:30:00+05:30 | 2023-04-05T05:30:00+05:30 | 33333333333 | AWS WAF | $1.47 |
| 2023-04-01T05:30:00+05:30 | 2023-04-05T05:30:00+05:30 | 33333333333 | AWS Config | $0.39 |
| 2023-04-01T05:30:00+05:30 | 2023-04-05T05:30:00+05:30 | 33333333333 | Amazon Security Lake | $0.00 |
| 2023-04-01T05:30:00+05:30 | 2023-04-05T05:30:00+05:30 | 33333333333 | AWS Secrets Manager | $0.05 |
| 2023-04-01T05:30:00+05:30 | 2023-04-05T05:30:00+05:30 | 33333333333 | AWS Key Management Service | $0.94 |
| 2023-04-01T05:30:00+05:30 | 2023-04-05T05:30:00+05:30 | 33333333333 | AmazonCloudWatch | $0.23 |
| 2023-04-01T05:30:00+05:30 | 2023-04-05T05:30:00+05:30 | 33333333333 | EC2 - Other | $0.53 |
+---------------------------+---------------------------+--------------+----------------------------------------+---------------------------+
Time: 7.3s. Rows fetched: 23. Hydrate calls: 23.
We'd greatly appreciate it if you could test the changes in the PR branch and share your feedback with us, so we can ensure the code changes meet the requirements.
Here are the steps to test the PR branch:
git clone https://github.com/turbot/steampipe-plugin-aws.git
cd steampipe-plugin-aws
git checkout issue-2149
make
Thank you!
I'm having issues testing that branch with the instructions above:
$ steampipe -v
Steampipe v0.22.1
$ go version
go version go1.22.1 linux/amd64
$ rm -rf ~/.steampipe
$ git clone https://github.com/turbot/steampipe-plugin-aws.git
$ cd steampipe-plugin-aws
$ git checkout issue-2149
$ git show --summary
commit b2d235cb035cf1ca5fba0a018c06920bf19b2bff (HEAD -> issue-2149, origin/issue-2149)
Enhanced the table aws_cost_usage to accept custom time frame as input param Closes #2149
$ make
go build -o ~/.steampipe/plugins/hub.steampipe.io/plugins/turbot/aws@latest/steampipe-plugin-aws.plugin -tags "" *.go
$ steampipe query
> .inspect aws
Error: could not find connection or table called 'aws'. Is the plugin installed? Is the connection configured?
$ cp config/* ~/.steampipe/config
$ vi ~/.steampipe/config/aws.spc # no changes
$ steampipe query
> .inspect aws
+------------+--------------------------------------------+-------------+-------+----------------------+----------------------+
| connection | plugin | schema mode | state | error | state updated |
+------------+--------------------------------------------+-------------+-------+----------------------+----------------------+
| aws | hub.steampipe.io/plugins/turbot/aws@latest | | error | plugin not installed | 2024-03-29T17:40:03Z |
+------------+--------------------------------------------+-------------+-------+----------------------+----------------------+
Hey @shaicoleman, could you please try running `steampipe plugin install aws` first, and then run `make` from the `~/steampipe-plugin-aws` directory?
I can't get it to run:
`aws_cost_by_service_daily` doesn't have a `search_start_time` field. The date filters should be applicable to all relevant tables.
SELECT service, unblended_cost_amount, period_start, period_end
FROM aws_cost_by_service_daily
WHERE service = 'Tax' AND
search_start_time >= current_timestamp - interval '30d'
Error: column "search_start_time" does not exist (SQLSTATE 42703)
It doesn't work with `aws_cost_usage` either:
SELECT *
FROM aws_cost_usage
WHERE search_start_time >= current_timestamp - interval '30d'
No results - no network activity - query doesn't seem to run
I also tried with a date, but it still doesn't work:
SELECT *
FROM aws_cost_usage
WHERE search_start_time >= '2024-03-01'
No results - no network activity - query doesn't seem to run
Hi @shaicoleman, we haven't yet implemented support for the `search_start_time` and `search_end_time` columns across all `aws_cost_*` tables. Currently, these changes have been applied to the `aws_cost_usage` table only.
I've made additional updates in the `issue-2149` branch. Could you please pull these updates into your local setup, rebuild the plugin, and run your queries again?
Note: When querying the `aws_cost_usage` table, it's mandatory to include `granularity`, `dimension_type_1`, and `dimension_type_2` in the WHERE clause. You may also optionally include `search_start_time` or `search_end_time`. For further details, please refer to the Important Notes section in the table documentation.
Here are the query results with different operators (`=`, `>=`, `<=`):
> select
period_start,
period_end,
search_start_time,
dimension_1 as account_id,
dimension_2 as service_name,
net_unblended_cost_amount::numeric::money
from
aws_cost_usage
where
granularity = 'MONTHLY'
and dimension_type_1 = 'LINKED_ACCOUNT'
and dimension_type_2 = 'SERVICE'
and search_start_time >= '2023-08-01'
+---------------------------+---------------------------+---------------------------+--------------+------------------------------------------------+---------------------------+
| period_start | period_end | search_start_time | account_id | service_name | net_unblended_cost_amount |
+---------------------------+---------------------------+---------------------------+--------------+------------------------------------------------+---------------------------+
| 2023-08-01T05:30:00+05:30 | 2023-09-01T05:30:00+05:30 | 2023-08-01T05:30:00+05:30 | 333333333333 | AWS Config | $3.03 |
| 2023-08-01T05:30:00+05:30 | 2023-09-01T05:30:00+05:30 | 2023-08-01T05:30:00+05:30 | 333333333333 | AWS Security Hub | $0.06 |
| 2023-08-01T05:30:00+05:30 | 2023-09-01T05:30:00+05:30 | 2023-08-01T05:30:00+05:30 | 333333333333 | AWS Key Management Service | $2.87 |
| 2023-08-01T05:30:00+05:30 | 2023-09-01T05:30:00+05:30 | 2023-08-01T05:30:00+05:30 | 333333333333 | AWS Cost Explorer | $2.69 |
| 2023-08-01T05:30:00+05:30 | 2023-09-01T05:30:00+05:30 | 2023-08-01T05:30:00+05:30 | 333333333333 | AWS Cloud Map | $0.20 |
| 2023-08-01T05:30:00+05:30 | 2023-09-01T05:30:00+05:30 | 2023-08-01T05:30:00+05:30 | 333333333333 | AWS Secrets Manager | $0.40 |
| 2023-08-01T05:30:00+05:30 | 2023-09-01T05:30:00+05:30 | 2023-08-01T05:30:00+05:30 | 333333333333 | AWS Step Functions | $0.00 |
select
period_start,
period_end,
dimension_1 as account_id,
dimension_2 as service_name,
net_unblended_cost_amount::numeric::money
from
aws_cost_usage
where
granularity = 'MONTHLY'
and dimension_type_1 = 'LINKED_ACCOUNT'
and dimension_type_2 = 'SERVICE'
and search_start_time = '2023-04-01T05:30:00+05:30'
and search_end_time = '2023-04-05T05:30:00+05:30'
+---------------------------+---------------------------+--------------+----------------------------------------+---------------------------+
| period_start | period_end | account_id | service_name | net_unblended_cost_amount |
+---------------------------+---------------------------+--------------+----------------------------------------+---------------------------+
| 2023-04-01T05:30:00+05:30 | 2023-04-05T05:30:00+05:30 | 333333333333 | Tax | $0.00 |
| 2023-04-01T05:30:00+05:30 | 2023-04-05T05:30:00+05:30 | 333333333333 | AWS Config | $0.39 |
| 2023-04-01T05:30:00+05:30 | 2023-04-05T05:30:00+05:30 | 333333333333 | AWS Amplify | $0.00 |
select
period_start,
period_end,
search_start_time,
dimension_1 as account_id,
dimension_2 as service_name,
net_unblended_cost_amount::numeric::money
from
aws_cost_usage
where
granularity = 'MONTHLY'
and dimension_type_1 = 'LINKED_ACCOUNT'
and dimension_type_2 = 'SERVICE'
and search_start_time <= '2023-08-01'
+---------------------------+---------------------------+---------------------------+--------------+------------------------------------------------+---------------------------+
| period_start | period_end | search_start_time | account_id | service_name | net_unblended_cost_amount |
+---------------------------+---------------------------+---------------------------+--------------+------------------------------------------------+---------------------------+
| 2023-04-01T05:30:00+05:30 | 2023-05-01T05:30:00+05:30 | 2023-08-01T05:30:00+05:30 | 333333333333 | AWS CloudShell | $0.00 |
| 2023-04-01T05:30:00+05:30 | 2023-05-01T05:30:00+05:30 | 2023-08-01T05:30:00+05:30 | 333333333333 | AWS Amplify | $0.00 |
| 2023-05-01T05:30:00+05:30 | 2023-06-01T05:30:00+05:30 | 2023-08-01T05:30:00+05:30 | 333333333333 | AWS Config | $5.47 |
| 2023-04-01T05:30:00+05:30 | 2023-05-01T05:30:00+05:30 | 2023-08-01T05:30:00+05:30 | 333333333333 | AWS CloudTrail | $10.88 |
| 2023-05-01T05:30:00+05:30 | 2023-06-01T05:30:00+05:30 | 2023-08-01T05:30:00+05:30 | 333333333333 | AWS CloudTrail | $11.35 |
| 2023-05-01T05:30:00+05:30 | 2023-06-01T05:30:00+05:30 | 2023-08-01T05:30:00+05:30 | 333333333333 | AWS Security Hub | $0.06 |
Please feel free to share your feedback.
Thanks!
It seems there's an issue where repeating the same query with different date filters returns empty cached results.
e.g. the first query will return results correctly:
select
period_start,
period_end,
search_start_time,
dimension_1 as account_id,
dimension_2 as service_name,
net_unblended_cost_amount::numeric::money
from
aws_cost_usage
where
granularity = 'MONTHLY'
and dimension_type_1 = 'LINKED_ACCOUNT'
and dimension_type_2 = 'SERVICE'
and search_start_time >= '2023-08-01'
But then repeating the same query with a different date will return zero results and will not actually execute the query:
select
period_start,
period_end,
search_start_time,
dimension_1 as account_id,
dimension_2 as service_name,
net_unblended_cost_amount::numeric::money
from
aws_cost_usage
where
granularity = 'MONTHLY'
and dimension_type_1 = 'LINKED_ACCOUNT'
and dimension_type_2 = 'SERVICE'
and search_start_time >= '2023-09-01'
+--------------+------------+-------------------+------------+--------------+---------------------------+
| period_start | period_end | search_start_time | account_id | service_name | net_unblended_cost_amount |
+--------------+------------+-------------------+------------+--------------+---------------------------+
+--------------+------------+-------------------+------------+--------------+---------------------------+
It would also be good to get that functionality into the other tables as well.
Hi @shaicoleman, thank you for testing the changes.
Regarding the issue where repeating the same query with a different date returns zero results: I have pushed a fix to the same branch (`issue-2149`). I hope this resolves the problem.
Additionally, I've expanded the functionality to include the `search_start_time` and `search_end_time` columns for all `aws_cost_*` tables in the `issue-2149` branch.
Could you please pull the latest changes from the `issue-2149` branch to your local setup and rebuild the plugin locally?
Let me know if you encounter any further issues.
Thanks!
Hi,
It doesn't correctly filter the following query:
SELECT service, unblended_cost_amount, period_start, period_end
FROM aws_cost_by_service_daily
WHERE search_start_time >= current_timestamp - interval '30d' AND service = 'Tax'
It requests the default 1 year period, e.g.
{"TimePeriod":{"End":"2024-04-24","Start":"2023-04-24"}}
It also requests unnecessary metrics which aren't part of the query:
{"Metrics":["BlendedCost","UnblendedCost","NetUnblendedCost","AmortizedCost","NetAmortizedCost","UsageQuantity","NormalizedUsageAmount"]}
It would be good if it requested only the necessary metrics.
Also, consider making `search_start_time` a required field. Although it's a breaking change, I think it makes sense in this case, as the table currently chooses an unintuitive default, which generates a lot of extra queries and can incur additional costs.
@shaicoleman, thank you for your insightful feedback. I've made the following updates to the `issue-2149` branch:
> It doesn't filter correctly the following query
Apologies, that was my oversight. I neglected to push a commit from my local machine. Now, all tables should be able to accept a custom time range based on the query parameter.
> It would be good if it would only request the necessary metrics.
Excellent suggestion! I've implemented it, and the results are impressive. I've added an optional string-type column called `metrics` that can be used in the WHERE clause. For a single metric, use `... where metrics = 'BlendedCost'`. For multiple metrics, use `... where metrics = 'BlendedCost,AmortizedCost,NetUnblendedCost'` (listed without spaces and separated by commas).
For example, the query `select * from aws_cost_by_service_usage_type_daily where metrics = 'BlendedCost'` (for a single metric) took 94.1s to complete, while the query `select * from aws_cost_by_service_usage_type_daily` (for all metrics) took 535.7s.
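Presumably the plugin splits that comma-separated metrics string before passing it to the API. Here is a minimal sketch of that parsing as illustrative Python (the actual implementation is in Go, and `parse_metrics` is a hypothetical name, not plugin code):

```python
def parse_metrics(qual_value: str) -> list[str]:
    """Split a comma-separated metrics qual into a list for the API call.

    Per the convention above, values are comma-separated with no spaces,
    e.g. 'BlendedCost,AmortizedCost,NetUnblendedCost'. Stray whitespace
    and empty entries are stripped defensively.
    """
    return [m.strip() for m in qual_value.split(",") if m.strip()]

print(parse_metrics("BlendedCost"))
# ['BlendedCost']
print(parse_metrics("BlendedCost,AmortizedCost,NetUnblendedCost"))
# ['BlendedCost', 'AmortizedCost', 'NetUnblendedCost']
```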
I would appreciate it if you could pull the latest changes from the `issue-2149` branch, build it locally, and share your feedback.
Note: The `metrics` optional qual support has not been added to the `aws_cost_forecast_daily` and `aws_cost_forecast_monthly` tables, as these tables are designed to return statistics for the `UNBLENDED_COST` metric only.
Thanks again for your valuable input!
Looks good to me! Thanks
Hello @shaicoleman, just to let you know, we've finalized the table design and implementation, and I've pushed the updates to the `issue-2149` branch.
Change details:

- Removed the `search_start_time` and `search_end_time` columns, as their functionality is effectively replicated by the `period_start` and `period_end` columns.
- Removed the `metrics` column. The table is now designed to dynamically pass metric values based on the columns specified in the select statement:
  - `select * from aws_cost_by_account_monthly` will internally call the API passing all supported metric values.
  - `select period_end, blended_cost_amount from aws_cost_by_account_monthly` will trigger an API call for only the `BlendedCost` metric.

Query performance outcomes:

- `select * from aws_cost_by_account_monthly` (12.6s)
- `select period_end, blended_cost_amount from aws_cost_by_account_monthly` (5.6s)
It would be great if you could pull the latest changes from the `issue-2149` branch and share your observations or feedback.
Thanks!
It doesn't seem to calculate the `TimePeriod` correctly, e.g. for the following query:
SELECT dimension_1 AS service, dimension_2 AS usage_type, unblended_cost_amount, period_start, period_end
FROM aws_cost_usage
WHERE granularity = 'DAILY' AND
dimension_type_1 = 'SERVICE' AND
dimension_type_2 = 'USAGE_TYPE' AND
period_start >= current_timestamp - interval '32d' AND
period_end BETWEEN current_timestamp - interval '31d' AND current_timestamp - interval '1d' AND
unblended_cost_amount <> 0
It sends the following: `{"TimePeriod":{"End":"2024-04-08","Start":" 2024-05-08"}}`, which generates an error: `response error StatusCode: 400, RequestID: 8fe1f2ed-3ea1-49c4-8f95-8c6c3be05df6, api error ValidationException: Start date (and hour) should be before end date (and hour)`
The error message suggests it might accept a timestamp, but I haven't tried it, and the documentation says otherwise.
Also note: the start date is inclusive, but the end date is exclusive, and the dates are in the UTC timezone. See https://docs.aws.amazon.com/aws-cost-management/latest/APIReference/API_GetCostAndUsage.html#awscostmanagement-GetCostAndUsage-request-TimePeriod
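To make that convention concrete, here is an illustrative sketch (plain Python, not the plugin's Go code; `time_period_last_n_days` is a hypothetical helper) of how a "last 30 days" filter maps onto the API's `TimePeriod`: inclusive start, exclusive end, dates in UTC.

```python
from datetime import datetime, timedelta, timezone

def time_period_last_n_days(n: int, now: datetime) -> dict:
    """Build a GetCostAndUsage-style TimePeriod covering the last n days.

    Start is inclusive and End is exclusive, so End is set to today's
    UTC date: data for the End date itself is NOT included.
    """
    today = now.astimezone(timezone.utc).date()
    start = today - timedelta(days=n)
    return {
        "TimePeriod": {
            "Start": start.isoformat(),  # inclusive
            "End": today.isoformat(),    # exclusive
        }
    }

# Example: the 30 days preceding 2024-05-08 (UTC)
now = datetime(2024, 5, 8, 12, 0, tzinfo=timezone.utc)
print(time_period_last_n_days(30, now))
# {'TimePeriod': {'Start': '2024-04-08', 'End': '2024-05-08'}}
```

Note how, in the failing request above, Start and End came out swapped relative to this ordering.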
Hi @shaicoleman,
Apologies for the late response. I have pushed another change to the `issue-2149` branch to handle the edge case mentioned above.
I have tested the following query parameter combinations.
SELECT dimension_1 AS service, dimension_2 AS usage_type, unblended_cost_amount, period_start, period_end
FROM aws_cost_usage
WHERE granularity = 'DAILY' AND
dimension_type_1 = 'SERVICE' AND
dimension_type_2 = 'USAGE_TYPE' AND
period_end >= current_timestamp - interval '32d' AND
unblended_cost_amount <> 0
SELECT dimension_1 AS service, dimension_2 AS usage_type, unblended_cost_amount, period_start, period_end
FROM aws_cost_usage
WHERE granularity = 'DAILY' AND
dimension_type_1 = 'SERVICE' AND
dimension_type_2 = 'USAGE_TYPE' AND
period_start >= current_timestamp - interval '32d' AND
period_end <= current_timestamp - interval '25d' AND
unblended_cost_amount <> 0
SELECT dimension_1 AS service, dimension_2 AS usage_type, unblended_cost_amount, period_start, period_end
FROM aws_cost_usage
WHERE granularity = 'DAILY' AND
dimension_type_1 = 'SERVICE' AND
dimension_type_2 = 'USAGE_TYPE' AND
period_start <= current_timestamp - interval '32d' AND
period_end >= current_timestamp - interval '35d' AND
unblended_cost_amount <> 0
SELECT dimension_1 AS service, dimension_2 AS usage_type, unblended_cost_amount, period_start, period_end
FROM aws_cost_usage
WHERE granularity = 'DAILY' AND
dimension_type_1 = 'SERVICE' AND
dimension_type_2 = 'USAGE_TYPE' AND
period_start <= current_timestamp - interval '32d' AND
unblended_cost_amount <> 0
SELECT dimension_1 AS service, dimension_2 AS usage_type, unblended_cost_amount, period_start, period_end
FROM aws_cost_usage
WHERE granularity = 'DAILY' AND
dimension_type_1 = 'SERVICE' AND
dimension_type_2 = 'USAGE_TYPE' AND
period_start >= current_timestamp - interval '32d' AND
unblended_cost_amount <> 0
SELECT dimension_1 AS service, dimension_2 AS usage_type, unblended_cost_amount, period_start, period_end
FROM aws_cost_usage
WHERE granularity = 'DAILY' AND
dimension_type_1 = 'SERVICE' AND
dimension_type_2 = 'USAGE_TYPE' AND
period_start >= current_timestamp - interval '32d' AND
period_end BETWEEN current_timestamp - interval '31d' AND current_timestamp - interval '1d' AND
unblended_cost_amount <> 0
Could you please give it a shot again by pulling the latest changes and let me know if I missed any edge cases?
> Also note: The start date is inclusive, but the end date is exclusive.

Yes, we pass the input as given in the query parameter, and the results are displayed based on the query conditions; we are not modifying any of the rows.
Could you please expand on this, particularly which query shows the deviation? Also, do the relevant AWS CLI commands return the results you expect?
Thanks
@ParthaI, this works for most cases (including my use case), but it fails on some edge cases.
For example, this will fall back to the default one-year period, thus missing some expected results:
SELECT dimension_1 AS service, dimension_2 AS usage_type, unblended_cost_amount, period_start, period_end
FROM aws_cost_usage
WHERE granularity = 'MONTHLY' AND
dimension_type_1 = 'SERVICE' AND
dimension_type_2 = 'USAGE_TYPE' AND
(period_end >= current_timestamp - interval '32d' OR period_end >= current_timestamp - interval '500d') AND
unblended_cost_amount <> 0
The ideal solution would be to handle the common cases automatically, but have fields for a manual override as before. If it can't be detected automatically and no manual dates were provided, it should give an error rather than falling back to an arbitrary date range.
I'm not sure it will always be possible to correctly detect whether it is indeed a simple case (OR conditions, CTEs, unions, subqueries, etc.).
Explicitly setting the fields, as before, is the most reliable way to do it.
Thanks for your feedback, @shaicoleman. I will take another look.
One thing that would be nice to keep from the current solution is the ability to specify timestamps instead of just dates
Hello @shaicoleman, I appreciate your suggestions. I have been comparing the query results from the `main` branch with those from the `issue-2149` branch. I did not notice any deviations in the results. Additionally, I observed significant improvements in query timing. Please take a look at my observations below.
Query:
SELECT dimension_1 AS service, dimension_2 AS usage_type, unblended_cost_amount, period_start, period_end
FROM aws_cost_usage
WHERE granularity = 'MONTHLY' AND
dimension_type_1 = 'SERVICE' AND
dimension_type_2 = 'USAGE_TYPE' AND
(period_end >= current_timestamp - interval '32d' OR period_end >= current_timestamp - interval '500d') AND
unblended_cost_amount <> 0
Plugin environment:

- `main` branch
- `issue-2149` branch

There is no difference in the results from either branch.
| Branch `main` | Branch `issue-2149` |
|---|---|
| Query: `SELECT dimension_1 AS service, dimension_2 AS usage_type, unblended_cost_amount, period_start, period_end FROM aws_cost_usage WHERE granularity = 'MONTHLY' AND dimension_type_1 = 'SERVICE' AND dimension_type_2 = 'USAGE_TYPE' AND (period_end >= current_timestamp - interval '32d' OR period_end >= current_timestamp - interval '500d') AND unblended_cost_amount <> 0 order by period_start, service, usage_type` | Query: `SELECT dimension_1 AS service, dimension_2 AS usage_type, unblended_cost_amount, period_start, period_end FROM aws_cost_usage WHERE granularity = 'MONTHLY' AND dimension_type_1 = 'SERVICE' AND dimension_type_2 = 'USAGE_TYPE' AND (period_end >= current_timestamp - interval '32d' OR period_end >= current_timestamp - interval '500d') AND unblended_cost_amount <> 0 order by period_start, service, usage_type` |
| Rows returned: 6687 | Rows returned: 6687 |
| Hydrate calls: 6687 | Hydrate calls: 6687 |
| Query timing: 25257ms | Query timing: 7013ms |
Also, I validated the results themselves and did not see any deviation between them:

- With the `main` branch plugin build, I ran: `steampipe query "SELECT dimension_1 AS service, dimension_2 AS usage_type, unblended_cost_amount, period_start, period_end FROM aws_cost_usage WHERE granularity = 'MONTHLY' AND dimension_type_1 = 'SERVICE' AND dimension_type_2 = 'USAGE_TYPE' AND (period_end >= current_timestamp - interval '32d' OR period_end >= current_timestamp - interval '500d') AND unblended_cost_amount <> 0 order by period_start, service, usage_type" --output json > current_main_result.json`
- With the `issue-2149` branch plugin build, I ran: `steampipe query "SELECT dimension_1 AS service, dimension_2 AS usage_type, unblended_cost_amount, period_start, period_end FROM aws_cost_usage WHERE granularity = 'MONTHLY' AND dimension_type_1 = 'SERVICE' AND dimension_type_2 = 'USAGE_TYPE' AND (period_end >= current_timestamp - interval '32d' OR period_end >= current_timestamp - interval '500d') AND unblended_cost_amount <> 0 order by period_start, service, usage_type" --output json > branch_2149_result.json`
- Comparing `current_main_result.json` and `branch_2149_result.json`, I did not notice any deviation between the results.

Could you please let me know if you noticed any mismatch in your case?
> The ideal solution would be to handle the common cases automatically, but have fields for a manual override as before. If it can't be detected automatically, and no manual dates were provided, it should give an error rather than falling back to an arbitrary date range.
I believe giving an error rather than falling back to a default date range would be a breaking change for users. Instead, we let the API return the results and perform the filtering at the Steampipe query level based on the query parameters (this approach looks good to me).
> One thing that would be nice to keep from the current solution is the ability to specify timestamps instead of just dates
Great idea. According to our current code, if the granularity is set to `HOURLY`, we use timestamps; otherwise, we use dates. The changes in this PR branch follow the same approach. However, handling timestamps in different formats can be challenging. We have made these changes while also considering backward compatibility.
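As a rough illustration of that rule, here is a Python sketch of the behavior described above (not the plugin's actual Go code; `format_period_boundary` is a hypothetical name): `HOURLY` granularity sends a full timestamp, while other granularities send a plain date.

```python
from datetime import datetime, timezone

def format_period_boundary(t: datetime, granularity: str) -> str:
    """Format a period boundary for the Cost Explorer API.

    Per the behavior described above: HOURLY granularity uses a full
    UTC timestamp; any other granularity uses a date only.
    """
    t = t.astimezone(timezone.utc)
    if granularity == "HOURLY":
        return t.strftime("%Y-%m-%dT%H:%M:%SZ")
    return t.strftime("%Y-%m-%d")

t = datetime(2023, 8, 1, 5, 30, tzinfo=timezone.utc)
print(format_period_boundary(t, "HOURLY"))   # 2023-08-01T05:30:00Z
print(format_period_boundary(t, "MONTHLY"))  # 2023-08-01
```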
Thank you!
> Could you please let me know if you noticed any mismatch in your case?
I checked the date range that was sent to the API; if the date range is wrong, there is no way the results can be correct. It could be that you have less than 12 months of billing data (AWS retains 13 months), which would explain why you're getting the same results. You may need to change the granularity to daily to see the differences.
SELECT dimension_1 AS service, dimension_2 AS usage_type, unblended_cost_amount, period_start, period_end
FROM aws_cost_usage
WHERE granularity = 'MONTHLY' AND
dimension_type_1 = 'SERVICE' AND
dimension_type_2 = 'USAGE_TYPE' AND
(period_end >= current_timestamp - interval '1d' OR period_end >= current_timestamp - interval '2d') AND
unblended_cost_amount <> 0
The example above requests a whole year instead of 2 days. If the granularity is daily, this can be very expensive, so it doesn't fix the issue it was supposed to solve in the first place.
I understand not wanting to break compatibility, but the current behaviour can silently cause excessive costs, silently truncate data, and behave inconsistently. This justifies breaking compatibility for the small number of users who are using this table. Giving a clear error that can be easily fixed is much better than the silent bugs above, which required a lot of my time and yours to figure out.
If breaking compatibility is not something you're willing to consider, then at least allow optionally setting `search_start_time` and `search_end_time` to override the calculated value.
It could also print a warning to stderr (e.g. `WARNING: future versions will require explicitly setting the search_start_time and search_end_time; defaulting to a one-year period`).
Hi @shaicoleman,
I hope this message finds you well.
I have reviewed your requirements from a costing perspective and would like to address the following points:

- When a query uses the `OR` operator, the Steampipe FDW parses the query correctly, but the quals value remains empty.

> If breaking compatibility is not something you're willing to consider, then at least allow to optionally set the search_start_time and search_end_time to override the calculated value.

- We already have the `period_start` and `period_end` columns to support passing the time period. Adding `search_start_time` and `search_end_time` columns for a specific query requirement might not be the best approach due to redundancy.
- If we were to add `search_start_time` and `search_end_time` columns, the column type would be timestamp, the same as `period_start` and `period_end`.
For the query you mentioned, I made a slight modification so that the API call is made based on the query parameters, achieving the same result as the original query:
WITH period_end_1d AS (
SELECT dimension_1 AS service, dimension_2 AS usage_type, unblended_cost_amount, period_start, period_end
FROM aws_cost_usage
WHERE granularity = 'MONTHLY' AND
dimension_type_1 = 'SERVICE' AND
dimension_type_2 = 'USAGE_TYPE' AND
period_end >= current_timestamp - interval '1d' AND
unblended_cost_amount <> 0
),
period_end_2d AS (
SELECT dimension_1 AS service, dimension_2 AS usage_type, unblended_cost_amount, period_start, period_end
FROM aws_cost_usage
WHERE granularity = 'MONTHLY' AND
dimension_type_1 = 'SERVICE' AND
dimension_type_2 = 'USAGE_TYPE' AND
period_end >= current_timestamp - interval '2d' AND
unblended_cost_amount <> 0
)
SELECT * FROM period_end_1d
UNION
SELECT * FROM period_end_2d;
Thank you for your attention to these details. I look forward to your feedback.
Hi @ParthaI ,
I don't really have anything more to add to what I've already said. From your point of view, breaking compatibility is a dealbreaker; the current solution works most of the time and requires one less parameter in the query.
In my opinion, the possibility of silently truncating data or silently incurring unforeseen expenses is a far bigger dealbreaker, and much worse than breaking compatibility and having some redundancy in the query. There is no easy way to know whether a query will trigger the correct behaviour without tracing the API calls.
This means adding the `search_start_time` and `search_end_time` columns as required fields. These are required fields in the API, and thus they need to be required fields in Steampipe as well, since they cannot always be inferred correctly.
@misraved , maybe give your feedback on what approach you prefer, and we'll go with that.
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days.
Any update on this?
Sorry for the delay on this issue @shaicoleman!!
I will review and release the fix by the end of this week 👍.
Thank you for your patience.
For the following query:
Generates the following filter request:
This requests data for a whole year instead of just the requested period, and thus causes many unnecessary API requests, each of which costs $0.01.
Steampipe v0.22.0, turbot/aws v0.132.0