turbot / steampipe

Zero-ETL, infinite possibilities. Live query APIs, code & more with SQL. No DB required.
https://steampipe.io
GNU Affero General Public License v3.0

Unusual query timing results with aggregator connection with caching enabled when some connections have errors #4288

Open cbruno10 opened 1 month ago

cbruno10 commented 1 month ago

Describe the bug

When I query an aggregator with caching enabled and encounter errors, e.g.,

aws_pipeling_012: table 'aws_s3_bucket' column 'region' requires hydrate data from getBucketLocation, which failed with error operation error S3: GetBucketLocation, exceeded maximum number of attempts, 9, https response error StatusCode: 0, RequestID: , HostID: , request send failed, Get "https://stackset-test-bucket.s3.us-east-1.amazonaws.com/?location=": lookup stackset-test-bucket.s3.us-east-1.amazonaws.com on 192.168.1.1:53: read udp 192.168.1.22:62230->192.168.1.1:53: i/o timeout.

for some of the connections, the output shows the query time decreasing on subsequent queries, but it doesn't report any cached rows until the 2nd or a later query.

For instance, when running select name, region from aws_s3_bucket for an aggregator connection with 2 connections:

With 15 connections:

In both cases, the timing starts to decrease on the 2nd run, and then even more on subsequent runs until all rows are cached.

In the 2nd run, are any cached rows/results being used (or maybe just some of the hydrate calls)? Since the query time decreases, I assumed it was using some cached data, but that was difficult to confirm from the timing/rows info.

Steampipe version (steampipe -v)

Steampipe v0.23.2

To reproduce

Run the query above against an aggregator with at least 2 connections (and 20-30 buckets in each account).
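To make the run-to-run comparison easier to repeat, here is a rough sketch of a timing harness in Python. It is not part of Steampipe: `time_runs` is a hypothetical helper, and it measures wall-clock time for the whole CLI invocation, which will not match the timing line Steampipe prints. It assumes the aggregator is already configured and that the service has been started beforehand (e.g. with `steampipe service start`), on the assumption that the cache only persists across invocations while the service stays up.

```python
import shutil
import subprocess
import time

# The query from this report. The aggregator connection itself is assumed
# to be configured already in the Steampipe config.
QUERY = "select name, region from aws_s3_bucket"


def time_runs(cmd, runs=4):
    """Run `cmd` `runs` times and return the wall-clock seconds of each run.

    Hypothetical helper: it times the whole process, including CLI startup,
    so it only approximates the query time Steampipe reports.
    """
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # Some connections are expected to fail, so a non-zero exit code
        # must not abort the loop.
        subprocess.run(cmd, capture_output=True, text=True, check=False)
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    if shutil.which("steampipe") is None:
        print("steampipe not found on PATH; nothing to time")
    else:
        # --timing asks the CLI to print its own timing information as well.
        cmd = ["steampipe", "query", QUERY, "--timing"]
        for i, secs in enumerate(time_runs(cmd), start=1):
            print(f"run {i}: {secs:.2f}s")
```

Comparing the per-run times from this loop with the rows and cached-rows figures that Steampipe prints for the same runs should show whether the 2nd run is faster without any cached rows being reported.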

Expected behavior

I'm not sure, maybe improvements to the rows fetched/hydrate calls info?
