turbot / steampipe

Zero-ETL, infinite possibilities. Live query APIs, code & more with SQL. No DB required.
https://steampipe.io
GNU Affero General Public License v3.0
6.86k stars 267 forks source link

fix: do not abort when unrelated connections are not yet loaded #4233

Closed pdecat closed 3 months ago

pdecat commented 5 months ago

This PR resolves issues when connection configurations are created dynamically concurrently to executing queries.

Symptoms are that an unrelated connection that hasn't completed loading breaks other connections:

2024-04-05 12:05:19.955 UTC [INFO]  hub: goFdwBeginForeignScan, connection 'aws_000000000000', table 'aws_ec2_transit_gateway', explain: false
2024-04-05 12:05:19.957 UTC [WARN]  hub: pluginHub.GetIterator FAILED: failed to attach to plugin process for 'aws@0.133.0': no connection config loaded for 'aws_222222222222'
2024-04-05 12:05:19.957 UTC [299] ERROR:  failed to attach to plugin process for 'aws@0.133.0': no connection config loaded for 'aws_222222222222'
2024-04-05 12:05:19.957 UTC [299] STATEMENT:  SELECT 'APIGATEWAY' AS subtype, row_to_json(r) AS data FROM (SELECT
[...SNIP...]
2024-04-05 12:05:19.958 UTC [INFO]  hub: goFdwAbortCallback
2024-04-05 12:05:19.958 UTC [INFO]  hub: Hub Abort
2024-04-05 12:05:20.023 UTC [INFO]  hub: goFdwIterateForeignScan returned empty row - this scan complete (0xc00226f080)
2024-04-05 12:05:20.025 UTC [INFO]  hub: goFdwIterateForeignScan calling pluginHub.StartScan, table 'aws_account' (0xc00226f680)
2024-04-05 12:05:20.025 UTC [INFO]  hub: StartScan for table: aws_account, cache enabled: false, iterator 0xc00226f680, 0 quals (1712318705836)
2024-04-05 12:05:20.026 UTC [INFO]  hub: goFdwIterateForeignScan returned empty row - this scan complete (0xc00226f680)
2024-04-05 12:05:20.028 UTC [INFO]  hub: goFdwReScanForeignScan, connection '', table 'aws_ec2_transit_gateway'
2024-04-05 12:05:20.028 UTC [INFO]  hub: goFdwBeginForeignScan, connection 'aws_111111111111', table 'aws_ec2_transit_gateway', explain: false
2024-04-05 12:05:20.030 UTC [WARN]  hub: pluginHub.GetIterator FAILED: failed to attach to plugin process for 'aws@0.133.0': no connection config loaded for 'aws_222222222222'
2024-04-05 12:05:20.030 UTC [INFO]  hub: .
******************************************************

                steampipe postgres fdw shutdown

******************************************************

This problem seems to happen more frequently when connections are slower to load, most notably when limiters are configured.

See https://turbot-community.slack.com/archives/C01UECB59A7/p1712324097931309

For this change to be effective, the FDW has to be rebuilt with:

go mod edit -replace="github.com/turbot/steampipe=github.com/pdecat/steampipe @0a5b8a0928292f08ad842a63211188306e9998de"
pdecat commented 5 months ago

Submitted a reproduction case: https://github.com/pdecat/steampipe-4233-repro

kaidaguerre commented 3 months ago

hi @pdecat sorry for the long delay on this one. I actually spent a bunch of time digging into this one to try and identify the underlying reasons, but I failed/ran out of time and had to move on to other stuff.

Came back to this today and I think it's a fine tactical fix for now. I'd still like to get to the bottom of it but I'll merge in the meantime.

pdecat commented 3 months ago

Hi @kaidaguerre, no problem for the delay, been using my own fork until now :)

Thanks! :tada:

pdecat commented 2 months ago

The issue described in the PR description reproduces with HEAD of v0.23.x branch with FDW 1.11.3:

2024-07-12 15:04:34.927 UTC [INFO]  hub: goFdwReScanForeignScan, connection '', table 'aws_account'
2024-07-12 15:04:34.927 UTC [INFO]  hub: goFdwBeginForeignScan, connection 'aws_000000000000', table 'aws_account', explain: false
2024-07-12 15:04:34.927 UTC [INFO]  hub: goFdwBeginForeignScan, canPushdownAllSortFields true
2024-07-12 15:04:34.927 UTC [INFO]  hub: goFdwBeginForeignScan, table 'aws_account', sortOrder: []
2024-07-12 15:04:34.927 UTC [INFO]  hub: RemoteHub startScanForConnection 'aws_000000000000' limit -1
2024-07-12 15:04:34.947 UTC [WARN]  hub: pluginHub.GetIterator FAILED: failed to attach to plugin process for 'aws': no connection config loaded for 'aws_111111111111'
2024-07-12 15:04:34.947 UTC [INFO]  hub: buildConnectionLimitMap, table: 'aws_account', 1 connection, limit: -1
2024-07-12 15:04:34.947 UTC [INFO]  hub: connection 'aws_222222222222', table 'aws_account', quals
----------------------------------------------------------------
Column: account_id, Operator: '=', Value: '222222222222'
----------------------------------------------------------------
2024-07-12 15:04:34.947 UTC [300] ERROR:  failed to attach to plugin process for 'aws': no connection config loaded for 'aws_111111111111'
2024-07-12 15:04:34.947 UTC [300] STATEMENT:  SELECT 'APIGATEWAY' AS subtype, row_to_json(r) AS data FROM (SELECT
[...SNIP...]
2024-07-12 15:04:34.948 UTC [INFO]  hub: goFdwIterateForeignScan calling pluginHub.StartScan, table 'aws_account' Current timestamp: 774111873000000 (0xc000d0fc00)
2024-07-12 15:04:34.948 UTC [INFO]  hub: hubBase StartScan
2024-07-12 15:04:34.948 UTC [INFO]  hub: build executeConnectionData map: hub: 0xc0005da9b0, i.connectionLimitMap: map[aws_222222222222:-1]
2024-07-12 15:04:34.948 UTC [INFO]  hub: build executeConnectionData map returning table:"aws_account" query_context:{columns:"account_id" quals:{key:"account_id" value:{quals:{fieldName:"account_id" string_value:"=" value:{string_value:"222222222222"}}}}} connection:"aws_222222222222" call_id:"1720796674139" trace_context:{value:"{}"} executeConnectionData:{key:"aws_222222222222" value:{cache_enabled:true cache_ttl:300}}
2024-07-12 15:04:34.948 UTC [INFO]  hub: StartScan for table: aws_account, cache enabled: false, iterator 0xc000d0fc00, 1 quals (1720796674139)
2024-07-12 15:04:30.613 UTC [INFO]  hub: goFdwAbortCallback
2024-07-12 15:04:34.948 UTC [INFO]  hub: AddScanMetadata for iterator 0xc000a08700 query timestamp 774111852000000 (aws_000000000000)
2024-07-12 15:04:34.948 UTC [INFO]  hub: removeStaleScanMetadata for current query timestamp 774111852000000
[...SNIP...]
2024-07-12 15:04:38.418 UTC [INFO]  hub: .
******************************************************

                steampipe postgres fdw shutdown

******************************************************
pdecat commented 2 months ago

Using strings on the official v1.11.3 build of FDW, I can verify that my changes are not included as this first command should have returned something:

# strings .steampipe/db/14.2.0/postgres/lib/postgresql/steampipe_postgres_fdw.so | grep -F "[WARN] no connection config loaded for '"

If I remove the [WARN] prefix, I see it is the old version:

# strings .steampipe/db/14.2.0/postgres/lib/postgresql/steampipe_postgres_fdw.so | grep -F "no connection config loaded for '"
accessed data from freed user arena runtime: wrong goroutine in newstackruntime: invalid pc-encoded table f=trace: non-empty global trace buffer[TRACE] plugin client created for %sno connection config loaded for '%s'positional arg index %d out of range444089209850062616169452667236328125ryuFtoaFixed64 called with prec > 180123456789abcdefghijklmnopqrstuvwxyzmethod ABI and value ABI don't alignreflect.Value.Equal: values of type Channel switches to new LB policy %qSubchannel Connectivity change to %vnil receiver passed to UnmarshalJSONgoogle/protobuf/source_context.protocannot parse '%s', %d overflows uintcannot parse '%s', %f overflows uintcompressed name in SRV resource datareflect: NumField of non-struct typehttp: no Location header in responsehttp: unexpected EOF reading trailerhttp: invalid byte %q in Cookie.Path LastStreamID=%v ErrCode=%v Debug=%qRoundTrip retrying after failure: %vno acceptable authentication methodsinvalid semicolon separator in queryA static map expression is required.can't setValue for a nil EvalContextInvalid template interpolation valueTemplateJoinExpr got non-tuple tupleMissing close bracket on splat indexcty.Capsule(%q, reflect.TypeOf(%#v))%d elements are required, but got %dunsupported value type %#v in Equalselement key for tuple must be numberwrong number of elements %d; need %d6ba7b810-9dad-11d1-80b4-00c04fd430c86ba7b811-9dad-11d1-80b4-00c04fd430c86ba7b812-9dad-11d1-80b4-00c04fd430c86ba7b814-9dad-11d1-80b4-00c04fd430c8%v: MessageSet with no unknown fieldinvalid gRPC request content-type %qpeer header list size exceeded limittransport: malformed grpc-status: %vtransport: malformed http-status: %vfailed to write the HTTP request: %vcannot merge into invalid %v messageduration (%v) has out-of-range nanosinvalid pattern syntax (+ after -): expected an ECDSA public key, got %TTLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHATLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHAtls: keys must have at least one keyunsupported SSLv2 handshake receivedtls: server did not send a key sharex509: zero or negative DSA parameterx509: invalid CRL distribution pointx509: invalid subject key identifierx509: malformed algorithm identifierx509: invalid X25519 private key: %vx509: serial number must be positive  Owner: %d [%s] (current: %d [%s])