turbot / steampipe

Zero-ETL, infinite possibilities. Live query APIs, code & more with SQL. No DB required.
https://steampipe.io
GNU Affero General Public License v3.0

Steampipe service restart stuck after first attempt #4314

Open jeffreymp17 opened 2 weeks ago

jeffreymp17 commented 2 weeks ago

Describe the bug: I'm running Steampipe as a service in a Linux Docker container. When I try to restart the service, it gets stuck on the first attempt, and I need to kill the process and run it again. When the service comes back up, it sometimes doesn't recognize certain profiles and says they don't exist. I attached a video below.

Steampipe version (steampipe -v): v0.23.2

To reproduce

Expected behavior: The service should restart after the command is executed.


https://github.com/turbot/steampipe/assets/30936236/9dab5a02-a791-44b9-8510-1c6490101840

idesofoctober commented 1 week ago

I have a different but similar issue, using:

powerpipe v0.4.0, aws plugin v0.139.0, steampipe v0.23.2

I create a config file, run steampipe service start, and then run powerpipe benchmark (aws compliance mod:latest), all in a Docker container.

Sometimes it works; other times I get a slew of errors like:

rpc error: code = Unavailable desc = error reading from server: read unix @->/tmp/plugin2367597389: read: connection reset by peer (SQLSTATE HV000)

and

"scanIteratorBase cannot iterate: connection..."

pskrbasu commented 1 week ago

@jeffreymp17 I will try to reproduce this.

@idesofoctober is this something that you are seeing intermittently?

jeffreymp17 commented 1 week ago

This is happening every time.

idesofoctober commented 6 days ago

@pskrbasu I'm looking at some reporting I created around this. Across 419 accounts, it definitely does NOT happen with every account, but it happens consistently on the accounts where it does occur. Some additional context:

We run this on EKS, and each account's benchmark runs in its own EKS instance with its own .spc and credentials file (all created by the same underlying code). I'm happy to work with you on what is the same or different about each account. I saw the issue posted here and thought it might be similar in terms of the overall database/connection load time or configuration.

My Dockerfile runs steampipe service start, installs the aws plugin, and then runs steampipe service stop. At runtime, my code runs steampipe service start, waits 2 minutes, and then runs the benchmarks. I think this wait reduced the total number of accounts with this error, although my data and troubleshooting haven't been very rigorous.
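One possible alternative to the fixed 2-minute wait is to poll until the service accepts connections and give up after a deadline. The Python sketch below is not part of Steampipe or Powerpipe. It assumes the service listens on Steampipe's default database port, 9193, on localhost, and the function name wait_for_port is hypothetical. A port that accepts TCP connections does not necessarily mean every plugin connection has finished loading, so this may not eliminate the errors above on its own.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 120.0, interval: float = 2.0) -> bool:
    """Hypothetical helper: poll host:port until a TCP connection succeeds.

    Returns True as soon as a connection is accepted, or False once
    `timeout` seconds have passed without a successful connection.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError (e.g. ConnectionRefusedError)
            # while nothing is listening on the port yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, a container entrypoint could call wait_for_port("127.0.0.1", 9193) after steampipe service start and only run powerpipe benchmark once it returns True, which bounds the wait instead of always sleeping for 2 minutes.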

One hypothesis I have (based on nothing tangible, but I thought I'd share it in case it sparks an idea) relates to the fact that the aws.spc for all of these uses * for regions. Could the number or location of the regions in play cause differences in load time (if load time is in fact the issue)?

I'm going on vacation next week, but if there is something I can look at regarding differences between the accounts where this occurs and those where it doesn't, I'll get back to you as soon as I can.