turbot / steampipe

Zero-ETL, infinite possibilities. Live query APIs, code & more with SQL. No DB required.
https://steampipe.io
GNU Affero General Public License v3.0

Multiple steampipe cli executions share environment information #4155

Closed jlm0x017 closed 3 weeks ago

jlm0x017 commented 5 months ago

Describe the bug If I start a steampipe query session in one terminal window, its environment is used by steampipe query executions in other terminals.

Steampipe version (steampipe -v) Steampipe v0.21.8

To reproduce Open two terminal windows. In the first, execute AWS_PROFILE=env1 steampipe query and leave it open. In the second, execute AWS_PROFILE=env2 steampipe query "select count(*) from aws_ec2_instance;". The instance count will be from env1.

Kill the client in window 1, then repeat the query in window 2: it now returns the count for env2.

Expected behavior I would expect simultaneous steampipe runs to each use their own environment variables.

Additional context The second window will successfully execute steampipe query "select count(*) from env1.aws_ec2_instance;", providing the count for env1. However, this masks the problem and is an inconvenient workaround. If I'm writing multiple joins, I'd rather not have to specify the environment on each table name; I'd like the local value of AWS_PROFILE to be honored instead.

e-gineer commented 5 months ago

This is not ideal, but is a side effect of a design decision we made for ease of use.

Internally there are two parts to the Steampipe CLI - a client and a server.

When you run steampipe service start you are starting the server alone, making it available for multiple steampipe clients and postgres tools to connect to. The configuration of permissions and connections is done at a server level.

When you run steampipe query we:

  1. Try to connect to an existing steampipe service that is running.
  2. If there is no service, start a "temporary service".
  3. Connect to the running service.
  4. Run our query.
  5. Disconnect our client.
  6. Stop the "temporary service" if we were the last client.

This approach makes steampipe query easy to use and allows you to run multiple clients in parallel. The negative effect is that the "temporary service" configuration is based on the first client to start it. This is what you are seeing.
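To make the consequence concrete, here is a minimal toy model of the lifecycle described above. It is not Steampipe code: the `Host` and `TemporaryService` classes and the `query_profile` method are hypothetical names invented for illustration, and the model assumes the only thing that matters is which client's environment the shared service captured when it started.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TemporaryService:
    """Hypothetical stand-in for the shared "temporary service"."""
    env: Dict[str, str]   # environment captured from the client that started it
    clients: int = 0      # number of currently connected clients


class Host:
    """Hypothetical machine that holds at most one running service."""

    def __init__(self) -> None:
        self.service: Optional[TemporaryService] = None

    def connect(self, client_env: Dict[str, str]) -> TemporaryService:
        # Steps 1-3: reuse a running service, or start one from this client's env.
        if self.service is None:
            self.service = TemporaryService(env=dict(client_env))
        self.service.clients += 1
        return self.service

    def disconnect(self) -> None:
        # Steps 5-6: the last client to leave stops the service.
        assert self.service is not None, "no service is running"
        self.service.clients -= 1
        if self.service.clients == 0:
            self.service = None

    def query_profile(self, client_env: Dict[str, str]) -> str:
        # Step 4, reduced to "which AWS profile would this query run under?"
        service = self.connect(client_env)
        try:
            return service.env["AWS_PROFILE"]
        finally:
            self.disconnect()


host = Host()
host.connect({"AWS_PROFILE": "env1"})                # terminal 1: session left open
print(host.query_profile({"AWS_PROFILE": "env2"}))   # terminal 2 sees env1
host.disconnect()                                    # terminal 1 exits
print(host.query_profile({"AWS_PROFILE": "env2"}))   # terminal 2 now sees env2
```

In this sketch the second client's environment is ignored for as long as the first client keeps the service alive, which mirrors the behaviour reported in this issue.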

If you are using multiple AWS profiles I highly recommend you set them up as different connections in steampipe - then you can reuse the same service and query both of them by just changing the search path - https://steampipe.io/docs/guides/search-path

But, as a workaround if you really want environment variable based control, you can use the --install-dir and --port arguments to start a second steampipe service in parallel. You should definitely check out workspaces if you want to do this sort of thing - they make the configuration a lot easier and more flexible.

jlm0x017 commented 5 months ago

This is perfectly understandable once you know how the product works. But for the user who doesn't know, or who is concentrating on the work rather than the tool, this behaviour is surprising and frustrating. Since this is part of the core design, I recognize it will be hard to change, but if it cannot be changed easily, then consider ways to make the behaviour more visible.

Potential ideas:

This last one especially resonates with my experience, as I troubleshot the inconsistent results for 2+ hours. The forgotten console was three hours old, and a 5m timeout on the backend server would have shut down the conflicting server before the behaviour even appeared.

shaicoleman commented 4 months ago

I think steampipe should listen to a random local socket on each execution by default for standalone queries.

This solves a few issues:

e-gineer commented 3 months ago

Unfortunately @shaicoleman, running two instances of steampipe at the same time (e.g. on different ports) means running two postgres instances simultaneously. This is only possible with two separate installation directories to store all of the postgres data files, configuration, etc. Doing this on-demand would take time to set up and create a lot of noise / storage on the machine.

Providing more feedback through the UI about reuse as @jlm0x017 suggests seems like the best option at this point I think?

github-actions[bot] commented 1 month ago

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days.

github-actions[bot] commented 3 weeks ago

This issue was closed because it has been stalled for 90 days with no activity.