turbot / steampipe

Zero-ETL, infinite possibilities. Live query APIs, code & more with SQL. No DB required.
https://steampipe.io
GNU Affero General Public License v3.0
6.99k stars 276 forks source link

Vague error about pluginmanager failing to start on a fresh/new install #1850

Closed kquinsland closed 2 years ago

kquinsland commented 2 years ago

Describe the bug

A fresh install of steampipe on Ubuntu results in a vague error message on startup:

$ steampipe query
Welcome to Steampipe v0.13.5
For more information, type .help
2022-04-12 21:35:26.904 UTC [WARN]  steampipe: plugin manager start() failed to start GRPC client for plugin manager: Unrecognized remote plugin message: 

This usually means that the plugin is either invalid or simply
needs to be recompiled to support the latest protocol.
2022-04-12 21:35:26.904 UTC [WARN]  steampipe: StartServices plugin manager failed to start: Unrecognized remote plugin message: 

This usually means that the plugin is either invalid or simply
needs to be recompiled to support the latest protocol.

Error: Unrecognized remote plugin message: 

This usually means that the plugin is either invalid or simply
needs to be recompiled to support the latest protocol.

**Steampipe version 0.13.5

To reproduce

I was able to reproduce a few different ways:

But to keep things simple, this can be replicated with docker, too:

❯ sudo docker run -t --platform=linux/amd64 -i ubuntu:21.10 /bin/bash
[sudo] password for karl: 
root@cd44d4bb96ca:/# useradd test
root@cd44d4bb96ca:/# mkdir -p /home/test/.steampipe
root@cd44d4bb96ca:/# chown -R test /home/test/
root@cd44d4bb96ca:/# apt update; apt install -y curl
<...>
Running hooks in /etc/ca-certificates/update.d...
done.
root@cd44d4bb96ca:/# /bin/sh -c "$(curl -fsSL https://raw.githubusercontent.com/turbot/steampipe/main/install.sh)"
Created temporary directory at /tmp/tmp.nEGNGpk7dN. Changing to /tmp/tmp.nEGNGpk7dN
Downloading from https://github.com/turbot/steampipe/releases/latest/download/steampipe_linux_amd64.tar.gz
<...>
Steampipe was installed successfully to /usr/local/bin/steampipe
root@cd44d4bb96ca:/# su - test
$ which steampipe
/usr/local/bin/steampipe
$ steampipe query
Welcome to Steampipe v0.13.5
For more information, type .help
2022-04-12 21:55:19.929 UTC [WARN]  steampipe: plugin manager start() failed to start GRPC client for plugin manager: Unrecognized remote plugin message: 

This usually means that the plugin is either invalid or simply
needs to be recompiled to support the latest protocol.
2022-04-12 21:55:19.929 UTC [WARN]  steampipe: StartServices plugin manager failed to start: Unrecognized remote plugin message: 

This usually means that the plugin is either invalid or simply
needs to be recompiled to support the latest protocol.

Error: Unrecognized remote plugin message: 

This usually means that the plugin is either invalid or simply
needs to be recompiled to support the latest protocol.

Expected behavior

I would expect at least a little bit more information about the error... but ideally i'd expect that the tool works :).

Additional context Prior to opening this issue, there was a long slack thread doing some initial troubleshooting: https://steampipe.slack.com/archives/C01UECB59A7/p1649790510554629

From the same docker container I used to show the issue above:

(note, I did -I here to not show the DB folder as there are hundreds of small files. You can see a very similar result from the desktop that is HOSTING docker from the slack thread / here: https://gist.github.com/karl-sprig/0dcbead6f46cea6ffc81dca48b352cd5)

$ pwd  
/home/test/.steampipe
$ tree -I db 
.
|-- config
|   `-- default.spc
|-- internal
|   |-- history.json
|   |-- steampipe.json
|   `-- update-check.json
|-- logs
|   |-- database-2022-04-12.log
|   `-- plugin-2022-04-12.log
`-- plugins

4 directories, 6 files
$ cat logs/database-2022-04-12.log
2022-04-12 21:55:19.690 UTC [2731] LOG:  database system was shut down at 2022-04-12 21:55:19 UTC
2022-04-12 21:55:19.692 UTC [2729] LOG:  database system is ready to accept connections
2022-04-12 21:55:19.716 UTC [2737] LOG:  connection received: host=127.0.0.1 port=60670
2022-04-12 21:55:19.717 UTC [2737] LOG:  connection authorized: user=root database=postgres
2022-04-12 21:55:19.733 UTC [2737] LOG:  disconnection: session time: 0:00:00.018 user=root database=postgres host=127.0.0.1 port=60670
2022-04-12 21:55:19.787 UTC [2738] LOG:  connection received: host=127.0.0.1 port=60672
2022-04-12 21:55:19.798 UTC [2738] LOG:  connection authorized: user=root database=postgres SSL enabled (protocol=TLSv1.2, cipher=ECDHE-RSA-AES256-GCM-SHA384, bits=256, compression=off)
2022-04-12 21:55:19.809 UTC [2738] LOG:  disconnection: session time: 0:00:00.022 user=root database=postgres host=127.0.0.1 port=60672
2022-04-12 21:55:19.862 UTC [2739] LOG:  connection received: host=127.0.0.1 port=60674
2022-04-12 21:55:19.870 UTC [2739] LOG:  connection authorized: user=root database=steampipe SSL enabled (protocol=TLSv1.2, cipher=ECDHE-RSA-AES256-GCM-SHA384, bits=256, compression=off)
2022/04/12 21:55:19 [INFO] 
******************************************************

                steampipe postgres fdw init

******************************************************
2022/04/12 21:55:19 [INFO] Log level WARN
2022-04-12 21:55:19.917 UTC [2739] LOG:  disconnection: session time: 0:00:00.054 user=root database=steampipe host=127.0.0.1 port=60674
$ cat logs/plugin-2022-04-12.log
2022-04-12 21:55:19.928 UTC [WARN]  failed to create connection watcher: too many open files
kquinsland commented 2 years ago

I should point out that other people in the slack thread were not able to reproduce the issue using docker. This starts to point to something about the underlying host configuration but it's hard to draw too many hard conclusions when the underlying host is a mac.

I have a colleague that uses a similar setup to me: modern intel laptop + debian based distro. He does not use ZFS, though - I do. He is currently traveling so it might be a few days before he can test the steps above on his own workstation.

IN the mean time / for the turbot engineers to do internal testing on a VM or similar: the ubuntu desktop installer makes it pretty easy to use ZFS. The graphical installer will do everything for you: https://ubuntu.com/blog/zfs-focus-on-ubuntu-20-04-lts-whats-new

I run postgres on k8s using a PV mounted over iSCSI backed by a NAS using ZFS... so I know it is 100000% possible to use postgres on zfs.

kquinsland commented 2 years ago

My google-foo is not the strongest, but I did find one plausible hit:

https://bugzilla.redhat.com/show_bug.cgi?id=1829596

kaidaguerre commented 2 years ago

Thanks for reporting this in such details @kquinsland

I'll look into it

e-gineer commented 2 years ago

@kquinsland I wonder if increasing the limits at the system level (rather than user level) will help? Reference - https://discuss.linuxcontainers.org/t/weird-error-for-zfs-storage-backend/6008

@kaidaguerre File limits might be a red herring, definitely wondering why we would be anywhere near that limit for a blank install? Per the Slack thread, my current best guess is that the issue is related to using ZFS though...

kquinsland commented 2 years ago

my coworker had super limited connectivity / was not able to docker pull. He'll be able to test next week.


@kquinsland I wonder if increasing the limits at the system level (rather than user level) will help? Reference - https://discuss.linuxcontainers.org/t/weird-error-for-zfs-storage-backend/6008

I am skeptical, but I'll give it a shot. It would be nice to know what about the userland limits that are not applying to the process launched by the user ... in user land.

I need to restart this machine later today to apply some usermod properties so I'll bump up the sysctls before I do that. Hope to have an EDIT for this in a few hours time or less.

kquinsland commented 2 years ago

had 15 min of spare time:

root@workpad:/etc/sysctl.d# cp 99-sysctl.conf 99-steampipe-testing.conf
root@workpad:/etc/sysctl.d# nano 99-steampipe-testing.conf 
root@workpad:/etc/sysctl.d# cat 99-steampipe-testing.conf 
##
# See: https://github.com/turbot/steampipe/issues/1850
##

fs.inotify.max_queued_events = 1048576
fs.inotify.max_user_instances = 1048576
fs.inotify.max_user_watches = 1048576

root@workpad:/etc/sysctl.d# service procps force-reload
root@workpad:/etc/sysctl.d# sysctl -a | grep fs.inotify
fs.inotify.max_queued_events = 1048576
fs.inotify.max_user_instances = 1048576
fs.inotify.max_user_watches = 1048576
root@workpad:/etc/sysctl.d# logout
❯ sysctl -a | grep fs.inotify
sysctl: permission denied on key 'fs.protected_fifos'
sysctl: permission denied on key 'fs.protected_hardlinks'
sysctl: permission denied on key 'fs.protected_regular'
sysctl: permission denied on key 'fs.protected_symlinks'
fs.inotify.max_queued_events = 1048576
fs.inotify.max_user_instances = 1048576
fs.inotify.max_user_watches = 1048576
sysctl: permission denied on key 'kernel.apparmor_display_secid_mode'
sysctl: permission denied on key 'kernel.cad_pid'
sysctl: permission denied on key 'kernel.unprivileged_userns_apparmor_policy'
sysctl: permission denied on key 'kernel.usermodehelper.bset'
sysctl: permission denied on key 'kernel.usermodehelper.inheritable'
sysctl: permission denied on key 'net.core.bpf_jit_harden'
sysctl: permission denied on key 'net.core.bpf_jit_kallsyms'
sysctl: permission denied on key 'net.core.bpf_jit_limit'
sysctl: permission denied on key 'net.ipv4.tcp_fastopen_key'
sysctl: permission denied on key 'net.ipv6.conf.all.stable_secret'
sysctl: permission denied on key 'net.ipv6.conf.default.stable_secret'
sysctl: permission denied on key 'net.ipv6.conf.docker0.stable_secret'
sysctl: permission denied on key 'net.ipv6.conf.enp85s0.stable_secret'
sysctl: permission denied on key 'net.ipv6.conf.lo.stable_secret'
sysctl: permission denied on key 'net.ipv6.conf.vethd18745b.stable_secret'
sysctl: permission denied on key 'net.ipv6.conf.wlp0s20f3.stable_secret'
sysctl: permission denied on key 'vm.mmap_rnd_bits'
sysctl: permission denied on key 'vm.mmap_rnd_compat_bits'
sysctl: permission denied on key 'vm.stat_refresh'

The suggested tweaks have been applied and appear to be active in userspace.

And now lets try running it w/ a fresh install:

❯ brew uninstall --force --zap steampipe; rm -rf /home/karl/.steampipe
Uninstalling steampipe... (3 files, 30.8MB)
❯ brew install steampipe
==> Auto-updated Homebrew!
Updated Homebrew from e1c1157a7 to cc6a91fea.
Updated 1 tap (homebrew/core).
==> Updated Formulae
Updated 62 formulae.

==> Downloading https://github.com/turbot/steampipe/releases/download/v0.13.5/steampipe_linux_amd64.tar.gz
Already downloaded: /home/karl/.cache/Homebrew/downloads/23e17a4a289f73f924f3e768676817900532869fb80bea51b0843e9728079921--steampipe_linux_amd64.tar.gz
==> Installing steampipe from turbot/tap
🍺  /home/linuxbrew/.linuxbrew/Cellar/steampipe/0.13.5: 3 files, 30.8MB, built in 1 second
==> Running `brew cleanup steampipe`...
Disable this behaviour by setting HOMEBREW_NO_INSTALL_CLEANUP.
Hide these hints with HOMEBREW_NO_ENV_HINTS (see `man brew`).
❯ which steampipe
/home/linuxbrew/.linuxbrew/bin/steampipe
❯ ls -lah ~/.steampipe
ls: cannot access '/home/karl/.steampipe': No such file or directory
❯ steampipe query
Welcome to Steampipe v0.13.5
For more information, type .help
> .help

So that IS progress! However, there seems to be something wrong w/ DB init:

ksnip_20220413-110840

I took the screenshot after about 90s of waiting. I spent about 10 min editing console out attached to this post ... and its still hung.

Taking a look @ the logs:

❯ pwd
/home/karl/.steampipe/logs
❯ ls
database-2022-04-13.log  plugin-2022-04-13.log
❯ cat plugin-2022-04-13.log
❯ cat database-2022-04-13.log
2022-04-13 17:51:39.823 UTC [3926937] LOG:  database system was shut down at 2022-04-13 17:51:39 UTC
2022-04-13 17:51:39.824 UTC [3926935] LOG:  database system is ready to accept connections
2022-04-13 17:51:39.858 UTC [3926943] LOG:  connection received: host=127.0.0.1 port=60804
2022-04-13 17:51:39.860 UTC [3926943] LOG:  connection authorized: user=root database=postgres
2022-04-13 17:51:39.869 UTC [3926943] LOG:  disconnection: session time: 0:00:00.011 user=root database=postgres host=127.0.0.1 port=60804
2022-04-13 17:51:39.924 UTC [3926944] LOG:  connection received: host=127.0.0.1 port=60806
2022-04-13 17:51:39.933 UTC [3926944] LOG:  connection authorized: user=root database=postgres SSL enabled (protocol=TLSv1.2, cipher=ECDHE-RSA-AES256-GCM-SHA384, bits=256, compression=off)
2022-04-13 17:51:39.945 UTC [3926944] LOG:  disconnection: session time: 0:00:00.020 user=root database=postgres host=127.0.0.1 port=60806
2022-04-13 17:51:39.999 UTC [3926945] LOG:  connection received: host=127.0.0.1 port=60808
2022-04-13 17:51:40.006 UTC [3926945] LOG:  connection authorized: user=root database=steampipe SSL enabled (protocol=TLSv1.2, cipher=ECDHE-RSA-AES256-GCM-SHA384, bits=256, compression=off)
2022/04/13 10:51:40 [INFO] 
******************************************************

                steampipe postgres fdw init

******************************************************
2022/04/13 10:51:40 [INFO] Log level WARN
2022-04-13 17:51:40.039 UTC [3926945] LOG:  disconnection: session time: 0:00:00.040 user=root database=steampipe host=127.0.0.1 port=60808
2022-04-13 17:51:40.102 UTC [3926965] LOG:  connection received: host=127.0.0.1 port=60810
2022-04-13 17:51:40.105 UTC [3926965] LOG:  connection authorized: user=steampipe database=steampipe application_name=steampipe_930199cda636b837b9dbd473bcfdfbca SSL enabled (protocol=TLSv1.2, cipher=ECDHE-RSA-AES256-GCM-SHA384, bits=256, compression=off)
2022-04-13 17:51:40.169 UTC [3926967] LOG:  connection received: host=127.0.0.1 port=60812
2022-04-13 17:51:40.172 UTC [3926967] LOG:  connection authorized: user=root database=steampipe SSL enabled (protocol=TLSv1.2, cipher=ECDHE-RSA-AES256-GCM-SHA384, bits=256, compression=off)
2022-04-13 17:51:40.174 UTC [3926967] LOG:  disconnection: session time: 0:00:00.005 user=root database=steampipe host=127.0.0.1 port=60812
2022-04-13 17:51:40.229 UTC [3926968] LOG:  connection received: host=127.0.0.1 port=60814
2022-04-13 17:51:40.235 UTC [3926968] LOG:  connection authorized: user=root database=steampipe SSL enabled (protocol=TLSv1.2, cipher=ECDHE-RSA-AES256-GCM-SHA384, bits=256, compression=off)
2022-04-13 17:51:40.235 UTC [3926968] LOG:  disconnection: session time: 0:00:00.006 user=root database=steampipe host=127.0.0.1 port=60814
2022-04-13 17:51:40.247 UTC [3926965] ERROR:  syntax error at or near ")" at character 527
2022-04-13 17:51:40.247 UTC [3926965] STATEMENT:  
        SELECT
            table_name,
            column_name,
            column_default,
            is_nullable,
            data_type,
            table_schema,
            (COALESCE(pg_catalog.col_description(c.oid, cols.ordinal_position :: int),'')) as column_comment,
            (COALESCE(pg_catalog.obj_description(c.oid),'')) as table_comment
        FROM
            information_schema.columns cols
        LEFT JOIN
            pg_catalog.pg_namespace nsp ON nsp.nspname = cols.table_schema
        LEFT JOIN
            pg_catalog.pg_class c ON c.relname = cols.table_name AND c.relnamespace = nsp.oid
        WHERE
                cols.table_schema in ()
                OR
            LEFT(cols.table_schema,8) = 'pg_temp_'

Before posting, I did a ctrl+c:

❯ steampipe query
Welcome to Steampipe v0.13.5
For more information, type .help
> .help
Error: execution cancelled

The database log does not show anything beyond the:

2022-04-13 18:13:16.634 UTC [3939504] ERROR:  syntax error at or near ")" at character 527
e-gineer commented 2 years ago

@kquinsland Sorry - that hang on immediately running a dot command (e.g. .help) is a different issue ... I actually hit it yesterday while testing this for you ... reported here and will be fixed in the next release - https://github.com/turbot/steampipe/issues/1859

Please try installing a plugin, which makes it go away. Hopefully you are there!

kquinsland commented 2 years ago

Oh hey / look @ that. Thanks for the tip @e-gineer .

I did a quick steampipe plugin install aws and was able to steampipe query:

> select organization_feature_set from aws_account;
+--------------------------+
| organization_feature_set |
+--------------------------+
| ALL                      |
+--------------------------+

So yeah. That was a fun little roller coaster that I am happily going to mark as resolved :D.