metrico / qryn

⭐️ All-in-One Polyglot Observability with OLAP Storage for Logs, Metrics, Traces & Profiles. Drop-in Grafana Cloud replacement compatible with Loki, Prometheus, Tempo, Pyroscope, Opentelemetry, Datadog and beyond :rocket:
https://qryn.dev
GNU Affero General Public License v3.0

QRYN container will not start on Podman #339

Closed. Ceyword closed this issue 1 year ago.

Ceyword commented 1 year ago

I followed the example at https://qryn.metrico.in/#/installation (Docker tab) and installed Qryn on Podman successfully, but I was unable to start a Qryn container.

CASE 1: If I just start a container without setting any environment variables, I get:

"name":"qryn","err":"connect ECONNREFUSED 127.0.0.1:8123\nError: connect ECONNREFUSED 127.0.0.1:8123\n at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1278:16)","msg":"Error starting qryn"

CASE 2: If I set only the environment variable CLICKHOUSE_AUTH = admin:valid-password-of-valid-user, I get:

"name":"qryn","err":"getaddrinfo ENOTFOUND admin\nError: getaddrinfo ENOTFOUND admin\n at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:109:26)","msg":"Error starting qryn"

CASE 3: I set any combination of the following variables:

CLICKHOUSE_SERVER = localhost | 127.0.0.1 | 192.200.0.123
CLICKHOUSE_AUTH = admin:valid-password-of-valid-user
CLICKHOUSE_PORT = 8123
CLICKHOUSE_DB = qryn
CLICKHOUSE_PROTO = http
PORT = 3100

and I still get:

"name":"qryn","err":"getaddrinfo ENOTFOUND admin\nError: getaddrinfo ENOTFOUND admin\n at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:109:26)","msg":"Error starting qryn"

It seems qryn.js simply cannot find the ClickHouse server, whatever the ENV values.

NOTE: ClickHouse is accessible in a browser both locally and remotely, returning 'OK' for localhost:8123, 127.0.0.1:8123 and 192.200.0.123:8123. ClickHouse is also accessible to a remote Grafana instance.

OS: AlmaLinux 9
ClickHouse Server Version: 23.8.1
Podman Version: 4.4.1

lmangani commented 1 year ago

Hello @Ceyword, I could not replicate this issue and our CI tests show no such problem. ENOTFOUND errors come from container name resolution in Docker rather than from how the application is configured. What is the actual configuration attempted for both services?

CLICKHOUSE_SERVER = localhost | 127.0.0.1 | 192.200.0.123

This is very unlikely to be the correct config; the ClickHouse container name should be used.

Ceyword commented 1 year ago

Hello @lmangani, thank you for the effort. I have tried to present the steps again in more detail.

SETUP:
OS: AlmaLinux 9
SERVER: ClickHouse Server Version: 23.8.1 (standalone)
CONTAINER MANAGER: Podman Version: 4.4.1

NOTES:
1- ClickHouse was installed directly on the machine, not in a container.
2- qryn was installed on Podman.

CASE A:

Step 1: Pull the latest qryn image: podman pull qxip/qryn:latest (screenshot: QrynImage)

Step 2: Start the container without setting any ENV variables (screenshot: QrynStartContainer_NOENV)

Result: Understandably, the connection was refused (screenshot: QrynContainerError_NOENV)

CASE B:

Step 1: same as above

Step 2: Start the container setting only one ENV variable (CLICKHOUSE_AUTH), so that default values are used for the rest (screenshot: QrynStartContainer_AUTHENV)

Result: The container started and exited (screenshot: QrynContainerError_AUTHENV)

CASE C:

Step 1: same as above

Step 2: Start the container with the following ENV variables (screenshot: QrynStartContainer_ALLENV):

CLICKHOUSE_SERVER = localhost
CLICKHOUSE_AUTH = admin:valid-password-of-valid-user
CLICKHOUSE_PORT = 8123
CLICKHOUSE_DB = qryn
CLICKHOUSE_PROTO = http
PORT = 3100

Result: The container started and exited (screenshot: QrynContainerError_ALLENV)

NOTE: I repeated Step 2 (CASE C) with CLICKHOUSE_SERVER = 127.0.0.1 and then with CLICKHOUSE_SERVER = 192.200.0.123; both gave the same result.

NOTE: I cannot yet explain this, but the details of the errors have changed. Shouldn't this be impossible with a containerized image?

lmangani commented 1 year ago

This does not appear to be a qryn issue. When running containers, neither localhost nor 127.0.0.1 makes sense as an option unless you are running in host mode, because inside the container they refer to the container itself. There's nothing special at play here. If you can curl the ClickHouse HTTP API, then qryn should work too. There's no need to set any other parameter until you have connectivity figured out.

Could you show a curl call from inside the container to your ClickHouse service's HTTP API, to demonstrate that it works?
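As a rough Python equivalent of that curl check (a sketch, not part of qryn), the function below requests the `/ping` endpoint of ClickHouse's HTTP interface, which answers `Ok.` when the server is reachable. The function name `clickhouse_ping` is hypothetical and the default port 8123 is ClickHouse's usual HTTP port; run it from wherever qryn runs (the same container or network namespace) so it sees the same network qryn does.

```python
import http.client
import urllib.request


def clickhouse_ping(host: str, port: int = 8123, timeout: float = 3.0) -> bool:
    """Return True if ClickHouse's HTTP interface at host:port answers /ping with 'Ok.'.

    Hypothetical helper for illustration; it mirrors a plain curl request.
    """
    url = f"http://{host}:{port}/ping"
    # Bypass any HTTP proxy configured in the environment so the check
    # exercises the direct network path from this host/container.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except (OSError, http.client.HTTPException):
        # Covers connection refused, unknown host (the ENOTFOUND case),
        # timeouts and HTTP error statuses (urllib.error.URLError is an OSError).
        return False
    return body.strip() == "Ok."
```

For example, `clickhouse_ping("127.0.0.1")` returning False inside a container that does not use host networking fits the explanation above: there, 127.0.0.1 is the container itself, not the host running ClickHouse.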

akvlad commented 1 year ago

Hello @Ceyword. Please try selecting 'Use the host networking stack' as the network mode in Podman. (Screenshot from 2023-08-06 12-33-36)

akvlad commented 1 year ago

@Ceyword, another possible solution is to set the CLICKHOUSE_SERVER env var to host.containers.internal, as described here: https://stackoverflow.com/questions/58678983/accessing-host-from-inside-container

Ceyword commented 1 year ago

Hello @lmangani and @akvlad,

If you can curl the clickhouse HTTP API then qryn should work too.

Yes, curl from within the Podman container terminal responded OK. Similarly, Grafana, curl and a direct browser connection all work from a remote machine. I also tried all the suggestions provided, with no success.

Now, to answer the question "Is this a container-host connectivity issue?", I re-installed Qryn directly on the same machine (no container), following the steps at https://qryn.metrico.in/#/installation (PM2 tab), and tried to connect to ClickHouse on the same machine. Qryn will not start either:

[ceyword@localhost ~]$ sudo su
[sudo] password for ceyword:
[root@localhost ceyword]# sudo /usr/local/bin/pm2 list
┌────┬─────────┬─────────────┬─────────┬─────────┬──────────┬────────┬──────┬───────────┬──────────┬──────────┬──────────┬──────────┐
│ id │ name    │ namespace   │ version │ mode    │ pid      │ uptime │ ↺    │ status    │ cpu      │ mem      │ user     │ watching │
├────┼─────────┼─────────────┼─────────┼─────────┼──────────┼────────┼──────┼───────────┼──────────┼──────────┼──────────┼──────────┤
│ 0  │ qryn    │ default     │ N/A     │ fork    │ 21163    │ 5s     │ 698  │ online    │ 0%       │ 145.4mb  │ root     │ disabled │
└────┴─────────┴─────────────┴─────────┴─────────┴──────────┴────────┴──────┴───────────┴──────────┴──────────┴──────────┴──────────┘
[root@localhost ceyword]# sudo /usr/local/bin/pm2 logs 0
[TAILING] Tailing last 15 lines for [0] process (change the value with --lines option)
/root/.pm2/logs/qryn-error.log last 15 lines:
/root/.pm2/logs/qryn-out.log last 15 lines:
0|qryn | {"level":30,"time":1691377739190,"pid":21201,"hostname":"localhost.localdomain","name":"qryn","msg":"Initializing DB... qryn"}
0|qryn | {"level":30,"time":1691377739239,"pid":21201,"hostname":"localhost.localdomain","name":"qryn","msg":"xxh ready"}
0|qryn | {"level":30,"time":1691377739626,"pid":21201,"hostname":"localhost.localdomain","name":"qryn","msg":"xxh ready"}
0|qryn | {"level":50,"time":1691377749263,"pid":21201,"hostname":"localhost.localdomain","name":"qryn","err":"getaddrinfo ENOTFOUND default\nError: getaddrinfo ENOTFOUND default\n at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:107:26)","msg":"Error starting qryn"}
0|qryn | {"level":30,"time":1691377750447,"pid":21227,"hostname":"localhost.localdomain","name":"qryn","msg":"Initializing DB... qryn"}
0|qryn | {"level":30,"time":1691377750482,"pid":21227,"hostname":"localhost.localdomain","name":"qryn","msg":"xxh ready"}
0|qryn | {"level":30,"time":1691377750880,"pid":21227,"hostname":"localhost.localdomain","name":"qryn","msg":"xxh ready"}
0|qryn | {"level":50,"time":1691377760528,"pid":21227,"hostname":"localhost.localdomain","name":"qryn","err":"getaddrinfo ENOTFOUND default\nError: getaddrinfo ENOTFOUND default\n at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:107:26)","msg":"Error starting qryn"}
0|qryn | {"level":30,"time":1691377761729,"pid":21255,"hostname":"localhost.localdomain","name":"qryn","msg":"Initializing DB... qryn"}
0|qryn | {"level":30,"time":1691377761778,"pid":21255,"hostname":"localhost.localdomain","name":"qryn","msg":"xxh ready"}
0|qryn | {"level":30,"time":1691377762195,"pid":21255,"hostname":"localhost.localdomain","name":"qryn","msg":"xxh ready"}
0|qryn | {"level":50,"time":1691377771818,"pid":21255,"hostname":"localhost.localdomain","name":"qryn","err":"getaddrinfo ENOTFOUND default\nError: getaddrinfo ENOTFOUND default\n at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:107:26)","msg":"Error starting qryn"}
0|qryn | {"level":30,"time":1691377773009,"pid":21281,"hostname":"localhost.localdomain","name":"qryn","msg":"Initializing DB... qryn"}
0|qryn | {"level":30,"time":1691377773047,"pid":21281,"hostname":"localhost.localdomain","name":"qryn","msg":"xxh ready"}
0|qryn | {"level":30,"time":1691377773406,"pid":21281,"hostname":"localhost.localdomain","name":"qryn","msg":"xxh ready"}
0|qryn | {"level":50,"time":1691377783078,"pid":21281,"hostname":"localhost.localdomain","name":"qryn","err":"getaddrinfo ENOTFOUND default\nError: getaddrinfo ENOTFOUND default\n at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:107:26)","msg":"Error starting qryn"}
0|qryn | {"level":30,"time":1691377784446,"pid":21322,"hostname":"localhost.localdomain","name":"qryn","msg":"Initializing DB... qryn"}
0|qryn | {"level":30,"time":1691377784487,"pid":21322,"hostname":"localhost.localdomain","name":"qryn","msg":"xxh ready"}
0|qryn | {"level":30,"time":1691377784877,"pid":21322,"hostname":"localhost.localdomain","name":"qryn","msg":"xxh ready"}
0|qryn | {"level":50,"time":1691377794535,"pid":21322,"hostname":"localhost.localdomain","name":"qryn","err":"getaddrinfo ENOTFOUND default\nError: getaddrinfo ENOTFOUND default\n at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:107:26)","msg":"Error starting qryn"}
0|qryn | {"level":30,"time":1691377795781,"pid":21348,"hostname":"localhost.localdomain","name":"qryn","msg":"Initializing DB... qryn"}
0|qryn | {"level":30,"time":1691377795819,"pid":21348,"hostname":"localhost.localdomain","name":"qryn","msg":"xxh ready"}
0|qryn | {"level":30,"time":1691377796212,"pid":21348,"hostname":"localhost.localdomain","name":"qryn","msg":"xxh ready"}
0|qryn | {"level":50,"time":1691377805869,"pid":21348,"hostname":"localhost.localdomain","name":"qryn","err":"getaddrinfo ENOTFOUND default\nError: getaddrinfo ENOTFOUND default\n at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:107:26)","msg":"Error starting qryn"}
0|qryn | {"level":30,"time":1691377807064,"pid":21374,"hostname":"localhost.localdomain","name":"qryn","msg":"Initializing DB... qryn"}
0|qryn | {"level":30,"time":1691377807099,"pid":21374,"hostname":"localhost.localdomain","name":"qryn","msg":"xxh ready"}
0|qryn | {"level":30,"time":1691377807470,"pid":21374,"hostname":"localhost.localdomain","name":"qryn","msg":"xxh ready"}
0|qryn | {"level":50,"time":1691377817138,"pid":21374,"hostname":"localhost.localdomain","name":"qryn","err":"getaddrinfo ENOTFOUND default\nError: getaddrinfo ENOTFOUND default\n at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:107:26)","msg":"Error starting qryn"}
0|qryn | {"level":30,"time":1691377818477,"pid":21398,"hostname":"localhost.localdomain","name":"qryn","msg":"Initializing DB... qryn"}
0|qryn | {"level":30,"time":1691377818537,"pid":21398,"hostname":"localhost.localdomain","name":"qryn","msg":"xxh ready"}
0|qryn | {"level":30,"time":1691377818983,"pid":21398,"hostname":"localhost.localdomain","name":"qryn","msg":"xxh ready"}
[1]+ Stopped sudo /usr/local/bin/pm2 logs 0
[root@localhost ceyword]#

Whether connecting to localhost, 127.0.0.1 or 192.200.0.123, Node.js fails on some host named 'default', as highlighted.

I also tried many "getaddrinfo ENOTFOUND" solutions from Stack Overflow, including making sure the entry 127.0.0.1 localhost is present in /etc/hosts. No success yet.

lmangani commented 1 year ago

@Ceyword, sorry, but there's really nothing special about qryn and how it connects to ClickHouse. From a technical perspective, there's no reason why curl would work and qryn wouldn't. I guess you could build your own container. The project has thousands of setups, and this would most definitely be a showstopper if it were a generic issue.

getaddrinfo ENOTFOUND default

This still points to a bad configuration.

Ceyword commented 1 year ago

Hello all,

ISSUE: The problem was the '#' character in the ClickHouse password.

SOLUTION: After replacing '#' with '%23' (its URL encoding) in the password, the Qryn container in Podman connected to ClickHouse on the host. The container networking option (in the Networking tab of Podman Desktop) was also set to 'Use the host networking stack'.

Thank you.
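Why a '#' breaks things can be shown with a minimal sketch, assuming the client assembles a URL of the form proto://user:password@host:port from the credentials (qryn's actual internals are not shown in this thread). Under URL syntax (RFC 3986) an unencoded '#' starts the fragment, so the authority part ends at the '#' and the parser takes the user name as the host name. That is consistent with the "ENOTFOUND admin" and "ENOTFOUND default" errors reported earlier. `build_clickhouse_url` is a hypothetical helper; the parsing uses Python's standard `urllib.parse`.

```python
from urllib.parse import quote, unquote, urlsplit


def build_clickhouse_url(user: str, password: str, host: str, port: int = 8123,
                         proto: str = "http", encode: bool = True) -> str:
    """Hypothetical helper: assemble a ClickHouse HTTP URL from its parts.

    With encode=False the credentials are pasted in verbatim, as happens when
    a raw "user:password" string is interpolated into a URL.
    """
    if encode:
        # safe="" makes quote() escape every reserved character,
        # including '#', '/', '?', '@' and ':'.
        user, password = quote(user, safe=""), quote(password, safe="")
    return f"{proto}://{user}:{password}@{host}:{port}"


# Unencoded '#': the authority ends at '#', the rest becomes a fragment,
# and the user name is parsed as the host name.
broken = urlsplit(build_clickhouse_url("admin", "s3cret#pass", "127.0.0.1", encode=False))
print(broken.hostname)  # admin
print(broken.fragment)  # pass@127.0.0.1:8123

# Percent-encoded '#' (%23): host, port and credentials parse as intended.
fixed_url = build_clickhouse_url("admin", "s3cret#pass", "127.0.0.1")
fixed = urlsplit(fixed_url)
print(fixed_url)                   # http://admin:s3cret%23pass@127.0.0.1:8123
print(fixed.hostname, fixed.port)  # 127.0.0.1 8123
print(unquote(fixed.password))     # s3cret#pass
```

With the user name `default` and an unencoded '#', the parsed host name is likewise `default`, matching the PM2 log.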

lmangani commented 1 year ago

Thanks @Ceyword for sharing your solution; I'm sure it will help others in the future! Perhaps we should return a warning if we detect invalid characters in the password field.
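A warning like the one suggested could look roughly like this. It is a hypothetical Python sketch, not qryn code (qryn itself is a Node.js application): it inspects a CLICKHOUSE_AUTH-style `user:password` string and reports characters that must be percent-encoded before they can appear in a URL. The flagged set is an assumption based on RFC 3986: `#`, `?`, `/`, `@`, `[`, `]`, whitespace, and a `%` that is not already the start of a `%XX` escape.

```python
import re

# Characters that end or restructure a URL's authority section (RFC 3986)
# when they appear unencoded in the credentials. A '%' is only flagged when
# it is not already the start of a %XX escape such as %23.
_UNSAFE = re.compile(r"[#?/@\[\]\s]|%(?![0-9A-Fa-f]{2})")


def auth_warnings(auth: str) -> list[str]:
    """Return warnings for a 'user:password' string (hypothetical check)."""
    user, sep, password = auth.partition(":")
    problems = []
    if not sep:
        problems.append("expected the form user:password")
    for label, value in (("user", user), ("password", password)):
        bad = sorted({m.group(0) for m in _UNSAFE.finditer(value)})
        if bad:
            chars = ", ".join(repr(c) for c in bad)
            problems.append(f"{label} contains characters that must be URL-encoded: {chars}")
    return problems


for value in ("admin:s3cret#pass", "admin:s3cret%23pass"):
    print(value, "->", auth_warnings(value) or "OK")
```

Flagging rather than silently encoding keeps the user in control: an already-encoded value such as `%23` passes unchanged, while a raw '#' produces a clear message instead of an ENOTFOUND error.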

Ceyword commented 1 year ago

Thank you too, @lmangani. Yes, it might save others a headache. Some well-placed guidance would be helpful. As a thumbs up for the Qryn project: considering ease of deployment, operational manageability, TCO and scale, I personally cannot see a better way to collect telemetry data than a polyglot collector writing to ClickHouse, since ClickHouse is proven and massively scalable.