Open ejscheepers opened 1 month ago
Not sure if it's related, but here are the logs from the Postgres DB:
2024-10-16T14:40:47.565962634Z 2024-10-16 14:40:47.565 UTC [884] FATAL: role "postgres" does not exist
2024-10-16T14:40:52.620820389Z 2024-10-16 14:40:52.618 UTC [891] FATAL: role "postgres" does not exist
2024-10-16T14:40:57.660249044Z 2024-10-16 14:40:57.660 UTC [898] FATAL: role "postgres" does not exist
2024-10-16T14:41:02.701285029Z 2024-10-16 14:41:02.701 UTC [906] FATAL: role "postgres" does not exist
2024-10-16T14:41:07.741504375Z 2024-10-16 14:41:07.741 UTC [913] FATAL: role "postgres" does not exist
2024-10-16T14:41:12.775925703Z 2024-10-16 14:41:12.775 UTC [920] FATAL: role "postgres" does not exist
2024-10-16T14:41:17.819197070Z 2024-10-16 14:41:17.817 UTC [928] FATAL: role "postgres" does not exist
2024-10-16T14:41:22.866831741Z 2024-10-16 14:41:22.866 UTC [935] FATAL: role "postgres" does not exist
2024-10-16T14:41:27.908494833Z 2024-10-16 14:41:27.908 UTC [942] FATAL: role "postgres" does not exist
2024-10-16T14:41:32.946391915Z 2024-10-16 14:41:32.946 UTC [949] FATAL: role "postgres" does not exist
2024-10-16T14:41:37.981018911Z 2024-10-16 14:41:37.980 UTC [956] FATAL: role "postgres" does not exist
2024-10-16T14:41:43.017404840Z 2024-10-16 14:41:43.017 UTC [963] FATAL: role "postgres" does not exist
Just a bit more context:
version: '3'
services:
  plunk:
    image: driaug/plunk
    depends_on:
      postgresql:
        condition: service_healthy
      redis:
        condition: service_started
    environment:
      - SERVICE_FQDN_PLUNK_3000
      - 'REDIS_URL=redis://redis:6379'
      - 'DATABASE_URL=postgresql://${SERVICE_USER_POSTGRES}:${SERVICE_PASSWORD_POSTGRES}@postgresql/plunk?schema=public'
      - 'JWT_SECRET=${SERVICE_PASSWORD_JWT_SECRET}'
      - 'AWS_REGION=${AWS_REGION}'
      - 'AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}'
      - 'AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}'
      - 'AWS_SES_CONFIGURATION_SET=${AWS_SES_CONFIGURATION_SET}'
      - 'NEXT_PUBLIC_API_URI=${SERVICE_FQDN_PLUNK}/api'
      - 'APP_URI=${SERVICE_FQDN_PLUNK}'
      - 'API_URI=${SERVICE_FQDN_PLUNK}/api'
      - DISABLE_SIGNUPS=False
    entrypoint:
      - /app/entry.sh
    healthcheck:
      test:
        - CMD
        - wget
        - '-q'
        - '--spider'
        - 'http://127.0.0.1:3000'
      interval: 2s
      timeout: 10s
      retries: 15
  postgresql:
    image: 'postgres:16-alpine'
    environment:
      - POSTGRES_USER=$SERVICE_USER_POSTGRES
      - POSTGRES_PASSWORD=$SERVICE_PASSWORD_POSTGRES
      - 'POSTGRES_DB=${POSTGRES_DB:-plunk}'
    volumes:
      - 'postgresql-data:/var/lib/postgresql/data'
    healthcheck:
      test:
        - CMD-SHELL
        - 'pg_isready -U postgres -d postgres'
      interval: 5s
      timeout: 10s
      retries: 20
  redis:
    image: 'redis:7.4-alpine'
    volumes:
      - 'redis-data:/data'
    healthcheck:
      test:
        - CMD
        - redis-cli
        - PING
      interval: 5s
      timeout: 10s
      retries: 20
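A note on the Postgres log lines: the FATAL entries arrive about every 5 seconds, which matches the `interval: 5s` of the postgresql healthcheck, and that healthcheck runs `pg_isready -U postgres` while the role is created from `SERVICE_USER_POSTGRES`. Assuming this mismatch is the source of those messages (and that they are unrelated to the API crash), a minimal sketch of a healthcheck that uses the configured user and database instead:

```yaml
services:
  postgresql:
    healthcheck:
      test:
        - CMD-SHELL
        # "$$" escapes Compose interpolation, so the variables are expanded
        # by the shell inside the container, where POSTGRES_USER and
        # POSTGRES_DB are set from the service's environment section.
        - 'pg_isready -U $${POSTGRES_USER} -d $${POSTGRES_DB}'
      interval: 5s
      timeout: 10s
      retries: 20
```

The interval, timeout and retries are copied from the file above; only the `pg_isready` arguments change.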
Not sure if it's possible, @driaug, but adding an API health route would be very useful in the meantime. If the API crashes, we could use the health check to restart the container.
At the moment I am using:
(wget -S --spider http://127.0.0.1:3000/api/users/@me 2>&1 | grep -q 'HTTP/1.1 [1-4]')
Previously I was only checking http://127.0.0.1:3000, but that gave a false positive: the dashboard kept responding even when the API was down.
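For reference, the check above could be wired into the Compose file roughly as follows. This is a sketch that reuses the interval, timeout and retries from the existing plunk healthcheck; the pipe requires `CMD-SHELL` rather than `CMD`, and it assumes `wget` and `grep` are available in the image.

```yaml
services:
  plunk:
    healthcheck:
      test:
        - CMD-SHELL
        # Healthy if the API answers with any 1xx-4xx status;
        # unhealthy on a 5xx status or when there is no response at all.
        - "wget -S --spider http://127.0.0.1:3000/api/users/@me 2>&1 | grep -q 'HTTP/1.1 [1-4]'"
      interval: 2s
      timeout: 10s
      retries: 15
```

Whether an unhealthy status actually triggers a restart depends on the platform running the container.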
I second adding a healthcheck route. I'm using CapRover to deploy; here's my captain-definition/one-click-app file for reference.
I think the issue stems from IPv6. I added the env var NODE_OPTIONS=--dns-result-order=ipv4first. Currently testing this, no crashes yet.
edit: I can verify that the Node options above fixed this issue. Please test, @ejscheepers @driaug.
edit2: I have switched to the --no-network-family-autoselection Node option, because the DNS result order setting didn't work after all. This is probably an issue with the Node.js Happy Eyeballs implementation.
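In the Compose file from earlier in the thread, that option could be passed through the plunk service's environment. A minimal sketch, assuming the image's entrypoint starts Node without overriding `NODE_OPTIONS`:

```yaml
services:
  plunk:
    environment:
      # Disables Node's address-family autoselection (Happy Eyeballs),
      # so a connection is not raced across IPv6 and IPv4 addresses.
      - 'NODE_OPTIONS=--no-network-family-autoselection'
      # Untested alternative, not tried in this thread: keep autoselection
      # but allow each attempt more than the default 250 ms, e.g.
      # - 'NODE_OPTIONS=--network-family-autoselection-attempt-timeout=1000'
```

Only one `NODE_OPTIONS` entry should be active at a time; several flags can be combined in a single value separated by spaces.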
@ardasevinc, does the new healthcheck route work for you? It should be available in the latest version.
Every now and again the API fails and does not restart.
Server Logs:
node:internal/deps/undici/undici:13185
2024-10-16T04:29:01.190454265Z Error.captureStackTrace(err);
2024-10-16T04:29:01.190459825Z ^
2024-10-16T04:29:01.190463705Z
2024-10-16T04:29:01.190467185Z TypeError: fetch failed
2024-10-16T04:29:01.190470865Z at node:internal/deps/undici/undici:13185:13
2024-10-16T04:29:01.190474745Z at process.processTicksAndRejections (node:internal/process/task_queues:105:5) {
2024-10-16T04:29:01.190478825Z [cause]: AggregateError [ETIMEDOUT]:
2024-10-16T04:29:01.190482625Z at internalConnectMultiple (node:net:1122:18)
2024-10-16T04:29:01.190486185Z at internalConnectMultiple (node:net:1190:5)
2024-10-16T04:29:01.190489785Z at Timeout.internalConnectMultipleTimeout (node:net:1716:5)
2024-10-16T04:29:01.190493465Z at listOnTimeout (node:internal/timers:596:11)
2024-10-16T04:29:01.190498985Z at process.processTimers (node:internal/timers:529:7) {
2024-10-16T04:29:01.190502665Z code: 'ETIMEDOUT',
2024-10-16T04:29:01.190506065Z [errors]: [
2024-10-16T04:29:01.190509545Z Error: connect ETIMEDOUT 188.114.97.3:443
2024-10-16T04:29:01.190513105Z at createConnectionError (node:net:1652:14)
2024-10-16T04:29:01.190516705Z at Timeout.internalConnectMultipleTimeout (node:net:1711:38)
2024-10-16T04:29:01.190520425Z at listOnTimeout (node:internal/timers:596:11)
2024-10-16T04:29:01.190524025Z at process.processTimers (node:internal/timers:529:7) {
2024-10-16T04:29:01.190527745Z errno: -110,
2024-10-16T04:29:01.190531145Z code: 'ETIMEDOUT',
2024-10-16T04:29:01.190534585Z syscall: 'connect',
2024-10-16T04:29:01.190538065Z address: '188.114.97.3',
2024-10-16T04:29:01.190542545Z port: 443
2024-10-16T04:29:01.190545865Z },
2024-10-16T04:29:01.190549545Z Error: connect ENETUNREACH 2a06:98c1:3121::3:443 - Local (:::0)
2024-10-16T04:29:01.190553745Z at internalConnectMultiple (node:net:1186:16)
2024-10-16T04:29:01.190558345Z at Timeout.internalConnectMultipleTimeout (node:net:1716:5)
2024-10-16T04:29:01.190580945Z at listOnTimeout (node:internal/timers:596:11)
2024-10-16T04:29:01.190585225Z at process.processTimers (node:internal/timers:529:7) {
2024-10-16T04:29:01.190589025Z errno: -101,
2024-10-16T04:29:01.190594705Z code: 'ENETUNREACH',
2024-10-16T04:29:01.190598105Z syscall: 'connect',
2024-10-16T04:29:01.190601545Z address: '2a06:98c1:3121::3',
2024-10-16T04:29:01.190605065Z port: 443
2024-10-16T04:29:01.190608985Z },
2024-10-16T04:29:01.190612345Z Error: connect ETIMEDOUT 188.114.96.3:443
2024-10-16T04:29:01.190616065Z at createConnectionError (node:net:1652:14)
2024-10-16T04:29:01.190619665Z at Timeout.internalConnectMultipleTimeout (node:net:1711:38)
2024-10-16T04:29:01.190623585Z at listOnTimeout (node:internal/timers:596:11)
2024-10-16T04:29:01.190627105Z at process.processTimers (node:internal/timers:529:7) {
2024-10-16T04:29:01.190630745Z errno: -110,
2024-10-16T04:29:01.190634105Z code: 'ETIMEDOUT',
2024-10-16T04:29:01.190637505Z syscall: 'connect',
2024-10-16T04:29:01.190640905Z address: '188.114.96.3',
2024-10-16T04:29:01.190644305Z port: 443
2024-10-16T04:29:01.190647745Z },
2024-10-16T04:29:01.190651065Z Error: connect ENETUNREACH 2a06:98c1:3120::3:443 - Local (:::0)
2024-10-16T04:29:01.190655825Z at internalConnectMultiple (node:net:1186:16)
2024-10-16T04:29:01.190659665Z at Timeout.internalConnectMultipleTimeout (node:net:1716:5)
2024-10-16T04:29:01.190663545Z at listOnTimeout (node:internal/timers:596:11)
2024-10-16T04:29:01.190667145Z at process.processTimers (node:internal/timers:529:7) {
2024-10-16T04:29:01.190670745Z errno: -101,
2024-10-16T04:29:01.190674145Z code: 'ENETUNREACH',
2024-10-16T04:29:01.190677785Z syscall: 'connect',
2024-10-16T04:29:01.190681225Z address: '2a06:98c1:3120::3',
2024-10-16T04:29:01.190684665Z port: 443
2024-10-16T04:29:01.190687986Z }
2024-10-16T04:29:01.190691346Z ]
2024-10-16T04:29:01.190695146Z }
2024-10-16T04:29:01.190698506Z }
2024-10-16T04:29:01.190701826Z
2024-10-16T04:29:01.190705146Z Node.js v22.9.0
If I restart the container, it starts working again.