ponder-sh / ponder

A backend framework for crypto apps
https://ponder.sh
MIT License
642 stars 102 forks source link

"Healthcheck failed!" error when deploying on Railway #683

Closed Destiner closed 8 months ago

Destiner commented 8 months ago

Sometimes, when you deploy or redeploy Ponder on Railway, the healthcheck fails.

Here's an example:

Build Log ```sh Path: /health Retry window: 5m0s Attempt #1 failed with service unavailable. Continuing to retry for 4m59s Attempt #2 failed with service unavailable. Continuing to retry for 4m58s Attempt #3 failed with service unavailable. Continuing to retry for 4m56s Attempt #4 failed with service unavailable. Continuing to retry for 4m52s Attempt #5 failed with service unavailable. Continuing to retry for 4m44s Attempt #6 failed with status 503. Continuing to retry for 4m27s Attempt #7 failed with status 503. Continuing to retry for 3m57s Attempt #8 failed with status 503. Continuing to retry for 3m27s Attempt #9 failed with status 503. Continuing to retry for 2m56s Attempt #10 failed with status 503. Continuing to retry for 2m25s Attempt #11 failed with status 503. Continuing to retry for 1m54s Attempt #12 failed with status 503. Continuing to retry for 1m24s Attempt #13 failed with status 503. Continuing to retry for 54s Attempt #14 failed with status 503. Continuing to retry for 23s 1/1 replicas never became healthy! Healthcheck failed! ```
Deploy Logs ```sh $ ponder start Attention: Ponder collects anonymous telemetry data to identify issues and prioritize features. See https://ponder.sh/advanced/telemetry for more information. 6:06:10 AM INFO database Using Postgres database postgres.railway.internal:5432/railway (from DATABASE_PRIVATE_URL env var) /app/node_modules/pg-pool/index.js:45 Error.captureStackTrace(err); ^ Error: getaddrinfo ENOTFOUND postgres.railway.internal at /app/node_modules/pg-pool/index.js:45:11 at processTicksAndRejections (node:internal/process/task_queues:95:5) at PostgresDriver.acquireConnection (file:///app/node_modules/kysely/dist/esm/dialect/postgres/postgres-driver.js:19:24) at RuntimeDriver.acquireConnection (file:///app/node_modules/kysely/dist/esm/driver/runtime-driver.js:44:28) at DefaultConnectionProvider.provideConnection (file:///app/node_modules/kysely/dist/esm/driver/default-connection-provider.js:8:28) at DefaultQueryExecutor.executeQuery (file:///app/node_modules/kysely/dist/esm/query-executor/query-executor-base.js:34:16) at SelectQueryBuilderImpl.execute (file:///app/node_modules/kysely/dist/esm/query-builder/select-query-builder.js:293:24) at PostgresIntrospector.getSchemas (file:///app/node_modules/kysely/dist/esm/dialect/postgres/postgres-introspector.js:11:26) at Migrator.#doesSchemaExists (file:///app/node_modules/kysely/dist/esm/migration/migrator.js:303:25) at Migrator.#ensureMigrationTableSchemaExists (file:///app/node_modules/kysely/dist/esm/migration/migrator.js:219:15) { errno: -3008, code: 'ENOTFOUND', syscall: 'getaddrinfo', hostname: 'postgres.railway.internal' } Node.js v18.18.2 error: script "start" exited with code 1 (SIGHUP) $ ponder start 6:06:17 AM INFO database Using Postgres database postgres.railway.internal:5432/railway (from DATABASE_PRIVATE_URL env var) ^ Error: getaddrinfo ENOTFOUND postgres.railway.internal at /app/node_modules/pg-pool/index.js:45:11 at processTicksAndRejections (node:internal/process/task_queues:95:5) at PostgresDriver.acquireConnection (file:///app/node_modules/kysely/dist/esm/dialect/postgres/postgres-driver.js:19:24) at RuntimeDriver.acquireConnection (file:///app/node_modules/kysely/dist/esm/driver/runtime-driver.js:44:28) at DefaultConnectionProvider.provideConnection (file:///app/node_modules/kysely/dist/esm/driver/default-connection-provider.js:8:28) at DefaultQueryExecutor.executeQuery (file:///app/node_modules/kysely/dist/esm/query-executor/query-executor-base.js:34:16) at SelectQueryBuilderImpl.execute (file:///app/node_modules/kysely/dist/esm/query-builder/select-query-builder.js:293:24) at PostgresIntrospector.getSchemas (file:///app/node_modules/kysely/dist/esm/dialect/postgres/postgres-introspector.js:11:26) at Migrator.#doesSchemaExists (file:///app/node_modules/kysely/dist/esm/migration/migrator.js:303:25) at Migrator.#ensureMigrationTableSchemaExists (file:///app/node_modules/kysely/dist/esm/migration/migrator.js:219:15) { errno: -3008, code: 'ENOTFOUND', syscall: 'getaddrinfo', hostname: 'postgres.railway.internal' } Node.js v18.18.2 error: script "start" exited with code 1 (SIGHUP) $ ponder start 6:06:25 AM INFO database Using Postgres database postgres.railway.internal:5432/railway (from DATABASE_PRIVATE_URL env var) 6:06:25 AM INFO server Started listening on port 7284 6:06:26 AM INFO realtime Fetched latest block at 46521820 (network=polygonMumbai) 6:06:26 AM INFO historical Started sync with 100% cached (contract=EntryPoint 0.6 network=polygonMumbai) 6:06:26 AM INFO historical Started sync with 100% cached (contract=EntryPoint 0.7 network=polygonMumbai) 6:06:26 AM INFO historical Started sync with 100% cached (contract=BiconomyAccountFactory 2 network=polygonMumbai) 6:06:26 AM INFO historical Started sync with 100% cached (contract=BiconomyAccount2 network=polygonMumbai) 6:06:29 AM INFO realtime Synced 2 matched logs from block 46521730 (network=polygonMumbai) 6:06:29 AM INFO realtime Synced 2 matched logs from block 46521767 (network=polygonMumbai) 6:06:29 AM INFO realtime Synced 2 matched logs from block 46521777 (network=polygonMumbai) 6:06:29 AM INFO realtime Synced 2 matched logs from block 46521786 (network=polygonMumbai) 6:06:29 AM INFO realtime Synced 2 matched logs from block 46521794 (network=polygonMumbai) 6:06:29 AM INFO realtime Synced 2 matched logs from block 46521804 (network=polygonMumbai) 6:06:29 AM INFO realtime Synced 2 matched logs from block 46521806 (network=polygonMumbai) 6:06:29 AM INFO realtime Synced 2 matched logs from block 46521810 (network=polygonMumbai) 6:06:29 AM INFO realtime Synced 2 matched logs from block 46521813 (network=polygonMumbai) 6:06:29 AM INFO realtime Synced 4 matched logs from block 46521819 (network=polygonMumbai) 6:06:31 AM INFO realtime Synced 2 matched logs from block 46521822 (network=polygonMumbai) 6:06:31 AM INFO indexing Indexed 37 EntryPoint 0.6:UserOperationEvent events (chainId=80001 block=45000552 logIndex=46) 6:06:31 AM INFO indexing Indexed 1 BiconomyAccountFactory 2:AccountCreation event (chainId=80001 block=45000552 logIndex=40) 6:06:36 AM INFO historical Sync is 100% complete (contract=EntryPoint 0.6) 6:06:36 AM INFO historical Sync is 100% complete (contract=EntryPoint 0.7) 6:06:36 AM INFO historical Sync is 100% complete (contract=BiconomyAccount2) 6:06:36 AM INFO historical Sync is 100% complete (contract=BiconomyAccountFactory 2) fun(...args); ^ TypeError: args is not iterable (cannot read property undefined) at Timeout._onTimeout (file:///app/node_modules/@ponder/core/src/utils/debounce.ts:26:13) at listOnTimeout (node:internal/timers:569:17) at processTimers (node:internal/timers:512:7) Node.js v18.18.2 error: script "start" exited with code 1 (SIGHUP) $ ponder start 6:06:46 AM INFO database Using Postgres database postgres.railway.internal:5432/railway (from DATABASE_PRIVATE_URL env var) 6:06:46 AM INFO server Started listening on port 7284 6:06:47 AM INFO realtime Fetched latest block at 46521830 (network=polygonMumbai) 6:06:47 AM INFO historical Started sync with 100% cached (contract=EntryPoint 0.7 network=polygonMumbai) 6:06:47 AM INFO historical Started sync with 100% cached (contract=BiconomyAccountFactory 2 network=polygonMumbai) 6:06:47 AM INFO historical Started sync with 100% cached (contract=EntryPoint 0.6 network=polygonMumbai) 6:06:47 AM INFO historical Started sync with 100% cached (contract=BiconomyAccount2 network=polygonMumbai) 6:06:49 AM INFO historical Completed sync in 1s (network=polygonMumbai) 6:06:49 AM INFO realtime Synced 2 matched logs from block 46521767 (network=polygonMumbai) 6:06:49 AM INFO realtime Synced 2 matched logs from block 46521777 (network=polygonMumbai) 6:06:49 AM INFO realtime Synced 2 matched logs from block 46521786 (network=polygonMumbai) 6:06:49 AM INFO realtime Synced 2 matched logs from block 46521794 (network=polygonMumbai) 6:06:49 AM INFO realtime Synced 2 matched logs from block 46521804 (network=polygonMumbai) 6:06:49 AM INFO realtime Synced 2 matched logs from block 46521806 (network=polygonMumbai) 6:06:49 AM INFO realtime Synced 2 matched logs from block 46521810 (network=polygonMumbai) 6:06:49 AM INFO realtime Synced 2 matched logs from block 46521813 (network=polygonMumbai) 6:06:49 AM INFO realtime Synced 4 matched logs from block 46521819 (network=polygonMumbai) 6:06:49 AM INFO realtime Synced 2 matched logs from block 46521822 (network=polygonMumbai) 6:06:49 AM INFO realtime Synced 2 matched logs from block 46521829 (network=polygonMumbai) 6:07:06 AM INFO realtime Synced 2 matched logs from block 46521839 (network=polygonMumbai) 6:07:26 AM INFO realtime Synced 7 matched logs from block 46521848 (network=polygonMumbai) 6:07:48 AM INFO realtime Synced 2 matched logs from block 46521858 (network=polygonMumbai) 6:07:49 AM INFO realtime Synced 3 matched logs from block 46521859 (network=polygonMumbai) 6:08:02 AM INFO realtime Synced 3 matched logs from block 46521865 (network=polygonMumbai) 6:08:08 AM INFO realtime Synced 2 matched logs from block 46521868 (network=polygonMumbai) 6:08:24 AM INFO realtime Synced 2 matched logs from block 46521875 (network=polygonMumbai) 6:08:42 AM INFO realtime Synced 2 matched logs from block 46521884 (network=polygonMumbai) 6:08:44 AM INFO realtime Synced 2 matched logs from block 46521885 (network=polygonMumbai) 6:08:59 AM INFO realtime Synced 6 matched logs from block 46521892 (network=polygonMumbai) 6:09:05 AM INFO realtime Synced 2 matched logs from block 46521895 (network=polygonMumbai) 6:09:11 AM INFO realtime Synced 2 matched logs from block 46521898 (network=polygonMumbai) 6:09:19 AM INFO realtime Synced 2 matched logs from block 46521902 (network=polygonMumbai) 6:09:26 AM INFO realtime Synced 4 matched logs from block 46521904 (network=polygonMumbai) 6:09:27 AM INFO realtime Synced 2 matched logs from block 46521905 (network=polygonMumbai) 6:09:47 AM INFO realtime Synced 2 matched logs from block 46521915 (network=polygonMumbai) 6:10:09 AM INFO realtime Synced 2 matched logs from block 46521924 (network=polygonMumbai) 6:10:14 AM INFO realtime Synced 5 matched logs from block 46521927 (network=polygonMumbai) 6:10:16 AM INFO indexing Indexed 2000 EntryPoint 0.6:UserOperationEvent events (chainId=80001 block=45019458 logIndex=10) 6:10:18 AM INFO indexing Indexed 2000 BiconomyAccountFactory 2:AccountCreation events (chainId=80001 block=46451295 logIndex=1) 6:10:22 AM INFO indexing Indexed 3330 EntryPoint 0.6:UserOperationEvent events (chainId=80001 block=45042366 logIndex=66) 6:10:24 AM INFO indexing Indexed 1146 BiconomyAccountFactory 2:AccountCreation events (chainId=80001 block=46521122 logIndex=3) 6:10:29 AM INFO indexing Indexed 3330 EntryPoint 0.6:UserOperationEvent events (chainId=80001 block=45060870 logIndex=20) 6:10:33 AM INFO realtime Synced 2 matched logs from block 46521936 (network=polygonMumbai) 6:10:36 AM INFO indexing Indexed 4995 EntryPoint 0.6:UserOperationEvent events (chainId=80001 block=45084412 logIndex=39) 6:10:37 AM INFO realtime Synced 3 matched logs from block 46521937 (network=polygonMumbai) 6:10:40 AM INFO realtime Synced 3 matched logs from block 46521939 (network=polygonMumbai) 6:10:40 AM INFO indexing Indexed 2000 EntryPoint 0.6:UserOperationEvent events (chainId=80001 block=45097983 logIndex=20) 6:10:46 AM INFO indexing Indexed 4995 EntryPoint 0.6:UserOperationEvent events (chainId=80001 block=45127677 logIndex=22) ```

I think the ultimate root issue is Ponder restarting multiple times due to Error: getaddrinfo ENOTFOUND postgres.railway.internal which I assume is Railway's internal issue. Each subsequent restart resets the 240 second limit, and if the instance kept restarting for more than 1 minute, Railway's heath check will timeout.

Reproduction

Deploy any Ponder graph to Railway (might require a few tries).

Workaround

Redeploying failed instance usually fixes the issue

kyscott18 commented 8 months ago

I fixed a race condition in @ponder/core@0.2.15 which should resolve this issue. Thanks for reporting