graphile / worker

High performance Node.js/PostgreSQL job queue (also suitable for getting jobs generated by PostgreSQL triggers/functions out into a different work queue)
http://worker.graphile.org/
MIT License

Should worker process exit when the worker commits seppuku? #501

Open psugihara opened 2 weeks ago

psugihara commented 2 weeks ago

Summary

I have a single graphile-worker process that processes a small queue of mostly low priority web scraper stuff with occasional high priority items (emails).

I'm seeing my worker die without the process exiting. A supervisor is supposed to restart the process if it fails, but because the process never exits, the supervisor is never triggered. I can work around this by parsing the log output (e.g. looking for "seppuku" and restarting), but I'm guessing I'm just missing something in my setup. Is there a setting that makes the process exit when the worker commits seppuku?
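The log-parsing workaround mentioned above could be sketched roughly as follows. This is only an illustration: `FATAL_MARKERS` and `findFatalMarker` are hypothetical names, and the marker strings are copied from the log output quoted in this issue, not from any documented or stable log format.

```typescript
// Hypothetical helper: the marker strings below are taken from the log output
// quoted in this issue; they are NOT a documented, stable interface.
const FATAL_MARKERS: readonly string[] = [
  "committing seppuku",
  "Worker exited with error",
];

// Returns the first marker found in a chunk of log text, or null if none match.
function findFatalMarker(logChunk: string): string | null {
  for (const marker of FATAL_MARKERS) {
    if (logChunk.indexOf(marker) !== -1) {
      return marker;
    }
  }
  return null;
}
```

A wrapper script could pass each line of the worker's output through `findFatalMarker` and terminate the child process when it returns a non-null value, so that the supervisor sees an exit.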

Additional context

Node v20.15.1

Here's the script I run on the server to start the worker:

bunx --bun graphile-worker --cleanup DELETE_PERMAFAILED_JOBS,GC_TASK_IDENTIFIERS,GC_JOB_QUEUES && bunx --bun graphile-worker --no-prepared-statements

Here's an example of the error I see when it stops processing without exiting the process:

[job(worker-6f1b0ec5b9b6945331: enrich-job-posts{39050})] ERROR: [scrapeJob] error scraping job data: Request failed with status code 429

[worker(worker-6f1b0ec5b9b6945331)] ERROR: Failed task 39050 (enrich-job-posts, 11853.30ms, attempt 2 of 25) with error 'Request failed with status code 429':

  Error

      at new AxiosError (/opt/render/project/src/node_modules/.pnpm/axios@1.7.4/node_modules/axios/lib/core/AxiosError.js:21:4)

      at settle (/opt/render/project/src/node_modules/.pnpm/axios@1.7.4/node_modules/axios/lib/core/settle.js:2)

      at handleStreamEnd (/opt/render/project/src/node_modules/.pnpm/axios@1.7.4/node_modules/axios/lib/adapters/http.js:599:9)

      at endReadableNT (native)

      at processTicksAndRejections (native)

      at /opt/render/project/src/node_modules/.pnpm/axios@1.7.4/node_modules/axios/lib/core/Axios.js:48:12

      at asyncFunctionResume (native)

      at promiseReactionJobWithoutPromiseUnwrapAsyncContext (native)

      at promiseReactionJob (native)

      at processTicksAndRejections (native)

[worker(worker-6f1b0ec5b9b6945331)] ERROR: Failed to release job '39050' after failure 'Request failed with status code 429'; committing seppuku

Failed to connect

[core] ERROR: Worker exited, but pool is in continuous mode, is active, and is not shutting down... Did something go wrong?

41 |     rej = reject

42 |   }).catch((err) => {

43 |     // replace the stack trace that leads to `TCP.onStreamRead` with one that leads back to the

44 |     // application that created the query

45 |     Error.captureStackTrace(err)

46 |     throw err

       ^

ECONNREFUSED: Failed to connect

 syscall: "connect"

      at object.<anonymous> (/opt/render/project/src/node_modules/.pnpm/pg-pool@3.6.2_pg@8.12.0/node_modules/pg-pool/index.js:46:3)

      at promiseReactionJob (native:1:1)

      at processTicksAndRejections (native:1:1)

      at /opt/render/project/src/node_modules/.pnpm/graphile-worker@0.16.6_typescript@5.6.3/node_modules/graphile-worker/dist/main.js:610:13

[core] ERROR: Worker exited with error: ECONNREFUSED: Failed to connect

benjie commented 2 weeks ago

What version of worker?

psugihara commented 2 weeks ago

graphile-worker@0.16.6

benjie commented 2 weeks ago

Can you reproduce the issue using Node rather than bun?

psugihara commented 2 weeks ago

I haven't tried that yet, and I haven't done much debugging beyond confirming that the process is still around. I can try to put together a minimal repro at some point. To clarify: is the process supposed to die after the seppuku message?

Thanks for the quick help (and this amazing tool)!

benjie commented 2 weeks ago

It doesn’t explicitly exit the process; it just stops running the worker. That should mean everything then shuts down relatively cleanly, after which the process should exit.
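For setups that start the worker from their own script (library mode) instead of the CLI used in this issue, one possible way to guarantee an exit is to listen for a fatal worker event and terminate explicitly. The sketch below assumes the `worker:fatalError` event name from graphile-worker's events documentation. `WorkerEventsLike`, `exitOnFatalError`, and the injected `exit` callback are hypothetical names used only for illustration, and none of this has been verified against 0.16.6 or under Bun.

```typescript
// Minimal structural type for the part of an event emitter this sketch needs.
// Hypothetical: in a real script this would be the runner's event emitter.
interface WorkerEventsLike {
  on(event: string, listener: (payload: unknown) => void): unknown;
}

// Registers a listener that calls `exit(1)` when a worker reports a fatal error.
// ASSUMPTION: "worker:fatalError" is the event emitted when a worker gives up
// ("commits seppuku"); check the events documentation for your version.
function exitOnFatalError(
  events: WorkerEventsLike,
  exit: (code: number) => void,
): void {
  events.on("worker:fatalError", () => {
    // A non-zero exit code lets a supervisor treat this as a failure and restart.
    exit(1);
  });
}
```

In a Node.js script, `exit` would typically be something like `(code) => process.exit(code)`. Injecting it as a parameter keeps the sketch free of runtime-specific globals and makes it easy to exercise with a fake emitter.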