Closed mepc36 closed 3 months ago
Also I'm running this with a serverless db in AWS, so maybe when the server on which this db is running gets closed down then pg-boss throws this error?
Yes, I think you should update to latest and re-validate
Will do, thank you @timgit!
Love the lib btw
Edit: The shim suggested in #365 did not work, but I also think their problem was slightly different from mine.
Looks like it's happening in 9.0.0 too:
node:internal/errors:465
ErrorCaptureStackTrace(err);
^
Error [ERR_UNHANDLED_ERROR]: Unhandled error. ({
message: 'Connection terminated unexpectedly (Queue: __state__completed__onCompleteFtw, Worker: e5e396aa-7aaa-47b2-8c4c-200fa791ca22)',
stack: 'Error: Connection terminated unexpectedly (Queue: __state__completed__onCompleteFtw, Worker: e5e396aa-7aaa-47b2-8c4c-200fa791ca22)\n' +
' at Connection.<anonymous> (/Users/martinconnor/Desktop/server-side-comm/node_modules/pg/lib/client.js:132:73)\n' +
' at Object.onceWrapper (node:events:641:28)\n' +
' at Connection.emit (node:events:527:28)\n' +
' at Socket.<anonymous> (/Users/martinconnor/Desktop/server-side-comm/node_modules/pg/lib/connection.js:62:12)\n' +
' at Socket.emit (node:events:527:28)\n' +
' at TCP.<anonymous> (node:net:709:12)',
queue: '__state__completed__onCompleteFtw',
worker: 'e5e396aa-7aaa-47b2-8c4c-200fa791ca22'
})
at new NodeError (node:internal/errors:372:5)
at PgBoss.emit (node:events:516:17)
at Manager.<anonymous> (/Users/martinconnor/Desktop/server-side-comm/node_modules/pg-boss/src/index.js:88:37)
at Manager.emit (node:events:527:28)
at Worker.onError (/Users/martinconnor/Desktop/server-side-comm/node_modules/pg-boss/src/manager.js:256:12)
at Worker.start (/Users/martinconnor/Desktop/server-side-comm/node_modules/pg-boss/src/worker.js:70:14)
at runMicrotasks (<anonymous>)
at processTicksAndRejections (node:internal/process/task_queues:96:5) {
code: 'ERR_UNHANDLED_ERROR',
context: {
message: 'Connection terminated unexpectedly (Queue: __state__completed__onCompleteFtw, Worker: e5e396aa-7aaa-47b2-8c4c-200fa791ca22)',
stack: 'Error: Connection terminated unexpectedly (Queue: __state__completed__onCompleteFtw, Worker: e5e396aa-7aaa-47b2-8c4c-200fa791ca22)\n' +
' at Connection.<anonymous> (/Users/martinconnor/Desktop/server-side-comm/node_modules/pg/lib/client.js:132:73)\n' +
' at Object.onceWrapper (node:events:641:28)\n' +
' at Connection.emit (node:events:527:28)\n' +
' at Socket.<anonymous> (/Users/martinconnor/Desktop/server-side-comm/node_modules/pg/lib/connection.js:62:12)\n' +
' at Socket.emit (node:events:527:28)\n' +
' at TCP.<anonymous> (node:net:709:12)',
queue: '__state__completed__onCompleteFtw',
worker: 'e5e396aa-7aaa-47b2-8c4c-200fa791ca22'
}
}
That error happened I left this script running:
require("dotenv").config();
const PgBoss = require('pg-boss');
const pgBossOptions = {
host: process.env.PGHOST,
database: process.env.PGDATABASE,
user: process.env.PGUSER,
password: process.env.PGPASSWORD,
deleteAfterDays: 365000
};
const boss = new PgBoss(pgBossOptions);
;
(async () => {
// 9.0.0:
await boss.start();
const jobName = 'onCompleteFtw'
boss.onComplete(jobName, job => {
console.log('onComplete > job:', job)
boss.complete(job.id)
})
const jobId = await boss.send(jobName, {}, { onComplete: true })
const job = await boss.fetch(jobName)
await boss.complete(job.id)
})()
I've been calling .start and .subscribe in the app.listen callback:
app.listen(3039, async () => {
console.log('listening on 3039')
await boss.start();
await boss.subscribe(`demucs-${process.env.APP_ENVIRONMENT}`, separateAudio);
})
Where do I call boss.stop() then, whenever a job has completed? I think this bug is basically my fault, I'm not calling boss.stop()
anywhere.
Alright now I'm calling boss.start(0 right before creating and starting a job, and boss.stop at the end of the .onCompelte
callback, but I'm now getting this error about destructuring properties:
/Users/martinconnor/Desktop/rapbot-mobile/services/live/demucs-with-db-and-s3/node_modules/pg-boss/src/boss.js:81
const { secondsAgo } = await this.getMaintenanceTime()
^
TypeError: Cannot destructure property 'secondsAgo' of '(intermediate value)' as it is undefined.
at Timeout._onTimeout (/Users/martinconnor/Desktop/rapbot-mobile/services/live/demucs-with-db-and-s3/node_modules/pg-boss/src/boss.js:81:15)
A couple things
Hey Tim, thanks for the help. I really appreciate it. As for your points:
1.) I do have an error handler. You can see it where I am instantiating the pg-boss instance, sorry for not posting this before:
const PgBoss = require('pg-boss');
const pgBossOptions = {
host: process.env.PGHOST,
database: process.env.DATABASE,
user: process.env.PGUSER,
password: process.env.PGPASSWORD,
deleteAfterDays: 365000
};
const boss = new PgBoss(pgBossOptions);
boss.on('error', error => {
throw error
});
I used to start pg-boss in the app.listen callback, but now I'm doing it in the route itself:
app.post('/separate', async (req, res) => {
try {
await boss.start();
const jobId = await scheduleJob(`demucs-${process.env.APP_ENVIRONMENT}`, req.body)
res.status(201).send({ jobId })
} catch (error) {
res.status(500).send(error)
}
})
Here is how I'm scheduling the job. It's also where I also stop pg-boss, inside the .onComplete callback:
const scheduleJob = async (queue, data, startAfter = 0, retryAttempts = 0, runJustOnce = true) => {
const jobData = {
retryAttempts,
runJustOnce,
data
}
boss.onComplete(queue, async job => {
try {
const demucsResponse = await separateAudio(job)
const triggerResponse = await pusher.trigger(queue, 'job-event', {
type: 'completed',
demucsResponse
});
await boss.stop()
return demucsResponse
} catch (error) {
throw error
}
})
const jobOptions = {
startAfter,
expireInMinutes: 10,
retentionMinutes: 10,
onComplete: true
}
const jobId = await boss.send(queue, jobData, jobOptions)
const job = await boss.fetch(queue)
await boss.complete(job.id)
return job.id
}
Although this arrangement of calling .stop() means I don't get the Connection terminated unexpectedly
error anymore, it throws this new Cannot destructure property 'secondsAgo'
error.
So I'm in a catch-22: when I remove boss.stop(), I get the connection-terminated error, and when I put it in, I get the cannot-destructure error. I can't help but feel that I"m doing something wrong as far as what I'm calling where.
The question boils down to this: in an Express server app, where does boss.start(), boss.send(), and boss.work() get called when a server is creating a job through an endpoint; running that job asynchronously (i.e., the route returns before the job's work is done); and then emitting a job-is-finished event? Any thoughts?
I treat pg-boss in my API like a connection pool. It's a long-lived object that is created by a service that stays running for the entire uptime of the application
Thanks Tim, that's helpful info. I really feel like the problem is with some db config settings then. I have an RDS instance running on AWS. I will investigate the timeouts there and then report back.
Turns out this is a known issue with node-postgres, which pg-boss appears to use under the hood:
https://github.com/brianc/node-postgres/issues/2112
Takeaways: probably has to do with timeout settings of infrastructure provider. Might need 3rd party software to fix; see more here. Here's a better comment here.
No easy solutions there, apparently answer might involve a db change. I will report back with fix.
I am facing the same destructuring error mentioned above - but I have no way to reproduce it. It just seems to randomly crash my Blitz.js+Postgres app. Any clues on how to reproduce or any workarounds?
@KPouianou sorry, not yet. What is your db running on? AWS?
HI @timgit, could you please confirm that pg-boss does not allow the keepAlive
option to be set when configuring pg-boss' connection to the database via node-postgres?
Docs on pg-boss' connection options: https://github.com/timgit/pg-boss/blob/master/docs/readme.md#newoptions
node-postgres maintainer's note about using keepAlive
to send TCP keep_alive packets: https://github.com/brianc/node-postgres/issues/2362#issuecomment-704286512
pg-boss' instantiation of DB connex: https://github.com/timgit/pg-boss/blob/master/src/db.js#L14
Possible fix from SOF: https://stackoverflow.com/a/65608187
@KPouianou sorry, not yet. What is your db running on? AWS?
It is a Postgres running on AWS (RDS)
If this would be resolved by simply passing any additional configuration through to node-pg, that seems like a good solution. Did you already try using keepAlive
?
Possible fix from SOF: https://stackoverflow.com/a/65608187
This did not seem to fix the issue, but it seems to be occurring less often. Do you think increasing the tcp_keepalives_idle beyond 60 might help?
I don't run pg in RDS so any testing and feedback you can do would be appreciated.
I don't run pg in RDS so any testing and feedback you can do would be appreciated.
Will do!
Do you think increasing the tcp_keepalives_idle beyond 60 might help?
Hoping to test that this week, I'll let you know.
Going to still keep this open. I do not have console access to the db in question currently, although that's in the process of changing. Will test the new config when that happens. Thanks!
For information, same problem is happening on a cloud provider named CleverCloud. They are doing network maintenance and each "glitch" cause this exception crashing the app.
I'm also seeing that destructuring issue when calling stop()
. I'm having similar issues using PgBoss in a few serverless and long-lived server process. Is there a proper way to initialize and cleanup PgBoss
instances so connections are returned back to the pool and to be as lightweight as possible? In particular, when using PgBoss against a pooled connection manager on the server I tend to see more of these issues come up. I've also had a few issues where the pooler threw "too many clients" errors, indicating to me my PgBoss instances are just stacking up and never cleaning themselves out.
When I send new messages, I generally do:
const boss = new PgBoss(...)
boss.start();
boss.send()
Should I also be calling stop()
?
Please try again with v10 that was recently merged and reopen if it's not
I keep getting this error when the connection is kept open for a longtime to a serverless db I'm running pg-boss with:
Here's how I start the server:
Here's how I create the job:
It may be related to #365. I will try the solution in that thread mentioned here now. Also I'm running pg-boss version 6.2.2, and apparently a fix might have been shipped in the newer 9.0.0 version. Will try that and report back. Thanks!