timgit / pg-boss

Queueing jobs in Postgres from Node.js like a boss
MIT License
2.05k stars 157 forks

3.0 beta - request for comments #79

Closed timgit closed 6 years ago

timgit commented 6 years ago

Hey there! I just published a beta for 3.0.0 to npm and I'd like your feedback here or even on the PR #78 if you have specific concerns.

I haven't written a migration for this version yet, so don't start this up against any existing instances, as things will likely not go very well. πŸ˜€

See the change log for details.

timclipsham commented 6 years ago

Thanks Tim. I just took a quick look over the changelog – at a high level sounds great. I'll take a more detailed look through this week.

pyrossh commented 6 years ago

@timgit I just saw in the changelog that you switched the jsonb type in the job table to json. Any particular reason for this?

timclipsham commented 6 years ago

@timgit I've taken a look through and converted our app in dev to use pg-boss v3. It ticks a lot of boxes now, and retryBackoff on failed jobs is a nice bonus! We'll ship it once the database migrations are written. Questions/thoughts below:

Questions

Issues

Hope the feedback helps! πŸ‘

timgit commented 6 years ago

Tim, thanks for the feedback. 😁

json vs. jsonb

I had an issue opened (#53) questioning the usage of jsonb since pg-boss doesn't require it. An advantage is what you mention: you can run arbitrary queries against the job table. That may also be considered a disadvantage if the querying load interferes with queue operations. Although, if you're going to go through the trouble of casting, it really defeats that purpose. In summary, I don't have strong feelings about it. I don't personally take advantage of querying the data column via jsonb, but I originally built it that way just in case, because "maybe someone would want to do that". So, you would be that someone, and I would argue that your needs would probably outweigh the needs of those who would like to milk every ounce of performance gained by switching over to json instead.
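To illustrate the tradeoff being discussed, here is a sketch of the two query styles (the job name and payload are hypothetical; the table name assumes the default pgboss schema). The jsonb containment operator @> can use a GIN index directly, whereas with plain json you have to cast first, which defeats that benefit:

```javascript
// With a jsonb data column, the payload can be filtered directly:
const jsonbQuery = `
  SELECT id, data
  FROM pgboss.job
  WHERE name = 'welcome-email'
    AND data @> '{"userId": 42}'`;

// With a plain json column, the same filter requires a cast on every row:
const jsonQuery = `
  SELECT id, data
  FROM pgboss.job
  WHERE name = 'welcome-email'
    AND data::jsonb @> '{"userId": 42}'`;
```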

extra state completion jobs

Noted. That's not intended. Good catch.

only 1 state completion job allowed

That's not intended. You should be able to use the exact same config that subscribe() allows, batching and all. Are you using teamSize or batchSize?

Adding retry counts to completion jobs

Good idea. I think I'll add the timestamps as well.

retryBackoff default

What's your use case for setting retryBackoff to true but then setting retryLimit to 0? That combination of options isn't valid. I decided to set the retryLimit to 1 if the backoff option was set just to simplify the config.
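For illustration, the interaction between these options can be sketched as follows. This is not pg-boss's internal code, just the exponential-backoff idea the options imply (the function name and the doubling formula are assumptions):

```javascript
// retryCount: how many attempts have failed so far; retryDelay in seconds.
// Returns null once retries are exhausted, which is why retryBackoff: true
// with retryLimit: 0 is not a valid combination.
function nextRetryDelay(retryCount, { retryDelay = 1, retryLimit = 1, retryBackoff = false } = {}) {
  if (retryCount >= retryLimit) return null;                 // retries exhausted
  return retryBackoff ? retryDelay * 2 ** retryCount : retryDelay; // double each attempt
}
```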

start() blocking on initial housekeeping

I think you're right that housekeeping operations should not block the initial promise resolution on start. I'll switch this over to async. Good catch on the archive table and the lack of indexes as well.

timclipsham commented 6 years ago

Thanks Tim! (it continues to feel like I'm speaking to myself) πŸ˜†

...jsonb...

At this stage we don't query the queue tables within our app, but occasionally do manually when looking into an issue. Looks like the performance cost of jsonb is minimal though.

That's not intended. You should be able to use the exact same config that subscribe() allows, batching and all. Are you using teamSize or batchSize?

Currently we're using teamSize, as it meant minimal changes at this stage to switch from v2 to v3. Eventually we'll switch to batch. We're passing teamSize into subscribe(...), however we weren't passing any configuration into onComplete – so it looks like it might be my mistake πŸ€¦β€β™‚οΈ – I'll confirm this.
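As a quick sketch of the two fetch modes being contrasted here (option shapes as I read the v3 changelog; verify against the docs): with teamSize the handler receives one job per call, up to N concurrently, while with batchSize the handler receives an array of jobs in a single call.

```javascript
const teamOptions = { teamSize: 5 };   // handler(job) invoked per job, up to 5 at once
const batchOptions = { batchSize: 5 }; // handler(jobs) invoked once with up to 5 jobs
// e.g. boss.subscribe('my-job', batchOptions, jobs => Promise.all(jobs.map(work)))
```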

retryBackoff default

This came up because we exposed retry to be configured via an environment variable so we can adjust it; however, regardless of its value we wanted the retryBackoff option enforced. It's not likely we'll ever have 0 retries configured, so this isn't important. I thought I'd raise it just in case it wasn't intentional.


Some other questions/thoughts I've had since:

Re-running jobs that have failed

Consider the situation where something goes wrong and some jobs fail completely, exceeding the retry limit, and have since moved into the archive. Once the larger problem is resolved, what's the easiest way to re-run those jobs? It's also worth mentioning that all completed jobs are treated equally. If successful jobs were archived more frequently than failed ones, that could make it easier to re-run the failures manually by updating their state value.

Capturing job errors in response

Currently, if a promise is rejected due to a raised error (e.g. throw new Error("message")), this data isn't properly stored in the queue's completion data. We're working around this in our handler wrapper (which also does some other things, such as New Relic instrumentation) by having our own .catch(...), serializing if instanceof Error, and then re-rejecting.
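The workaround described above could look roughly like this (the wrapper name is ours, not a pg-boss API). The underlying issue is that Error instances serialize to "{}" under JSON.stringify, since message and stack are non-enumerable:

```javascript
// Wrap a job handler so a thrown Error survives JSON serialization
// into the completion job's data.
function wrapHandler(handler) {
  return async job => {
    try {
      return await handler(job);
    } catch (err) {
      if (err instanceof Error) {
        // Re-reject with a plain object so message and stack are preserved
        throw { message: err.message, stack: err.stack };
      }
      throw err;
    }
  };
}
```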

Reporting queue performance into AWS CloudWatch

We aren't doing this yet, but intend on using their API + the monitor-states event to report the state of the queue for monitoring/alerting/dashboards etc. I thought I'd ask if you've done the same and if you have any tips or guidance here.
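A sketch of how the monitor-states payload might be flattened into CloudWatch metric entries. The payload shape used here (numeric state counts plus a nested queues object) is an assumption for illustration and may differ across pg-boss versions:

```javascript
// Convert a states snapshot into entries shaped for the AWS SDK's
// cloudwatch.putMetricData({ Namespace, MetricData }) call.
function toMetricData(states) {
  return Object.entries(states)
    .filter(([, value]) => typeof value === 'number') // skip nested objects like queues
    .map(([state, count]) => ({
      MetricName: `Jobs${state[0].toUpperCase()}${state.slice(1)}`,
      Value: count,
      Unit: 'Count'
    }));
}
// e.g. boss.on('monitor-states', states => publishToCloudWatch(toMetricData(states)))
```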

timgit commented 6 years ago

Re-running jobs that have failed

Perhaps a republish(id) that attempts to find a completed job in either the job or the archive table?
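A hypothetical sketch of the lookup such a republish(id) might perform (republish does not exist in this beta; table names assume the default pgboss schema). The job's name and payload are fetched from whichever table still holds it, then handed back to publish():

```javascript
// Find a job by id in the live table first, then fall back to the archive.
const findJobSql = `
  SELECT name, data FROM pgboss.job     WHERE id = $1
  UNION ALL
  SELECT name, data FROM pgboss.archive WHERE id = $1
  LIMIT 1`;
```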

Capturing job errors in response

I'll take a look at this. You're wanting the stack as well I assume.

Reporting queue performance into AWS CloudWatch

I think you're hunting for ideas around "how do I know when things aren't healthy", and I guess this would need to be a combination of "what queue is this" and some sort of trend analysis to determine if things are improving vs. getting worse. I don't currently have any interesting metrics or heuristics to share in this regard, but I'm interested in what you come up with.

jr14marquez commented 6 years ago

@timgit Just discovered pg-boss and I love it. Tried out version 3 and was wondering if it's possible to have monitor-states pass back the data? It's a great overall status, but I'd like to display on a web page what is in the queue.

timgit commented 6 years ago

This is one advantage of having the queue as a table. Feel free to issue arbitrary queries against both the job and archive tables. Use your best judgment to decide how many queries to run against it, however, as read activity will have some impact on performance.

jr14marquez commented 6 years ago

@timgit Thanks! To do that, would it be best to create my own pool, pass that to the pg-boss instance, and use it to query what I need? If so, do you have an example of doing this?

Also, not sure if this is intended, but should the singletonKey example below enter 123 into the database or the whole object? Right now I noticed it enters {singletonKey: '123'} into the database.

boss.publish('my-job', {}, {singletonKey: '123'}) // resolves a jobId

timgit commented 6 years ago

@jr14marquez, you can just use the pg module directly. pg-boss doesn't have an arbitrary query api.

I'm not sure what you mean by singletonKey. There's a text column in the job queue table specifically for this value, if that helps.

timgit commented 6 years ago

@timclipsham I just published 3.0.0-beta4, which should address most of the major issues with the last beta that you pointed out.

I also added a migration to this release, so it's kind of an RC in that regard.

jr14marquez commented 6 years ago

@timgit My mistake on the singletonKey. Misunderstood. Also, instead of using the pg module, I just used boss.db.executeSql(query) to get what I needed out of the database. That worked perfectly.