timgit / pg-boss

Queueing jobs in Postgres from Node.js like a boss
MIT License
2.05k stars 157 forks

3.0 beta - request for comments #79

Closed timgit closed 6 years ago

timgit commented 6 years ago

Hey there! I just published a beta for 3.0.0 to npm and I'd like your feedback here or even on the PR #78 if you have specific concerns.

I haven't written a migration for this version yet, so don't start this up against any existing instances, as things will likely not go very well. πŸ˜€

See the change log for details.

timclipsham commented 6 years ago

Thanks Tim. I just took a quick look over the changelog – at a high level sounds great. I'll take a more detailed look through this week.

pyrossh commented 6 years ago

@timgit I just saw in the changelog that you switched the jsonb type in the job table to json. Any particular reason for this?

timclipsham commented 6 years ago

@timgit I've taken a look through and converted our app in dev to use pg-boss v3. It ticks a lot of boxes now, and retryBackoff on failed jobs is a nice bonus! We'll ship it once the database migrations are written. Questions/thoughts below:

Questions

Issues

Hope the feedback helps! πŸ‘

timgit commented 6 years ago

Tim, thanks for the feedback. 😁

json vs. jsonb

I had an issue opened (#53) questioning the usage of jsonb since pg-boss doesn't require it. An advantage is what you mention: you can run arbitrary queries against the job table. That may also be considered a disadvantage if the querying load interferes with queue operations. Although, if you're going to go through the trouble of casting, it really defeats that purpose. In summary, I don't have strong feelings about it. I don't personally take advantage of querying the data column via jsonb, but I originally built it that way just in case, because "maybe someone would want to do that". So, you would be that someone, and I would argue that your needs would probably outweigh the needs of those who would like to milk every ounce of performance gained by switching over to json instead.
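To illustrate the tradeoff being discussed, here is a sketch of the two query styles (the job name and payload are hypothetical; the table name assumes the default pgboss schema). The jsonb containment operator @> can use a GIN index directly, whereas with plain json you have to cast first, which defeats that benefit:

```javascript
// With a jsonb data column, the payload can be filtered directly:
const jsonbQuery = `
  SELECT id, data
  FROM pgboss.job
  WHERE name = 'welcome-email'
    AND data @> '{"userId": 42}'`;

// With a plain json column, the same filter requires a cast on every row:
const jsonQuery = `
  SELECT id, data
  FROM pgboss.job
  WHERE name = 'welcome-email'
    AND data::jsonb @> '{"userId": 42}'`;
```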

extra state completion jobs

Noted. That's not intended. Good catch.

only 1 state completion job allowed

That's not intended. You should be able to use the exact same config that subscribe() allows, batching and all. Are you using teamSize or batchSize?

Adding retry counts to completion jobs

Good idea. I think I'll add the timestamps as well.

retryBackoff default

What's your use case for setting retryBackoff to true but then setting retryLimit to 0? That combination of options isn't valid. I decided to set the retryLimit to 1 if the backoff option was set just to simplify the config.
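For illustration, the interaction between these options can be sketched as follows. This is not pg-boss's internal code, just the exponential-backoff idea the options imply (the function name and the doubling formula are assumptions):

```javascript
// retryCount: how many attempts have failed so far; retryDelay in seconds.
// Returns null once retries are exhausted, which is why retryBackoff: true
// with retryLimit: 0 is not a valid combination.
function nextRetryDelay(retryCount, { retryDelay = 1, retryLimit = 1, retryBackoff = false } = {}) {
  if (retryCount >= retryLimit) return null;                 // retries exhausted
  return retryBackoff ? retryDelay * 2 ** retryCount : retryDelay; // double each attempt
}
```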

start() blocking on initial housekeeping

I think you're right that housekeeping operations should not block the initial promise resolution on start. I'll switch this over to async. Good catch on the archive table and the lack of indexes as well.

timclipsham commented 6 years ago

Thanks Tim! (it continues to feel like I'm speaking to myself) πŸ˜†

...jsonb...

At this stage we don't query the queue tables within our app, but occasionally do manually when looking into an issue. Looks like the performance cost of jsonb is minimal though.

That's not intended. You should be able to use the exact same config that subscribe() allows, batching and all. Are you using teamSize or batchSize?

Currently we're using teamSize, as it meant minimal changes at this stage to switch from v2 to v3. Eventually we'll switch to batch. We're passing teamSize into subscribe(...), however we weren't passing any configuration into onComplete – so it looks like it might be my mistake πŸ€¦β€β™‚οΈ – I'll confirm this.
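As a quick sketch of the two fetch modes being contrasted here (option shapes as I read the v3 changelog; verify against the docs): with teamSize the handler receives one job per call, up to N concurrently, while with batchSize the handler receives an array of jobs in a single call.

```javascript
const teamOptions = { teamSize: 5 };   // handler(job) invoked per job, up to 5 at once
const batchOptions = { batchSize: 5 }; // handler(jobs) invoked once with up to 5 jobs
// e.g. boss.subscribe('my-job', batchOptions, jobs => Promise.all(jobs.map(work)))
```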

retryBackoff default

This came up because we exposed retry to be configured via an environment variable so we can adjust it; however, regardless of its value we wanted the retryBackoff option enforced. It's not likely we'll ever have 0 retries configured, so this isn't important. I thought I'd raise it just in case it wasn't intentional.


Some other questions/thoughts I've had since:

Re-running jobs that have failed

Consider the situation where something goes wrong and some jobs fail completely, exceeding the retry limit, and have since moved into the archive. Once the larger problem is resolved, what's the easiest way to re-run those jobs? It's also worth mentioning that all completed jobs are treated equally. If successful jobs were archived more frequently than failed ones, that could make it easier to re-run the failures manually by updating their state value.

Capturing job errors in response

Currently, if a promise is rejected due to a raised error (e.g. throw new Error("message")), this data isn't properly stored in the queue's completion data. We're working around this in our handler wrapper (which also does some other things, such as New Relic instrumentation) by having our own .catch(...), serializing if instanceof Error, and then re-rejecting.
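The workaround described above could look roughly like this (the wrapper name is ours, not a pg-boss API). The underlying issue is that Error instances serialize to "{}" under JSON.stringify, since message and stack are non-enumerable:

```javascript
// Wrap a job handler so a thrown Error survives JSON serialization
// into the completion job's data.
function wrapHandler(handler) {
  return async job => {
    try {
      return await handler(job);
    } catch (err) {
      if (err instanceof Error) {
        // Re-reject with a plain object so message and stack are preserved
        throw { message: err.message, stack: err.stack };
      }
      throw err;
    }
  };
}
```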

Reporting queue performance into AWS CloudWatch

We aren't doing this yet, but intend on using their API + the monitor-states event to report the state of the queue for monitoring/alerting/dashboards etc. I thought I'd ask if you've done the same and if you have any tips or guidance here.
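A sketch of how the monitor-states payload might be flattened into CloudWatch metric entries. The payload shape used here (numeric state counts plus a nested queues object) is an assumption for illustration and may differ across pg-boss versions:

```javascript
// Convert a states snapshot into entries shaped for the AWS SDK's
// cloudwatch.putMetricData({ Namespace, MetricData }) call.
function toMetricData(states) {
  return Object.entries(states)
    .filter(([, value]) => typeof value === 'number') // skip nested objects like queues
    .map(([state, count]) => ({
      MetricName: `Jobs${state[0].toUpperCase()}${state.slice(1)}`,
      Value: count,
      Unit: 'Count'
    }));
}
// e.g. boss.on('monitor-states', states => publishToCloudWatch(toMetricData(states)))
```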

timgit commented 6 years ago

Re-running jobs that have failed

Perhaps a republish(id) that attempts to find a completed job in either the job or the archive table?
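A hypothetical sketch of the lookup such a republish(id) might perform (republish does not exist in this beta; table names assume the default pgboss schema). The job's name and payload are fetched from whichever table still holds it, then handed back to publish():

```javascript
// Find a job by id in the live table first, then fall back to the archive.
const findJobSql = `
  SELECT name, data FROM pgboss.job     WHERE id = $1
  UNION ALL
  SELECT name, data FROM pgboss.archive WHERE id = $1
  LIMIT 1`;
```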

Capturing job errors in response

I'll take a look at this. You're wanting the stack as well I assume.

Reporting queue performance into AWS CloudWatch

I think you're hunting for ideas around "how do I know when things aren't healthy", and I guess this would need to be a combination of "what queue is this" and some sort of trend analysis to determine if things are improving vs. getting worse. I don't currently have any interesting metrics or heuristics to share in this regard, but I'm interested in what you come up with.

jr14marquez commented 6 years ago

@timgit Just discovered pg-boss and I love it. Tried out version 3 and was wondering if it's possible to have monitor-states pass back the data? It's a great overall status, but I'd like to display on a web page what is in the queue.

timgit commented 6 years ago

This is one advantage of having the queue as a table. Feel free to issue arbitrary queries against both the job and archive tables. Use your best judgment to decide how many queries to run against it, however, as read activity will have some impact on performance.

jr14marquez commented 6 years ago

@timgit Thanks! To do that, would it be best to create my own pool, pass that to the pg-boss instance, and use it to query what I need? If so, do you have an example of doing this?

Also, not sure if this is intended, but should the singletonKey example below enter 123 into the database or the whole object? Right now I noticed it enters {singletonKey: '123'} into the database.

boss.publish('my-job', {}, {singletonKey: '123'}) // resolves a jobId

timgit commented 6 years ago

@jr14marquez, you can just use the pg module directly. pg-boss doesn't have an arbitrary query api.

I'm not sure what you mean by singletonKey. There's a text column in the job queue table specifically for this value, if that helps.

timgit commented 6 years ago

@timclipsham I just published 3.0.0-beta4, which should address most of the major issues with the last beta that you pointed out.

I also added a migration to this release, so it's kind of an RC in that regard.

jr14marquez commented 6 years ago

@timgit My mistake on the singletonKey. Misunderstood. Also, instead of using the pg module, I just used boss.db.executeSql(query) to get what I needed out of the database. That worked perfectly.