Closed · elmigranto closed this issue 4 years ago
If I am reading everything correctly (instead of trying to remember things from the last time I dug in :), just `await pgboss.start(); await pgboss.stop()` would do the trick. Looks like `supervise` (with those upkeep queries) runs right away! Not sure if it's meant to be public API (it's not in the usage docs at the moment), but I'm wondering if that could be useful to expose?
> Looks like `supervise` (with those upkeep queries) runs right away!

Though that promise does not seem to get returned, so in order to be sure it is done, maybe I can provide some kind of wrapper on top of the regular `executeSql`… OTOH, I think I could just run `purge` / `archive` / `expire` directly. What's their status on being "Official Public API"? :)
🤔
I think exporting the monitoring funcs from boss.js directly would be ideal for your use case. You'd still need to be concerned about running them concurrently, so extra caution would be warranted.
> I think exporting the monitoring funcs from boss.js directly would be ideal for your use case.
I agree, yeah, and decided to call `Boss#archive()`, `Boss#expire()` and `Boss#purge()` manually. I've also found `Boss#countStates()` useful to call myself instead of on a timer. In fact, I ended up not using any of the `Boss` methods (except explicitly) this time :)
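As an illustration of that approach, here is a sketch of calling the maintenance methods explicitly, assuming access to the internal `boss` instance (the `PgBoss#boss#<method>` access mentioned in the next comment); this is undocumented and version-dependent:

```js
// Sketch only: call the internal maintenance methods explicitly instead of
// relying on the timer-based supervise loop. `pgboss.boss` is an internal,
// undocumented property here, so this may break between versions.
async function runMaintenance(pgboss) {
  await pgboss.boss.expire();  // fail jobs past their expiration
  await pgboss.boss.archive(); // move completed jobs to the archive
  await pgboss.boss.purge();   // drop old archived jobs
  const counts = await pgboss.boss.countStates();
  console.log('queue state counts:', counts);
}
```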
Would a PR exporting them directly on the `PgBoss` instance be a welcome addition, or is accessing `PgBoss#boss#<method>` okay? In any case, would you say treating either as public API is a good idea on my part? (Those are not explicitly documented, but, say, event names are.)
Yes, a PR is welcome. This would likely be just like how the manager api is promoted, right?
I would think so, yeah. `promoteFunction` is already there, so it makes sense to use that. (I think it would be helpful to keep a map of things already promoted in there, so we don't accidentally overwrite something, and maybe add some kind of check to not export privates, e.g. skip names starting with an underscore, if you do that kind of thing. But other than that, yeah, absolutely!)
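Purely illustrative: the kind of guard described above might look something like this (the signature and names are hypothetical, not the actual `promoteFunction` in pg-boss):

```js
// Hypothetical sketch of guarding promotion against overwrites and against
// exposing private methods. Not the actual pg-boss implementation.
const promoted = new Set();

function promoteFunction(target, source, name) {
  if (name.startsWith('_')) {
    throw new Error(`refusing to promote private method: ${name}`);
  }
  if (promoted.has(name) || name in target) {
    throw new Error(`method already promoted or present: ${name}`);
  }
  target[name] = (...args) => source[name](...args);
  promoted.add(name);
}
```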
Having a failing test would be better. ;)
I came here with the same question as we have some auto-scaling instances and would prefer to avoid having duplicative `pgboss.start()`s and monitoring queries running.
What would happen if `pgboss.connect()` was called (and used to create subscriptions and enqueue jobs) before `pgboss.start()`?
The only potential race condition problem in this setup is when you decide to upgrade pg-boss to a new version which contains an auto-migrated schema change. If you try and `connect()` before `start()` has had a chance to migrate the database, `connect()` will bail out with a version mismatch error.
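Illustrative only: one way a worker instance could cope with that at boot, assuming the 3.x behavior where `connect()` rejects when the schema version doesn't match. The retry helper below is an assumption, not part of the pg-boss API:

```js
// Sketch: keep retrying connect() until a supervisor's start() has finished
// migrating the schema. The backoff values and error handling are arbitrary.
async function connectWhenReady(boss, attempts = 30, delayMs = 2000) {
  for (let i = 0; i < attempts; i++) {
    try {
      await boss.connect();
      return;
    } catch (err) {
      console.warn(`connect() failed (${err.message}); retrying...`);
      await new Promise(resolve => setTimeout(resolve, delayMs));
    }
  }
  throw new Error('gave up waiting for the pg-boss schema to be ready');
}
```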
If we tried the recommendation of calling `await pgboss.start(); await pgboss.stop()` first, would that mitigate that race condition?
I'm thinking each instance could call this (to ensure upgrades happen) and then only a single instance would be responsible for calling the "real" `await pgboss.start()`. I can't guarantee the order these instances spin up though.
What would you recommend for dealing with this?
I would recommend designating a supervisor process responsible for monitoring pg-boss expiration and archiving operations via `start()` (the "real" one, as you mentioned). You should feel free to have any number of instances use `connect()` without worrying about if and when `start()` is called. These are not dependent on each other.
When a new version of pg-boss is released which involves a schema change, you should stop all instances, run the supervisor with `start()`, wait until it has finished upgrading, then patch and restart all other instances with `connect()`.
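A minimal sketch of that split, assuming the pg-boss 3.x API where `start()` runs migrations and maintenance while `connect()` only attaches to an existing schema. The queue name and handler are placeholders:

```js
const PgBoss = require('pg-boss');

// Supervisor process: owns schema migrations and the maintenance loop.
async function runSupervisor(connectionString) {
  const boss = new PgBoss(connectionString);
  await boss.start(); // migrates the schema and starts supervise
  return boss;
}

// Worker/API process: only connects, subscribes, and publishes.
async function runWorker(connectionString) {
  const boss = new PgBoss(connectionString);
  await boss.connect(); // bails out with a version mismatch if the schema is stale
  await boss.subscribe('some-queue', async job => {
    // handle job.data here
  });
  await boss.publish('some-queue', { hello: 'world' });
  return boss;
}
```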
3.2.1 has the boss management functions (shown below) exported in the root module now. You would use these with `connect()`, not `start()`.

- `expire()`
- `archive()`
- `purge()`
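A short sketch of using those from a single maintenance node, assuming the 3.2.1 API where they are promoted onto the `PgBoss` instance and `connect()`/`disconnect()` manage the connection:

```js
// Sketch only: run the maintenance operations explicitly from one node,
// using connect() rather than start() so no supervise timers are created.
async function maintain(boss) {
  await boss.connect();
  await boss.expire();  // fail jobs that have exceeded their expiration
  await boss.archive(); // move completed jobs into the archive
  await boss.purge();   // delete old rows from the archive
  await boss.disconnect();
}
```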
Can you elaborate on (or document) how these functions are intended to work? At first glance at the code, it seems like they don't provide much benefit for managing them yourself vs. letting the lib run supervise on `start()`?
Ultimately, what I'd like to be able to do is define separate archive retention configurations on a queue-by-queue basis. I have some queues which I don't really need any archive for... but then I have other, more important ones that I'd like to keep in the archive for a week or so for debugging.
Any recommendations on an approach that might work there? I'm also happy to open a PR if you can point me in the right direction to add this kind of support (if it sounds like something you'd like to include).
I've made recent changes to the maintenance operations in 4.0 (currently released in beta) which should resolve what @elmigranto originally requested here, where multiple master nodes are started at different times and you don't want to worry about which instance ends up running monitoring commands. The only remaining race condition is for schema migrations, which I'm in the process of limiting concurrency to 1 instance at a time.
In regards to retention, I was thinking we could add a new config option (ttl, pg interval, date) on `publish()` which we could use in place of the default timestamp used for the retention policy. This would allow some jobs to survive longer in the archive. Does that sound like it would resolve your archive case?
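Purely hypothetical sketch of what that could look like at publish time; the `retention` option name and value are invented for illustration and were not part of pg-boss at the time of this thread:

```js
// Hypothetical: a per-job retention hint passed on publish(), used in place of
// the default timestamp for the archive retention policy.
await boss.publish('important-queue', { orderId: 123 }, {
  retention: '7 days' // e.g. a pg interval applied when the job is archived
});

// Less important jobs fall back to the library-wide default retention.
await boss.publish('throwaway-queue', { ping: true });
```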
You're suggesting adding a new column to the job and archive table that would customize retention on a per-job basis? I think that would solve my use case.
I was thinking in terms of queues/topics, but setting it in the job configuration on publish makes sense too.
A pg interval sounds nice, but would require two columns to interpret (`archivedOn` + retention), unless the interval was used on insert to create a date value?
Yes, I think it would end up being a calculated value that would result in a timestamp column to use instead of archivedOn.
The primary reason I don’t want to make anything queue/topic-based is because they are all virtual and I would have to introduce a new state persistence mechanism to track it, along with its own archive and retention policies.
Also, I published a 4.0.0-beta2 release with multi-master support for schema migrations, which should finally address all the race conditions and complexity involved with running multiple instances simultaneously.
I am looking into integrating this awesome lib (thanks, Tim and everyone involved!) into yet another project, but I have some concerns about the way I'm planning to set up table monitoring.
The Problem
No good way to select a master server which would be running `pgboss.start()`. All our instances are the same and we would prefer to keep it that way. Obviously, running N-1 "extra" monitoring queries and everything related to that is not ideal, and coming up with a "master-selecting" protocol and monitoring that makes me dizzy :)

Solution (?)

Have a periodic cronjob fire via an internal HTTP call that's guaranteed to hit just one node, run `pgboss.start()` on it, wait for the thing to do its job, and `pgboss.stop()` it after (see the sketch at the end of this post).

Concerns

- `setTimeout(someMinutes, guessItIsProbablyDoneByNow)`? My understanding is that `await pgboss.start()` only takes care of schema?
- `pgboss.connect()` would take care of all the actual scheduling/processing, which is good to go on all the nodes at the same time without any additional coordination?

Once again, Tim, thanks for working on pg-boss and making it available to everyone ♥️ @timgit
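A minimal sketch of the cronjob-over-HTTP idea described under "Solution (?)" above, assuming an Express app and the pg-boss 3.x API. The route path, port, and environment variable are placeholders, and, as the concerns note, `start()` does not await the upkeep queries themselves, so this is best-effort:

```js
const express = require('express');
const PgBoss = require('pg-boss');

const app = express();

// Hit by an internal cron scheduler that is guaranteed to target one node.
app.post('/internal/pgboss-upkeep', async (req, res) => {
  const boss = new PgBoss(process.env.DATABASE_URL);
  try {
    await boss.start(); // ensures the schema and kicks off the supervise queries
    // start() does not return the supervise promise, so stopping immediately
    // may interrupt in-flight upkeep; a short delay is a crude workaround.
    await new Promise(resolve => setTimeout(resolve, 30 * 1000));
    await boss.stop();
    res.sendStatus(204);
  } catch (err) {
    res.status(500).send(err.message);
  }
});

app.listen(3000);
```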