timgit / pg-boss

Queueing jobs in Postgres from Node.js like a boss
MIT License

Job not expiring #250

Closed EvertEt closed 3 years ago

EvertEt commented 3 years ago

I noticed a job stuck in our pg-boss table. I believe it should have been expired, but it was not restarted. Before deleting the job from the table, I copied the row:

INSERT INTO pgboss.job (id, name, priority, data, state, retrylimit, retrycount, retrydelay, retrybackoff, startafter, startedon, singletonkey, singletonon, expirein, createdon, completedon, keepuntil) VALUES ('5edc0b40-c47c-11eb-9c21-9f88d7e174f7', 'checkRunOnHold', 10, '{"jobId": 9962, "runId": 229335}', 'active', 0, 0, 0, false, '2021-06-03 15:00:01.149897', '2021-06-03 15:00:03.673934', 'checkRunOnHold(229335)', null, '0 years 0 mons 0 days 0 hours 15 mins 0.00 secs', '2021-06-03 15:00:01.149897', null, '2021-07-03 15:00:01.149897');

Did we misconfigure anything, or do you know what might have caused this job to stay in the active state? Other jobs were being created and picked up normally at the same time.

timgit commented 3 years ago

As long as the job was in the active state, the maintenance commands should have moved it to expired. Have you customized your maintenance configuration?

EvertEt commented 3 years ago

Some more details about the configuration:

timgit commented 3 years ago

By default, maintenance runs every 2 minutes. You can verify this by querying the version table. For example, the following query returns a row that includes a column named maintained_on.

select * from pgboss.version

If the timestamp in maintained_on is current and the job is still active after it should have been expired, something is wrong. Your original post didn't include the current time, so the job data alone doesn't indicate whether it should have been expired.
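
As a rough illustration of what "current" could mean here, the sketch below compares a maintained_on value against the 2-minute default interval mentioned above. The helper name and the slack factor are assumptions made for this example and are not part of pg-boss. It takes the timestamp as an argument rather than querying the database.

```typescript
// Default maintenance interval mentioned in the thread: 2 minutes.
const DEFAULT_MAINTENANCE_INTERVAL_MS = 2 * 60 * 1000;

// Hypothetical helper (not a pg-boss API): returns true when maintained_on
// is no older than a couple of maintenance intervals.
function maintenanceLooksCurrent(
  maintainedOn: Date,
  now: Date,
  intervalMs: number = DEFAULT_MAINTENANCE_INTERVAL_MS,
  slackFactor: number = 2 // assumption: tolerate one missed or slow run
): boolean {
  const ageMs = now.getTime() - maintainedOn.getTime();
  return ageMs <= intervalMs * slackFactor;
}
```

If a check like this returns false for the value read from pgboss.version, maintenance itself is probably not running, which would explain jobs staying active past their expiration.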

The expiration command uses the following logic.

...
WHERE state = 'active'
  AND (startedOn + expireIn) < now()
...
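
To make the condition concrete, here is a minimal sketch that mirrors the quoted WHERE clause in TypeScript and applies it to the values from the copied row. The interface and function names are made up for this example, the timestamp is assumed to be UTC, and it is truncated to milliseconds because a JavaScript Date has no microsecond precision.

```typescript
// Illustrative shape only; field names are chosen for this example.
interface JobRow {
  state: string;
  startedOn: Date;
  expireInMs: number;
}

// Mirrors: state = 'active' AND (startedOn + expireIn) < now()
function shouldBeExpired(job: JobRow, now: Date): boolean {
  return (
    job.state === "active" &&
    job.startedOn.getTime() + job.expireInMs < now.getTime()
  );
}

// Values taken from the row in the original post (timezone assumed UTC).
const stuckJob: JobRow = {
  state: "active",
  startedOn: new Date("2021-06-03T15:00:03.673Z"),
  expireInMs: 15 * 60 * 1000, // expirein of 15 minutes
};
```

With these values the job becomes eligible for expiration shortly after 15:15:03, so whether it "should have been expired" depends entirely on the time at which the row was observed.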

EvertEt commented 3 years ago

I forgot to mention some important details indeed.

timgit commented 3 years ago

Please upgrade to a more recent version to rule out a hung maintenance monitoring issue that has been fixed since 4.1.0.

EvertEt commented 3 years ago

It is indeed quite old. Can we stay within v4 or should we upgrade to the latest v6?

I will close the issue, as it seems to have been a one-time thing so far and it happened on a quite old version.

Thanks for looking into it!

timgit commented 3 years ago

You can stay on v4 if you want, but I don't see the harm in upgrading since you shouldn't lose any data.