timgit / pg-boss

Queueing jobs in Postgres from Node.js like a boss
MIT License

expiration, onComplete and long running tasks #286

Closed basaran closed 2 years ago

basaran commented 2 years ago

Hello, hope all is well. In my tests, and from what I could gather through the issues:

a. If a subscribed job finishes before the expiration, all is well and onComplete is launched.

--

a. If a subscribed job passes the allotted time, it is marked as expired.
b. This will initiate the onComplete, regardless of the actual state of the job.
c. If the job finishes later on, onComplete is not launched again.

Am I correct? Thank you for your hard work.

timgit commented 2 years ago

Yes, this is correct

basaran commented 2 years ago

Thank you. What would you suggest for tracking jobs that have overstayed their welcome, or for when a user wants to manually cancel a task? A scenario would be like this:

Task: Schedule a task that would go get an API or push into an API.

Scenario A: Job is started and should finish in 10 minutes. But that day, internet roads are crowded and slow. It is now 15 minutes in and the task is still running. pg-boss considered it expired and initiated the onComplete.

Scenario B: Job is started, things are fine. User realized he made an error and wanted to cancel that task.

To my understanding, from pg-boss's point of view, execution doesn't matter that much; it mostly manages the pub-sub relationship. We can't call process.exit within the subscribed Fn, because pg-boss is not using fork anymore. I just wanted to ask your opinion, when you have the time.

Concern A: User who has started this task sees the job as expired but doesn't know it's still running.

Concern B: There is no way (?) for the user to stop a task that was run through pg-boss.

So far, I know pg-boss hands out the job-id to the subscribed Fn:

To address this, I thought may be I can:

  1. Have the started Fn check its own status. Start a timer or something within the Fn, and have an if statement somewhere to block further execution and let the function return / resolve.

This kind of takes care of the long-running task, I suppose. But if the user wants to destroy the started task, it doesn't help. So to overcome that, I thought of:

  1. User forces the task to fail, with fail().

If you could please share your thoughts, I will surely appreciate it.

P.S. I also came across AbortController, and I'm looking into it as well.
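For what it's worth, here is a rough sketch of how the timer idea and AbortController could fit together. None of this is pg-boss API: `running`, `withCancellation`, `cancelJob` and `slowTask` are hypothetical names, and the only assumption about pg-boss is that the handler receives a job object with an `id`. The handler has to cooperate by checking the signal between units of work, and this only reaches jobs running in the same Node process.

```javascript
// Hypothetical in-process registry: job id -> AbortController (not part of pg-boss).
const running = new Map();

// Wraps a handler so it receives an AbortSignal that fires on timeout or manual cancel.
function withCancellation(handler, timeoutMs) {
  return async (job) => {
    const controller = new AbortController();
    running.set(job.id, controller);
    const timer = setTimeout(() => controller.abort(), timeoutMs);
    try {
      // Returning the handler's promise lets the caller await the real outcome.
      return await handler(job, controller.signal);
    } finally {
      clearTimeout(timer);
      running.delete(job.id);
    }
  };
}

// Manual cancel: returns true only if the job is running in this process.
function cancelJob(jobId) {
  const controller = running.get(jobId);
  if (!controller) return false;
  controller.abort();
  return true;
}

// Example handler: checks the signal between units of work and bails out by throwing.
async function slowTask(job, signal) {
  for (let step = 0; step < 5; step++) {
    if (signal.aborted) throw new Error(`job ${job.id} aborted at step ${step}`);
    await new Promise((resolve) => setTimeout(resolve, 20));
  }
  return "done";
}
```

The wrapped function would be passed where the plain handler goes today, e.g. `withCancellation(slowTask, 10 * 60 * 1000)`. Throwing on abort is a design choice here; my understanding is that a rejected handler ends up as a failed job rather than a completed one, but that is worth verifying against the pg-boss version in use.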

basaran commented 2 years ago

After a few hours of struggling with a bunch of things, it seems the best option is to fork a child process, and grab the pid to operate later if need be.

More so, this opened up another can of worms for me. If there are multiple subscribers to the same channel, I would also need to know which server picked up the task, so I can kill it from outside.

Is there a built-in mechanism for this? Meanwhile, I used a closure to wrap the subscription Fn:

const workerDetails = {
  ip: "1.1.1.1",
  name: "pikachu",
}

function hugMe(fn) {
    return (job) => {
        fn(job, workerDetails);
    };
}

...

await boss.subscribe("goForky", options, hugMe(goForky));

and goForky looks something like this:

async function goForky(job, worker) {
    return new Promise(async (resolve, reject) => {
        console.log(job, worker);
        const child = fork(__dirname + "/child.js");

        console.log("process launched: ", child.pid);

        const { sleep } = require("$libs/tools");

        await sleep(10000);

        console.log("sending goaway from parent");
        child.send("message");

        resolve(true);
    })
}

I suppose this way of returning the promise would give pg-boss the opportunity to await and do onComplete after the resolve?

P.S. I'm going to write a devto article once I figure most of this out.

basaran commented 2 years ago

Hi, are we supposed to manually set the job as complete inside the handler Fn?

Update:

I got it, my closure was missing the return.
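For anyone landing here later, this is roughly what the corrected wrapper looks like. Without the return, the inner function resolves to undefined immediately, so whatever awaits the handler cannot see the promise coming back from the wrapped function. `workerDetails` is just the placeholder object from my earlier comment.

```javascript
// Placeholder worker metadata, same as in the earlier snippet.
const workerDetails = {
  ip: "1.1.1.1",
  name: "pikachu",
};

function hugMe(fn) {
  // Return fn's result (a promise for async handlers) instead of dropping it.
  return (job) => fn(job, workerDetails);
}
```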

timgit commented 2 years ago

There's a lot to reply to here, so I think a good baseline is to start by answering this question: "how would you fetch a job and mark it as completed manually?" This will bypass a lot of details regarding the job polling (subscribe()) that may add more complexity to your questions. This means you would first focus on fetch() and complete(), for example.
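A minimal sketch of that manual flow, under some assumptions: `boss` is an already started PgBoss instance, `fetch(name)` resolves to a single job or null, and `complete(id, data)` / `fail(id, data)` take the job id first. Those signatures have changed between major versions, so check them against the version you run. `processOne` and `work` are hypothetical names used only for illustration.

```javascript
// Fetch one job, run the work, then report the outcome explicitly.
// `boss` is assumed to expose fetch/complete/fail as described above.
async function processOne(boss, queue, work) {
  const job = await boss.fetch(queue);
  if (!job) return null; // nothing waiting in the queue

  try {
    const result = await work(job.data);
    await boss.complete(job.id, result); // mark completed manually
    return { id: job.id, state: "completed" };
  } catch (err) {
    await boss.fail(job.id, err); // mark failed manually
    return { id: job.id, state: "failed" };
  }
}
```

Because nothing is polling here, the state transitions are entirely in your hands, which makes it easier to reason about what expiration means for work that is still in flight.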

I also refer a lot of questions like this to AWS SQS docs, since they address idempotent job handling use cases pretty well.

basaran commented 2 years ago

I know, I'm sorry things got out of hand at some point :) At the moment, I "think" I have a fairly good understanding of what's revolving around expiration and onComplete, and use of promises in the handlers. Looking at the source code was very helpful, it's well put together. Thank you again for all your work.