timgit / pg-boss

Queueing jobs in Postgres from Node.js like a boss
MIT License
2.15k stars 160 forks source link

How to partially fail a batch? #474

Closed danilofuchs closed 2 months ago

danilofuchs commented 2 months ago

I'm implementing batch support in my application, and I would like to fail only 1 job of the batch, using Promise.allSettled:


    await this.boss.work<PgBossJobData<T>>(
      queueName,
      {
        includeMetadata: true,
        pollingIntervalSeconds,
        batchSize,
      },
      async (jobs) => {
        const results = await Promise.allSettled(
          jobs.map((job) => this.runJob(job, queueName, jobHandler)),
        );

        const errors = results.filter((result) => result.status === "rejected");

        const successes = results.filter((result) => result.status === "fulfilled");
      },
    );

I could use fetch, complete, fail myself, but then I loose all the Worker functionality and have to manage polling myself

Would be nice if work had an API for that, for example:

    await this.boss.work<PgBossJobData<T>>(
      queueName,
      {
        includeMetadata: true,
        pollingIntervalSeconds,
        batchSize,
        autoComplete: false,
      },
      async (jobs, { complete, fail }) => {
        const results = await Promise.allSettled(
          jobs.map((job) => this.runJob(job, queueName, jobHandler)),
        );

        const resultsWithJobId = results
          .map((result, index) => ({ result, jobId: jobs[index].id }))

        for (const {result, jobId} of resultsWithJobId) {
          if (result.status === "rejected") {
            await fail(jobId, result.reason)
          } else {
            await complete(jobId)
          }
        }
      },
    );

work would only poll again if all jobs have been completed or failed

timgit commented 2 months ago

The issue with Promise.allSettled and Promise.all is that it has no concurrency control. v9 and below had support for this via the p-map package and used the teamConcurrency option. However this produced too much complexity for all use cases, since some jobs within a batch may take longer than others, reducing the overall throughput over time.

danilofuchs commented 2 months ago

is that it has no concurrency control

What do you mean by concurrency control in the context of Promise.allSettled? Not sure I understood the requirement

timgit commented 2 months ago

How many promises run in parallel

foteiniskiada commented 1 week ago

We've run into a similar issue, since we have a production system that needs to be upgraded to v10, and it seems it is no longer possible to partially fail a batch of jobs; if one fails all the batch gets failed, which is problematic since it degrades performance. Is there a way to handle these cases by allowing success of non-erroneous jobs?

klesgidis commented 1 week ago

In addition, we saw in the source code that you either succeed all jobs or none. The greatest danger here is that if one job from the batch fails, all other jobs will also fail. In that case even the successful jobs will be retried, which means that if a job is not idempotent bigger problems will arise.

klesgidis commented 1 week ago

@timgit tagging you, because I am not sure if notifications reaches you in a closed issue