josdejong / workerpool

Offload tasks to a pool of workers on node.js and in the browser
Apache License 2.0

Guidance on Restarting a Specific Service in Workerpool to Handle Memory Leaks in Playwright #427

Closed wojtekKrol closed 7 months ago

wojtekKrol commented 7 months ago

Description

I am using the workerpool library to manage multiple services in a Node.js application, specifically for crawling tasks using Playwright. However, I've encountered an issue with Playwright related to memory leaks. This seems to be a common problem among developers using Playwright, and the suggested workaround involves restarting the Playwright process to free up memory.

Issue

In my application, each service is a separate worker within workerpool. One of these services, a crawler, is responsible for handling thousands of URLs. Due to the memory leak in Playwright, I need a way to programmatically restart this specific service (crawler) within workerpool. The service is stateless and does not process any data persistently, so it should be feasible to restart it without losing important information.

Current Implementation

Here is a simplified version of how the services are structured:

// Main file
import path from 'path';
import { fileURLToPath } from 'url';
import { pool } from 'workerpool';
import { runApiServer } from '~/api/api.js';

const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);

const computeWorkersCount = (
  name: AppWorker
): [min: number, max: number] => {
  return [
    Number(CONFIG[(name + '_WORKERS_MIN') as AppWorkersMin]),
    Number(CONFIG[(name + '_WORKERS_MAX') as AppWorkersMax]),
  ];
};

const main = async () => {
  // Database initializations
  const oneDB = createOneDB();
  const anotherDB = createAnotherDB();

  runApiServer({ oneDB , anotherDB });

  const services = [
    [computeWorkersCount('FIRST'), './services/one'],
    [computeWorkersCount('SECOND'), './services/second'],
    [computeWorkersCount('THIRD_LEAK_MEMORY_PROBLEM'), './services/third'],
  ];

  for (const [[min, max], servicePath] of services) {
    pool(path.join(__dirname, servicePath), {
      minWorkers: min,
      maxWorkers: max,
    })
      .exec('main', null)
      .catch(console.error);
  }
};

main();

// Example of a service worker
import { worker } from 'workerpool';

worker({
  main: () =>
    main({
      oneDB: createOneDB(),
      anotherDB: createAnotherDB(),
    }),
});

Request

I am seeking guidance or a feature within workerpool that would allow me to restart a specific service (especially the crawler service using Playwright) to handle the memory leak issue. This would involve terminating and then reinitializing the service's process. Any suggestions or solutions for this scenario would be greatly appreciated.

josdejong commented 7 months ago

I guess you can call .terminate() on the workerpool to kill all workers, and then create a new workerpool.
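In the main file above, each service already gets its own pool, so terminating one pool and creating a new one restarts only that service. A minimal sketch of that idea follows. `PoolLike`, `PoolFactory`, `createServiceManager`, `start`, and `restart` are hypothetical names, not part of workerpool; the sketch only assumes that a pool exposes `exec()` and `terminate(force)`, as workerpool's `Pool` does.

```typescript
// Minimal shape of the pool methods this sketch relies on (an assumption;
// workerpool's Pool should be structurally compatible, possibly with a cast).
interface PoolLike {
  exec(method: string, params?: unknown[] | null): PromiseLike<unknown>;
  terminate(force?: boolean): PromiseLike<unknown>;
}

type PoolFactory = () => PoolLike;

// Hypothetical helper: keeps one pool per service name so that a single
// service can be terminated and re-created without touching the others.
function createServiceManager(factories: Map<string, PoolFactory>) {
  const pools = new Map<string, PoolLike>();

  const start = (name: string): void => {
    const factory = factories.get(name);
    if (!factory) {
      throw new Error(`Unknown service: ${name}`);
    }
    const pool = factory();
    pools.set(name, pool);
    // Same call as in the main file: run the service's long-lived `main` task.
    Promise.resolve(pool.exec('main', null)).catch(console.error);
  };

  const restart = async (name: string): Promise<void> => {
    const old = pools.get(name);
    if (old) {
      // force = true: the service task never finishes by itself,
      // so a graceful terminate would wait indefinitely.
      await old.terminate(true);
    }
    start(name);
  };

  return { start, restart };
}
```

With workerpool, a factory would presumably look like `() => pool(path.join(__dirname, './services/third'), { minWorkers: min, maxWorkers: max })`. Because a forced terminate stops the running `main` task, its promise is likely to reject and be logged by `console.error`.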

wojtekKrol commented 7 months ago

@josdejong I would like the worker process to run its logic (ideally N times), and after that be terminated and re-created automatically, with the N counter reset.

josdejong commented 7 months ago

I think what you can do is create a little wrapper function around your workerpool that terminates all workers and re-creates them once in a while.

There is no support for terminating a single worker, but terminating all of them and re-creating them periodically would solve the memory leak issue.
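One way such a wrapper could look is sketched below, assuming the crawler work is submitted as individual tasks rather than as one long-running `main` call. `PoolLike`, `Generation`, `createRecyclingPool`, and `maxTasks` are hypothetical names, not part of workerpool. After `maxTasks` tasks have been submitted, the wrapper creates a fresh pool for new tasks and terminates the old one once its outstanding tasks have settled.

```typescript
// Minimal shape of the pool methods this sketch relies on (an assumption;
// workerpool's Pool should be structurally compatible, possibly with a cast).
interface PoolLike {
  exec(method: string, params?: unknown[] | null): PromiseLike<unknown>;
  terminate(force?: boolean): PromiseLike<unknown>;
}

interface Generation {
  pool: PoolLike;
  submitted: number; // tasks handed to this pool so far
  running: number;   // tasks that have not settled yet
  retired: boolean;  // true once a newer pool has replaced this one
}

// Hypothetical wrapper: replaces the whole pool after `maxTasks` submissions.
function createRecyclingPool(createPool: () => PoolLike, maxTasks: number) {
  const fresh = (): Generation => ({
    pool: createPool(),
    submitted: 0,
    running: 0,
    retired: false,
  });
  let current = fresh();

  // Terminate a replaced pool only when none of its tasks are outstanding.
  const retireIfIdle = (gen: Generation): void => {
    if (gen.retired && gen.running === 0) {
      Promise.resolve(gen.pool.terminate()).catch(console.error);
    }
  };

  return {
    exec(method: string, params?: unknown[] | null): Promise<unknown> {
      if (current.submitted >= maxTasks) {
        const old = current;
        old.retired = true;
        current = fresh(); // resets the counter
        retireIfIdle(old);
      }
      const gen = current;
      gen.submitted += 1;
      gen.running += 1;
      const done = (): void => {
        gen.running -= 1;
        retireIfIdle(gen);
      };
      return Promise.resolve(gen.pool.exec(method, params)).then(
        (value) => { done(); return value; },
        (error) => { done(); throw error; },
      );
    },
    // Shut down the pool that is currently accepting tasks.
    terminate(force?: boolean): PromiseLike<unknown> {
      return current.pool.terminate(force);
    },
  };
}
```

A call such as `createRecyclingPool(() => pool(scriptPath, { maxWorkers: 2 }), 500)` would then recycle the workers every 500 tasks; `scriptPath` and both numbers are placeholders. Note that an old and a new pool can be alive at the same time for a short while, which temporarily raises memory use.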