thomasdondorf / puppeteer-cluster

Puppeteer Pool, run a cluster of instances in parallel
MIT License

Queued URLs Not Executing loadPage Method After cluster.queue is Called #547

Open kewang opened 3 weeks ago

kewang commented 3 weeks ago

Hi @thomasdondorf,

My code is below. I am using puppeteer-cluster to implement a prerendering feature. I often hit a problem where a URL arrives through an ExpressJS route and cluster.queue is called, but the loadPage task is not triggered for a long time. Rendering takes about 3000 ms on average, and traffic is about 5-10 requests per minute. Even so, some URLs that have already been queued with cluster.queue stay unprocessed while the cluster is idle.

I was originally using cluster.execute to handle the requests, but after reading #481, I switched to using cluster.queue, which seems to be the correct approach. Unfortunately, the issue still persists, and I am unsure how to resolve it.

const browserCluster = await launchCluster();

// `res` is passed in with the URL because the task runs outside the route handler's scope.
const loadPage = async ({ page, data: { url, res } }) => {
  console.log(`rendered url: ${url}`);

  let response;

  try {
    // Navigation counts as finished once there are no more than 2 network connections for at least 500 ms.
    response = await page.goto(url, {
      waitUntil: "networkidle2",
    });

    if (!response) {
      throw new Error("response is null");
    }
  } catch (error) {
    console.error(`[PUPPETEER-CLUSTER] ${url} ${error}`);

    return res.sendStatus(500);
  }

  const content = await page.content();

  console.log(`[PUPPETEER-CLUSTER] Retrieve ${url}`);

  return res.status(response.status()).send(content);
};

router.get("/render", async (req, res) => {
  const url = req.query.url;

  console.log(`queue url: ${url}`);

  // Queue the URL together with the response object so the task can reply to the request.
  browserCluster.queue({ url, res }, loadPage);
});
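
To put a number on the delay between cluster.queue and the start of the task, one option is to wrap the task and log how long it waited. The following is a minimal sketch, not part of puppeteer-cluster: `queueWithTiming`, its `label` and `log` parameters, and the `[QUEUE-WAIT]` log prefix are hypothetical names introduced here. It assumes only that the cluster object exposes the `queue(data, taskFunction)` call already used in the route handler.

```javascript
// Hypothetical helper (not a puppeteer-cluster API): queues a task and logs
// the time that passed between queueing and the task actually starting.
function queueWithTiming(cluster, label, data, task, log = console.log) {
  // Timestamp taken at the moment the job is handed to the cluster.
  const queuedAt = Date.now();

  // Wrap the original task so the wait is measured when a worker picks it up.
  const timedTask = async (context) => {
    const waitedMs = Date.now() - queuedAt;
    log(`[QUEUE-WAIT] ${label} waited ${waitedMs}ms`);

    // Run the original task unchanged and pass its result through.
    return task(context);
  };

  // Same call shape as in the route handler: queue(data, taskFunction).
  return cluster.queue(data, timedTask);
}
```

In the route handler, the existing `browserCluster.queue(...)` call could be swapped for `queueWithTiming(browserCluster, url, data, loadPage)`, where `data` is whatever is already passed as the first argument to `cluster.queue`. Comparing the `queue url:` log lines with the `[QUEUE-WAIT]` lines would then show whether jobs are sitting in the queue or never starting at all. Registering a listener for the cluster's `taskerror` event may also help, since errors thrown inside queued tasks are reported there rather than to the caller.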