thomasdondorf / puppeteer-cluster

Puppeteer Pool, run a cluster of instances in parallel
MIT License
3.2k stars, 307 forks

(Question) Tasks separated into modules for each website #540

Closed: 4e576rt8uh9ij9okp closed this 2 months ago

4e576rt8uh9ij9okp commented 4 months ago

Hi there, I'm new to web crawling and would like to use puppeteer-cluster with each task in its own Node.js module, so that the tasks stay separate from each other.

/index.js
/tasks/google.js
/tasks/youtube.js

index.js

const { Cluster } = require('puppeteer-cluster');
const googleTask = require('./tasks/google.js')
const youtubeTask = require('./tasks/youtube.js')

(async () => {
    // Create a cluster with 2 workers
    const cluster = await Cluster.launch({
        concurrency: Cluster.CONCURRENCY_CONTEXT,
        maxConcurrency: 2,
    });

    // Register the task functions exported by each module
    await cluster.task(googleTask);
    await cluster.task(youtubeTask);

    // Shutdown after everything is done
    await cluster.idle();
    await cluster.close();
})();
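One thing to keep in mind with the `index.js` above: puppeteer-cluster's `cluster.task()` stores a single default task function, so the second call replaces the first, and because nothing is queued, `cluster.idle()` resolves right away. A minimal sketch of one way around this, assuming a single "dispatcher" task that picks the module to run from the job data: the `tasks` registry, the `dispatch` function and the `data.task` field are hypothetical names chosen for illustration, and the two task functions are inlined here (instead of `require('./tasks/google.js')` etc.) so the example runs on its own.

```javascript
// Hypothetical task functions. puppeteer-cluster calls task functions with
// an object containing { page, data, worker }; these only use `page`.
async function googleTask({ page }) {
  await page.goto('https://google.com');
  return page.url();
}

async function youtubeTask({ page }) {
  await page.goto('https://youtube.com');
  return page.url();
}

// Registry mapping a task name (sent in the job data) to its function.
// In a real project each entry would come from its own module,
// e.g. google: require('./tasks/google.js').
const tasks = {
  google: googleTask,
  youtube: youtubeTask,
};

// The single function to register with cluster.task(). It looks up the
// task named in data.task and forwards the same argument object unchanged.
async function dispatch(args) {
  const task = tasks[args.data.task];
  if (!task) {
    throw new Error(`Unknown task: ${args.data.task}`);
  }
  return task(args);
}
```

Wired into `index.js`, this would replace the two `cluster.task()` calls with `await cluster.task(dispatch);` followed by one `cluster.queue({ task: 'google' })` or `cluster.queue({ task: 'youtube' })` call per job. The puppeteer-cluster README also documents passing a task function as the second argument to `cluster.queue(data, taskFunction)`, which avoids the registry altogether, e.g. `cluster.queue({}, googleTask)`.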

./tasks/google.js

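// puppeteer-cluster calls each task function with { page, data, worker },
// so `page` has to be taken from the argument rather than used as a global.
module.exports = async ({ page, data }) => {
    await page.goto("https://google.com");
    // do something
};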
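The `./tasks/youtube.js` module from the file tree isn't shown. A minimal sketch of it in the same shape, assuming the job data carries a search query: the `data.query` field and the YouTube search URL are illustrative assumptions, not part of the original code. It also shows a task returning a value instead of only navigating.

```javascript
// Hypothetical ./tasks/youtube.js. Takes the same { page, data } argument
// that puppeteer-cluster passes to task functions; the search query is
// read from the job data (an assumed field, `data.query`).
// In the real module this function would be assigned to module.exports.
const youtubeTask = async ({ page, data }) => {
  // Build the search URL; URLSearchParams takes care of the encoding.
  const url = new URL('https://www.youtube.com/results');
  url.searchParams.set('search_query', data.query);

  await page.goto(url.toString());
  // Return something the caller can use, here the page title.
  return page.title();
};
```

From `index.js` this could be queued as `cluster.queue({ query: 'puppeteer cluster' }, youtubeTask)`. Values returned by a task are discarded by `cluster.queue()`; `cluster.execute(data, taskFunction)` is the variant that resolves with the task's return value.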