thomasdondorf / puppeteer-cluster

Puppeteer Pool, run a cluster of instances in parallel
MIT License
3.2k stars 308 forks source link

cluster execute not running in parallel #481

Closed maxwill-max closed 2 years ago

maxwill-max commented 2 years ago
/** cluster.execute */
const { Cluster }=require('puppeteer-cluster')
const vanillaPuppeteer = require('puppeteer')
const { addExtra } = require('puppeteer-extra')
const Stealth = require('puppeteer-extra-plugin-stealth')
const AdblockerPlugin = require('puppeteer-extra-plugin-adblocker')

async function main() {
    const puppeteer = addExtra(vanillaPuppeteer)
    puppeteer.use(Stealth())
    puppeteer.use(AdblockerPlugin())  

    const cluster = await Cluster.launch({
        puppeteer,
        maxConcurrency: 7,
        concurrency: Cluster.CONCURRENCY_BROWSER,
        puppeteerOptions:{headless:true,},
        monitor:true
    })    

    cluster.on('taskerror', (err, data) => {
        console.log(`Error crawling1 ${data}: ${err.message}`)
    })

    await cluster.task(async ({ page, data: url }) => {        
        await page.goto(url)
        const ip = await page.content()
        return ip
    })

    try {
        for (var i=0; i<100; i++){
        const ip=await cluster.execute('https://httpbin.org/ip')       
        }
    } catch (err) {
        console.log(err)
    }
    await cluster.idle()
    await cluster.close()
}
main().catch(console.warn)

1

max concurrency is to 7 and right now, 5 workers are started but at any time, only one 1 worker is actively running

script moves to next worker only after the current worker is finished.

I expected the script would run 7 workers and wait for any of the workers to finish to start the next so effectively, utilizing all 7 browser contexts I expected the script would move to the next worker only if there is a available spot in concurrency

thomasdondorf commented 2 years ago

You are using cluster.execute which waits for the task to finish. You want to use cluster.queue instead.