schickling / chromeless

🖥 Chrome automation made simple. Runs locally or headless on AWS Lambda.
https://chromeless.netlify.com
MIT License

Parallel execution #88

Open diit opened 7 years ago

diit commented 7 years ago

What would be a proof of concept for executing an array of chromeless "agents" in parallel?

i.e. (currently)

for (const item of parsedList) {
        const page = await chromeless
            .goto(item.href)
            .evaluate(() => {
            })
}

Current Theory

Promise.all(arrayOfAgents).then(result => ...)

However, it returns an array of undefined.
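One possible cause of the array of undefined, offered as a guess since the code that builds arrayOfAgents is not shown: a promise only resolves to what its chain returns. The callback passed to evaluate in the loop above has no return statement, and a block-bodied map callback without a return has the same effect. The sketch below reproduces the second case without Chrome; fakeAgent is a hypothetical stand-in for a Chromeless chain, not part of the library.

```javascript
// `fakeAgent` is a hypothetical stand-in for a Chromeless chain such as
// chromeless.goto(href).evaluate(...); it is not part of Chromeless.
function fakeAgent(href) {
  return new Promise(resolve => setTimeout(() => resolve(`result for ${href}`), 10))
}

const parsedList = [{ href: 'https://example.com/a' }, { href: 'https://example.com/b' }]

// Block-bodied callback with no `return`: every promise resolves to undefined.
const withoutReturn = parsedList.map(async item => {
  await fakeAgent(item.href)
})

// Returning the chain lets Promise.all collect the actual values.
const withReturn = parsedList.map(item => fakeAgent(item.href))

// Resolves to [undefined, undefined]
Promise.all(withoutReturn).then(results => console.log(results))

// Resolves to the two 'result for ...' strings, in input order
Promise.all(withReturn).then(results => console.log(results))
```

If this is the cause, returning a value from the evaluate callback and returning the chain from the map callback should be enough for Promise.all to yield real results.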

td0m commented 7 years ago

I think this is a great idea. Currently I am pulling a lot of data in parallel by running multiple terminals, each executing node scraper.js, but I agree that native parallel execution would be useful. +1

notatestuser commented 7 years ago

This seems to work if multiple instances of Chromeless are created. Am I missing something?

LeMoussel commented 7 years ago

Does it work with multiple instances of 'Chromeless' and one instance of Chrome on AWS?

td0m commented 7 years ago

I don't know about AWS, but it works on my local machine with multiple chromeless instances and one chrome headless running.

notatestuser commented 7 years ago

@LeMoussel yes.

ldenoue commented 7 years ago

So would it work if I use chromeless from a NodeJS server that handles requests from several users? I mean, does chromeless spawn several Chrome processes or use several tabs? And if tabs, are they closed automatically?

adieuadieu commented 7 years ago

Each instance of new Chromeless will create its own tab and then close it when you call chromeless.end(). When passing the remote: true option, each instance will connect to its own Proxy (e.g. one instance of chromeless, one Lambda invocation). In this case, calling chromeless.end() will effectively end the Lambda function invocation.
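Because each instance holds a tab (or a Lambda invocation) until end() is called, it can help to pair creation and end() in one place. The following is a minimal sketch of that idea, not something from the Chromeless codebase: withAgent and StubAgent are hypothetical names, and the stub exists only so the example runs without Chrome.

```javascript
// Hypothetical helper, not part of Chromeless. `createAgent` is any function
// returning an object with an async end() method, for example
// () => new Chromeless() or () => new Chromeless({ remote: true }).
async function withAgent(createAgent, task) {
  const agent = createAgent()
  try {
    return await task(agent)
  } finally {
    // Runs on success and on failure, so the tab (or Lambda invocation) is released.
    await agent.end()
  }
}

// Stub used only so this sketch runs without Chrome installed.
class StubAgent {
  constructor() { this.ended = false }
  async title(url) { return `title of ${url}` }
  async end() { this.ended = true }
}

const agents = []
const createStub = () => {
  const agent = new StubAgent()
  agents.push(agent)
  return agent
}

const urls = ['https://example.com/a', 'https://example.com/b']

// One agent per URL, all started before any of them is awaited.
const titles = Promise.all(urls.map(url => withAgent(createStub, agent => agent.title(url))))
titles.then(result => console.log(result))
```

With the real library, createStub would be swapped for a function that returns a new Chromeless instance, and the task would return a Chromeless chain.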

I wrote some more in https://github.com/graphcool/chromeless/issues/24#issuecomment-319908639 about running multiple Chromeless instances at once.

sul4bh commented 7 years ago

Is it possible to run parallel tests locally?

I have an example based on https://github.com/graphcool/chromeless/issues/24#issuecomment-319908639 which does the following:

I am on Chrome Canary 62.0.3182.0 (64-bit)

adieuadieu commented 7 years ago

@sul4bh make sure each instance has its own debugger port with --remote-debugging-port=XXXX

Typically Chrome doesn't like to run more than one instance of itself. To run multiple, separate instances of Chrome locally, you might need to run each within a Docker container. From our work-in-progress CircleCI config:

docker run \
  -d \
  --rm \
  --name chrome \
  --shm-size 1024m \
  -p 9222:9222 \
  --cap-add SYS_ADMIN \
  yukinying/chrome-headless-browser

sul4bh commented 7 years ago

@adieuadieu I am trying to make it super easy for product engineers to write integration tests. Not sure if introducing docker based headless chrome would help with that. I want to keep the whole process as simple as possible. But the CI can use docker based headless setup. Thanks for the pointer to your config file.

I was thinking we could add an option key to ChromelessOptions that makes Chrome instances start with a randomized debug port number (https://github.com/graphcool/chromeless/blob/master/src/chrome/local.ts#L40). What do you think about it? I can work on a PR if you think such an option would be useful.

ricardovsilva commented 7 years ago

@sul4bh instead of opening on a random port, what do you think about allowing the port to be passed into the constructor?

tsirolnik commented 7 years ago

@sul4bh I think that you'll need to open an instance on a different port

adieuadieu commented 7 years ago

@sul4bh Hm.. Internally, chrome-launcher will already try to spawn Chrome on a random port, so perhaps the change to Chromeless would be that, if no port is specified, we don't set a default of 9222, then the chrome-launcher getRandomPort() function would be used, thus giving you a random port. We're open to a PR for this.

@ricardovsilva while admittedly not well documented, it's possible to specify a port in the Chromeless constructor:

const chromeless = new Chromeless({ cdp: { port: 1234 } })

More options in code here.
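For running several instances side by side, the cdp.port option above can be combined with a port range. This is only a sketch: optionsForPorts is a hypothetical helper, not part of Chromeless, and the base port 9222 is simply the default mentioned in this thread. Whether each instance then launches its own Chrome on its port may depend on the Chromeless version in use.

```javascript
// Hypothetical helper, not part of Chromeless: builds one options object per
// port, counting up from basePort.
function optionsForPorts(basePort, count) {
  return Array.from({ length: count }, (_, i) => ({ cdp: { port: basePort + i } }))
}

const optionSets = optionsForPorts(9222, 3)
console.log(optionSets.map(options => options.cdp.port)) // [ 9222, 9223, 9224 ]

// With chromeless installed, each set would go to its own instance:
//   const { Chromeless } = require('chromeless')
//   const agents = optionSets.map(options => new Chromeless(options))
```

Explicit ports keep the assignment predictable, which helps when a test runner needs to know which Chrome belongs to which worker.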

ricardovsilva commented 7 years ago

@adieuadieu works like a charm. That allows me to run tests in parallel and specify a port range for each Chrome that opens.

ricardovsilva commented 7 years ago

@adieuadieu I created a PR https://github.com/graphcool/chromeless/pull/271 to update docs with port option.

ricardovsilva commented 7 years ago

@adieuadieu about opening Chrome on a random port: I think that is a bad idea because chromeless has no control over which ports are available, and it can lead to strange behaviour. In my opinion, letting the programmer state explicitly which port to use is better practice.

sul4bh commented 7 years ago

@adieuadieu Thanks for the pointer to getRandomPort() in chrome-launcher.

I am working on a PR to add the random port feature. Having that will allow us to run the tests in parallel using local Chrome instances.

skunkwerk commented 6 years ago

I can't seem to get this working when running locally.

I just get 10 Chrome icons pop up, but the pages never load:

const Chromeless = require('chromeless').default

var list = [
  'http://www.google.com', 'http://www.google.com', 'http://www.google.com',
  'http://www.google.com', 'http://www.google.com', 'http://www.google.com',
  'http://www.google.com', 'http://www.google.com', 'http://www.google.com'
]
var requests = []

async function run(url) {
  var chromeless = new Chromeless({})

  await chromeless.goto(url)
}

async function test() {
  for (let item of list) {
    requests.push(run(item))
  }
  await Promise.all(requests)
}

test().catch(console.error.bind(console))

Any ideas?

mfrye commented 6 years ago

@adieuadieu I got that working locally - thanks for sharing.

Having trouble running more than once though. I start up the container, run my process, it executes fine, then I disconnect. When trying to run an additional process against the container, it seems that it just stalls.

Looking at the container, it looks like it is still running the previous site from the first call.

Have you run into this?

eddiezane commented 6 years ago

I wound up accomplishing this with the following:

const { Chromeless } = require('chromeless')

function sleep() {
  return new Promise(resolve => {
    setTimeout(resolve, 2000)
  })
}

const sites = [
  'https://google.com',
  'https://apple.com',
  'https://reddit.com',
  'https://twitter.com',
  'https://facebook.com'
]

async function run() {
  try {
    const masterChromeless = new Chromeless()

    await sleep() // Needed to wait for Chrome to start up

    const promises = sites.map(site => {
      return new Promise((resolve, reject) => {
        const chromeless = new Chromeless({ launchChrome: false })
        chromeless
          .goto(site)
          .screenshot()
          .then(async screenshot => {
            await chromeless.end()
            resolve(screenshot)
          })
          .catch(err => reject(err))
      })
    })

    const screenshots = await Promise.all(promises)

    screenshots.forEach(screenshot => console.log(screenshot))

    await masterChromeless.end()
  } catch (err) {
    console.error(err)
  }
}

run()

This will spawn one Chrome instance and then launch a tab for each site.

Ideally sleep would be replaced with something like await masterChromeless.start().
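Until something like start() exists, one way to avoid the fixed two-second sleep is to poll the remote debugging port until it accepts TCP connections. This is a sketch under assumptions: waitForPort is a hypothetical helper, not a Chromeless API, and it assumes Chrome listens on the default port 9222 mentioned earlier in this thread. An open port is only a rough signal that Chrome is ready.

```javascript
const net = require('net')

// Hypothetical helper (not part of Chromeless): resolves once something
// accepts TCP connections on host:port, rejects once timeoutMs has elapsed.
function waitForPort(port, host = '127.0.0.1', timeoutMs = 10000, intervalMs = 100) {
  const deadline = Date.now() + timeoutMs
  return new Promise((resolve, reject) => {
    const attempt = () => {
      let done = false
      const socket = net.connect({ port, host })
      socket.once('connect', () => {
        done = true
        socket.destroy()
        resolve()
      })
      socket.once('error', () => {
        if (done) return
        done = true
        socket.destroy()
        if (Date.now() >= deadline) {
          reject(new Error(`Nothing listening on ${host}:${port} after ${timeoutMs} ms`))
        } else {
          // Connection refused or similar: try again shortly.
          setTimeout(attempt, intervalMs)
        }
      })
    }
    attempt()
  })
}

// Possible use in the example above, in place of `await sleep()`:
//   await waitForPort(9222)
```

Polling returns as soon as the port opens and fails with a clear error if Chrome never starts, where a fixed sleep either waits too long or not long enough.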

papayaah commented 6 years ago

@eddiezane this isn't exactly parallel, right? It does open multiple browser tabs, but the actions run round robin. It doesn't take all the screenshots at the same time, and I can see the browser tabs close one at a time.

I'm looking for something that does everything at exactly the same time, so I can simulate tens of simultaneous user activities on a site.

mattgolding commented 6 years ago

@mickeyren trying to do the same as you "run 1000's of tests at the same time" via lambda. I want these to be different instances of chrome (not different tabs). Wouldn't the above example work? The local code would be waiting for the lambda to respond?

I find it a little weird that there isn't an example of this when the headline feature of chromeless is this parallel running of tests.