Closed nikitha12 closed 3 years ago
Currently, for each URL I launch an instance of playwright and close it when I am done.
Why close the Playwright instance? You can just close the context, then create a new context with a new page in it, which is much faster. This is the recommended way to isolate pages from each other while maintaining good performance.
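A minimal sketch of that suggestion, assuming the `com.microsoft.playwright` Java client is on the classpath. The class name `ContextPerUrl` and the method `titles` are hypothetical names chosen for illustration; the browser is launched once by the caller and only the context is recreated per URL.

```java
import com.microsoft.playwright.Browser;
import com.microsoft.playwright.BrowserContext;
import com.microsoft.playwright.Page;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: reuses one launched Browser for every URL.
final class ContextPerUrl {
    /** Visits each URL in a fresh context and returns the page titles by URL. */
    static Map<String, String> titles(Browser browser, List<String> urls) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String url : urls) {
            // Each new context is an isolated session with its own cookies and storage.
            try (BrowserContext context = browser.newContext()) {
                Page page = context.newPage();
                page.navigate(url);
                result.put(url, page.title());
            } // closing the context also closes the pages opened in it
        }
        return result;
    }
}
```

The caller would create the instance and browser once, for example with `Playwright.create()` and `playwright.chromium().launch()` in a try-with-resources block, and pass the browser to `titles` for the whole batch.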
Can we have a Playwright cluster similar to the puppeteer cluster?
It should be fairly easy to replace Puppeteer with Playwright in the puppeteer-cluster implementation and end up with something similar. I don't know much about puppeteer-cluster, but from a brief look it seems to me that for this particular task it might be easier to control multiple pages in the same context, or multiple contexts in the same browser, concurrently from Node.js Playwright, because the Java API is synchronous.
Scraping use cases are not a priority at the moment and we have no plans to work in this direction in the Java client, so I'm closing this request.
Feature request
I want to process a large number of requests, on the order of 100,000 pages, quickly. To achieve that, I might have to process 500-600 requests in parallel.
Currently, for each URL I launch an instance of playwright and close it when I am done. But the number of Playwright instances that can be launched is limited by the resources available on a machine or pod. To scale, I have to increase either the resources or the number of pods. I am looking for a better way to parallelize request processing with Playwright.
Can we have a Playwright cluster similar to the puppeteer cluster? There would be a pool of Playwright instances. To process a request, the cluster would take a proxy and a config describing what to do after page navigation. The cluster would also handle errors and retries.
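The pool described above could be approximated in user code. The following is only a rough sketch using the Java standard library, not an existing Playwright feature: `WorkerCluster`, `submit`, and the retry count are hypothetical names and parameters. It assumes, as I understand the Playwright Java documentation, that a `Playwright` object and everything created from it must be used on the thread that created it, so each worker is a single-thread executor that creates, uses, and closes its own resource.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BiFunction;
import java.util.function.Supplier;

// Hypothetical sketch: N single-thread workers, each owning one long-lived resource.
final class WorkerCluster<R extends AutoCloseable> implements AutoCloseable {
    private final List<ExecutorService> lanes = new ArrayList<>();
    private final List<Future<R>> resources = new ArrayList<>();
    private final AtomicInteger next = new AtomicInteger();
    private final int maxAttempts;

    WorkerCluster(int workers, int maxAttempts, Supplier<R> factory) {
        if (workers < 1 || maxAttempts < 1) {
            throw new IllegalArgumentException("workers and maxAttempts must be >= 1");
        }
        this.maxAttempts = maxAttempts;
        Callable<R> create = factory::get;
        for (int i = 0; i < workers; i++) {
            ExecutorService lane = Executors.newSingleThreadExecutor();
            lanes.add(lane);
            resources.add(lane.submit(create)); // resource is created on the lane's own thread
        }
    }

    /** Queues a task on the next worker (round-robin) and retries it on RuntimeException. */
    <T> Future<T> submit(String url, BiFunction<R, String, T> task) {
        int i = Math.floorMod(next.getAndIncrement(), lanes.size());
        Future<R> resource = resources.get(i);
        Callable<T> job = () -> {
            RuntimeException last = null;
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                try {
                    return task.apply(resource.get(), url);
                } catch (RuntimeException e) {
                    last = e; // remember the failure and try again
                }
            }
            throw last; // every attempt failed
        };
        return lanes.get(i).submit(job);
    }

    /** Closes each resource on its own thread, after that worker's queued tasks finish. */
    @Override
    public void close() throws Exception {
        for (int i = 0; i < lanes.size(); i++) {
            Future<R> resource = resources.get(i);
            Callable<Void> closer = () -> { resource.get().close(); return null; };
            lanes.get(i).submit(closer).get();
            lanes.get(i).shutdown();
        }
    }
}
```

With Playwright, the factory could be `Playwright::create`, and the task would launch or reuse a browser, open a new context (where a proxy could be configured), navigate to the URL, and return whatever the caller needs. The number of workers would still be bounded by the memory and CPU of the machine or pod.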