zytedata / zyte-smartproxy-headless-proxy

A complementary proxy to help use Smart Proxy Manager (SPM) with headless browsers
MIT License

Had an easier time using the old Scrapinghub Docker image for the Smart Proxy headless proxy #72

Open vonkoff opened 2 years ago

vonkoff commented 2 years ago

https://hub.docker.com/r/scrapinghub/crawlera-headless-proxy

I'm talking about the Docker image linked above. I told a Zyte rep that docker run $IMAGE_NAME -a $APIKEY, which I took from this repo's instructions, did not work for me.

I tried running the sample script that was given:

const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch({
        ignoreHTTPSErrors: true,
        headless: false,
        args: [
            '--proxy-server=localhost:3128'
        ]
    });
    // newPage() has no ignoreHTTPSErrors option; that is already set in launch() above.
    const page = await browser.newPage();

    console.log('Opening page ...');
    try {
        await page.goto('https://toscrape.com/', {timeout: 180000});
    } catch(err) {
        console.log(err);
    }

    console.log('Taking a screenshot ...');
    await page.screenshot({path: 'screenshot.png'});
    await browser.close();
})();

and got the following error in the console: Error: net::ERR_PROXY_CONNECTION_FAILED at https://toscrape.com/
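ERR_PROXY_CONNECTION_FAILED means the browser could not open a connection to the proxy address it was given, here localhost:3128. One way to narrow this down is to test whether anything is listening on that port before launching Puppeteer. The snippet below is a minimal sketch of such a check using Node's built-in net module. The helper name isProxyReachable and the 2-second timeout are illustrative assumptions, not part of this repo or of Puppeteer.

```javascript
const net = require('net');

// Hypothetical helper (not part of this repo): resolves to true if a TCP
// connection to host:port succeeds within timeoutMs, and to false on
// connection refusal, any other socket error, or timeout.
function isProxyReachable(host, port, timeoutMs = 2000) {
  return new Promise((resolve) => {
    const socket = net.connect({ host, port });

    // Close the socket and report the result; extra calls are harmless
    // because a promise only settles once.
    const finish = (reachable) => {
      socket.destroy();
      resolve(reachable);
    };

    socket.setTimeout(timeoutMs);
    socket.on('connect', () => finish(true));
    socket.on('timeout', () => finish(false));
    socket.on('error', () => finish(false));
  });
}

// Check the address used by --proxy-server in the script above.
isProxyReachable('localhost', 3128).then((ok) => {
  console.log(ok
    ? 'Something is listening on localhost:3128'
    : 'Nothing reachable on localhost:3128');
});
```

If this reports that nothing is reachable, the problem is between the host and the container (for example, the container is not running or its port is not published to the host) rather than in the Puppeteer script.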

The Zyte support rep gave me this command to run instead:

docker run --name crawlera-headless-proxy -p 3128:3128 scrapinghub/crawlera-headless-proxy -d -u proxy.crawlera.com -o 8011 -a $APIKEY --direct-access-hostpath-regexps="(.pagead2.googlesyndication.com.$|.accounts.google.com.$|.dl.google.com.$|.clients2.google.com.$|.*?\.(?:txt|css|eot|svg|gif|ico|jpe?g|js|less|mkv|min|mp4|mpe?g|png|ttf|webm|webp|woff2?)$)" -x profile=desktop -x cookies=disable -x timeout=180000

AND IT WORKED. I was able to use the proxy with a Puppeteer headless browser. If you're reading this, I hope it helps :)