Had an easier time using the old Scrapinghub Docker image for the Smart Proxy headless proxy

https://hub.docker.com/r/scrapinghub/crawlera-headless-proxy

I'm talking about the Docker image above. I contacted a Zyte rep to tell them that `docker run $IMAGE_NAME -a $APIKEY`, from the instructions I followed in this repo (zytedata/zyte-smartproxy-headless-proxy), did not work.

I tried running the sample script that was given:

const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch({
        ignoreHTTPSErrors: true,
        headless: false,
        args: [
            '--proxy-server=localhost:3128'
        ]
    });
    const page = await browser.newPage({ignoreHTTPSErrors: true});

    console.log('Opening page ...');
    try {
        await page.goto('https://toscrape.com/', {timeout: 180000});
    } catch(err) {
        console.log(err);
    }

    console.log('Taking a screenshot ...');
    await page.screenshot({path: 'screenshot.png'});
    await browser.close();
})();

and got the following error in the console: Error: net::ERR_PROXY_CONNECTION_FAILED at https://toscrape.com/

The Zyte support rep straight up gave me this to run:

docker run --name crawlera-headless-proxy -p 3128:3128 scrapinghub/crawlera-headless-proxy -d -u proxy.crawlera.com -o 8011 -a $APIKEY --direct-access-hostpath-regexps="(.pagead2.googlesyndication.com.$|.accounts.google.com.$|.dl.google.com.$|.clients2.google.com.$|.*?\.(?:txt|css|eot|svg|gif|ico|jpe?g|js|less|mkv|min|mp4|mpe?g|png|ttf|webm|webp|woff2?)$)" -x profile=desktop -x cookies=disable -x timeout=180000
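
As far as I understand it, the `--direct-access-hostpath-regexps` option lists patterns for requests that should bypass the upstream proxy and be fetched directly, which here mostly means static assets. The sketch below reuses only the file-extension part of that pattern to show which URLs would match. It assumes the pattern is tested against host plus path, and it uses JavaScript's regex engine rather than the proxy's own, so treat it as an illustration, not the proxy's actual matching code. The function name `matchesStaticAsset` is hypothetical.

```javascript
// File-extension portion of the pattern from the docker command above.
const STATIC_ASSET_PATTERN =
    /.*?\.(?:txt|css|eot|svg|gif|ico|jpe?g|js|less|mkv|min|mp4|mpe?g|png|ttf|webm|webp|woff2?)$/;

// Hypothetical helper: true if the given host+path ends in one of the
// listed static-asset extensions.
function matchesStaticAsset(hostPath) {
    return STATIC_ASSET_PATTERN.test(hostPath);
}

// Example host+path values (made up for illustration).
const samples = [
    'toscrape.com/',
    'toscrape.com/static/style.css',
    'example.com/photo.jpeg',
    'example.com/page.html',
];

for (const sample of samples) {
    console.log(sample, matchesStaticAsset(sample) ? 'static asset' : 'no match');
}
```

Under those assumptions, page requests such as `toscrape.com/` would still go through the proxy, while stylesheets, images, and fonts would not.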

AND IT WORKED. I was able to use the proxy with Puppeteer's headless browser. If you're reading this, I hope it helps :)
