ulixee / secret-agent

The web scraper that's nearly impossible to block - now called @ulixee/hero
https://secretagent.dev
MIT License

DNS configuration #100

Closed xTRiM closed 3 years ago

xTRiM commented 3 years ago

Is there any way to enforce using local DNS resolver? In one of the projects, I must use a local DNS resolver (installed on the same host), and it seems that SecretAgent uses pre-configured DNS servers like Cloudflare over TLS, and I can't find a way to change that.

blakebyrnes commented 3 years ago

You should be able to modify the Browser Emulator that you're using. So if you're using Chrome83 by default, you would load up the package and set the DnsResolver to null (or a different Dns over TLS provider if you want to).

import Chrome83 from '@secret-agent/emulate-chrome-83';

Chrome83.dnsOverTlsConnectOptions = null;

// ... secret agent initialization, etc.

If you can share why you need the local resolver, I'm curious!! ;) Are you scraping internal sites??

xTRiM commented 3 years ago

@blakebyrnes thank you, you are correct: internal sites. It's something like RPA, and since this is a "serious corp", they gave us an internal secured server to install our software on, plus their internal proxy server that our scraper has to go through, and it looks like it has no idea how to resolve hosts.

After setting this:

Chrome83.dnsOverTlsConnectOptions = null;

it resolves locally only if there's no upstreamProxyUrl.

But if upstreamProxyUrl is set, it looks like it tries to resolve through the proxy ;) And even if I explicitly set a local DNS, nothing changes (it still isn't used when a proxy is set):

const DnsLocalhost = {
    host: '127.0.0.1',
    servername: 'localhost',
};
Chrome83.dnsOverTlsConnectOptions = DnsLocalhost;

Do you think there's any way to make it use a local or designated DNS (e.g. DnsLocalhost above, resolved locally) without passing that job to (or through) the proxy? I even tried the hosts file, but that doesn't work either when upstreamProxyUrl is set.

blakebyrnes commented 3 years ago

Hmm. That's kind of strange. The Dns lookup should be getting handed off to the Sockets, which are Go processes. Poking around a bit, it sounds like OSX (not sure what OS you're on) might have some different dns resolvers by default (https://golang.org/pkg/net/#hdr-Name_Resolution). You might have luck setting the environment variables they're talking about here... but unfortunately, you're out in the wild a bit with this one!
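For reference, the Go documentation linked above describes a GODEBUG setting, netdns=go or netdns=cgo, that selects the pure Go resolver or the cgo-based system resolver. Below is a minimal sketch of setting it from Node before SecretAgent starts. It assumes the Go socket process is spawned as a child process that inherits the Node environment, which has not been verified here. The helper name withGoResolver is hypothetical.

```javascript
// Hypothetical helper: returns a copy of `env` with Go's resolver mode set.
// mode 'go'  -> pure Go resolver
// mode 'cgo' -> cgo-based resolver that calls the system's C library routines
function withGoResolver(env, mode) {
  if (mode !== 'go' && mode !== 'cgo') {
    throw new Error(`unsupported netdns mode: ${mode}`);
  }
  // Keep any other GODEBUG settings, but drop an existing netdns entry.
  const others = (env.GODEBUG || '')
    .split(',')
    .filter((entry) => entry && !entry.startsWith('netdns='));
  return { ...env, GODEBUG: [...others, `netdns=${mode}`].join(',') };
}

// Assumption: child processes started later inherit process.env,
// so this must run before SecretAgent is initialized.
process.env.GODEBUG = withGoResolver(process.env, 'cgo').GODEBUG;
```

Whether this changes lookups when upstreamProxyUrl is set is untested; it only controls which resolver the Go side would pick when it resolves names itself.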

FWIW, the DnsLocalhost example won't work because the DnsOverTls provider is trying to speak dns over tls with that connection. That's not the normal state of dns, but it's what Chrome and Firefox have started using (for reasons of privacy, and maybe possibly to get ad networks to not get blocked at a router level???.. ha)
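To illustrate the mismatch: a plain resolver typically answers raw DNS messages over UDP on port 53, while DNS over TLS (RFC 7858) sends each message over a TLS connection, conventionally on port 853, prefixed with a two-byte length. The sketch below builds a minimal A-record query and shows the framing difference. The function names are made up for illustration, and this is not SecretAgent's actual implementation.

```javascript
// Build a minimal DNS query for an A record (wire format from RFC 1035).
function buildDnsQuery(hostname, id = 0) {
  const header = Buffer.alloc(12);
  header.writeUInt16BE(id, 0);     // transaction ID
  header.writeUInt16BE(0x0100, 2); // flags: recursion desired
  header.writeUInt16BE(1, 4);      // QDCOUNT: one question

  // Each label is encoded as <length byte><ascii bytes>; the name ends with 0.
  const labels = hostname
    .split('.')
    .filter(Boolean)
    .map((label) => {
      const bytes = Buffer.from(label, 'ascii');
      return Buffer.concat([Buffer.from([bytes.length]), bytes]);
    });
  const terminator = Buffer.from([0]);
  const typeAndClass = Buffer.from([0, 1, 0, 1]); // QTYPE=A, QCLASS=IN

  return Buffer.concat([header, ...labels, terminator, typeAndClass]);
}

// Over TCP or TLS, every DNS message carries a 2-byte big-endian length prefix.
function frameForStream(message) {
  const prefix = Buffer.alloc(2);
  prefix.writeUInt16BE(message.length, 0);
  return Buffer.concat([prefix, message]);
}

const udpPayload = buildDnsQuery('example.com'); // what a plain UDP resolver expects
const dotPayload = frameForStream(udpPayload);   // what is written inside the TLS session
```

A resolver on 127.0.0.1 that only speaks plain DNS has nothing listening for the TLS handshake, so the lookup fails before any query bytes are exchanged.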

xTRiM commented 3 years ago

@blakebyrnes got it, thank you, I'll figure something out with this case.

mrmiroslav commented 3 years ago

Does the DNS config Chrome88.dnsOverTlsConnectOptions = null; still work? Or am I just unable to write Node code? It just gives me an undefined error.

I am struggling to get it to work with BrightData proxies, so I'm grasping at straws here. I think there might be an issue with MITM, since it errors when creating a tunnel:

with socket:

'Error: reading CONNECT response failed (unexpected EOF)'
...
Error: reading CONNECT response failed (unexpected EOF)

or with http:

  clientError: 'Error: connection refused (502)\n' +
    '<!doctype html><h1>Webpage not available</h1><p>The webpage could not be loaded because:</p><p>Block direct route</p>',

sample code:

const {Agent} = require('secret-agent');

(async () => {
    let agent = await new Agent({
        upstreamProxyUrl: 'socks5h://**********************-country-us-route_err-block-session-1:**********@zproxy.lum-superproxy.io:22228',
        // upstreamProxyUrl: 'http://**********************-country-us-route_err-block-session-1:**********@zproxy.lum-superproxy.io:22225',
    });

    await agent.goto("https://lumtest.com/myip.json");

    console.log(await (await agent.document).body.innerText);

    await agent.close();
})();
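A side note on the proxy URL schemes used above. In curl's convention, socks5h:// asks the proxy to resolve hostnames, whereas socks5:// resolves them on the client, and an HTTP CONNECT request normally carries the hostname for the proxy to resolve. The sketch below only classifies a proxy URL by that convention. Whether SecretAgent honors the same distinction is an assumption not confirmed in this thread, and describeProxyDns is a hypothetical helper.

```javascript
// Schemes where, by curl's convention, the proxy (not the client) resolves hostnames.
const REMOTE_DNS_SCHEMES = new Set(['socks5h', 'socks4a', 'http', 'https']);

// Hypothetical helper: reports who would resolve DNS for a given proxy URL.
function describeProxyDns(upstreamProxyUrl) {
  const { protocol, hostname, port } = new URL(upstreamProxyUrl);
  const scheme = protocol.replace(/:$/, '');
  return {
    scheme,
    proxyHost: hostname,
    proxyPort: port,
    resolvedByProxy: REMOTE_DNS_SCHEMES.has(scheme),
  };
}

// Placeholder credentials and host, not a real proxy.
console.log(describeProxyDns('socks5h://user:pass@proxy.example:1080'));
console.log(describeProxyDns('socks5://user:pass@proxy.example:1080'));
```

If the proxy is the one resolving names, neither a local resolver nor the hosts file on the scraping machine would be consulted for the target hostname.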