zfcsoftware / puppeteer-real-browser

This package is designed to bypass puppeteer's bot-detecting captchas such as Cloudflare. It acts like a real browser and can be managed with puppeteer.
https://www.npmjs.com/package/puppeteer-real-browser
MIT License
359 stars, 45 forks

Running Puppeteer in Headless Mode for Captcha Solving #49

Closed: alpgul closed this 1 month ago

alpgul commented 1 month ago

When I enable headless mode, Puppeteer can't solve the captcha. Is there a way to run it in headless mode?

Additionally, I think the following approach is more accurate. Since we block Puppeteer's access to the Cloudflare iframe, the iframe can't be reached through Puppeteer, so it is more appropriate to modify the response and interact with the iframe from an injected script.

const {
  RequestInterceptionManager,
} = require("puppeteer-intercept-and-modify-requests");
const puppeteer = require("puppeteer-core");
const script = `<script>const targetSelector = 'input[type="checkbox"]';
const observer = new MutationObserver((mutationsList) => {
  for (const mutation of mutationsList) {
    if (mutation.type === 'childList') {
      const addedNodes = Array.from(mutation.addedNodes);
      for (const addedNode of addedNodes) {
        const node = addedNode.querySelector(targetSelector);
        if (node) {
          setTimeout(()=>{node.parentElement.click();},1000);
        }
      }
    }
  }
});

const targetElement = document.documentElement;
const observerOptions = {
  childList: true,
  subtree: true,
};
observer.observe(targetElement, observerOptions);</script>`;
function targetFilter(target) {
  if (target._getTargetInfo().type !== "iframe") {
    return true;
  }
  return false;
}
const main = async () => {
  const browser = await puppeteer.launch({
    executablePath:
      "C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe",

    targetFilter,
    headless: false,
  });
  const page = await browser.newPage();

  const client = await page.target().createCDPSession();
  const interceptManager = new RequestInterceptionManager(client);
  await interceptManager.intercept({
    urlPattern: `https://challenges.cloudflare.com/*`,
    resourceType: "Document",
    modifyResponse({ body }) {
      return {
        body: body.replaceAll("<head>", "<head>" + script),
      };
    },
  });
  console.log("Connected to browser");
  await page.goto("https://nopecha.com/demo/cloudflare", {
    waitUntil: "domcontentloaded",
  });
  try {
    await page.waitForSelector(".link_row", {
      timeout: 100000,
    });
  } catch (error) {
    console.error(error);
  }
  await page.screenshot({ path: "example.png" });
  await browser.close();
};
main();
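
A note on the observer in the injected script above: `addedNodes` can contain text and comment nodes, and those have no `querySelector`, so the callback can throw before it ever reaches the checkbox. The sketch below shows the guard in isolation. It runs in plain Node with hand-made stand-in objects (`fakeText`, `fakeDiv`, and `findInAddedNodes` are hypothetical names for illustration, not real DOM nodes or part of any library).

```javascript
// Node type constant as defined by the DOM standard.
const ELEMENT_NODE = 1;

// Collect matches for `selector` inside the added nodes, skipping anything
// that is not an element (text nodes, comments, ...).
function findInAddedNodes(addedNodes, selector) {
  const hits = [];
  for (const node of addedNodes) {
    if (node.nodeType !== ELEMENT_NODE) continue; // no querySelector here
    const hit = node.querySelector(selector);
    if (hit) hits.push(hit);
  }
  return hits;
}

// Stand-ins for what a MutationObserver might report.
const fakeText = { nodeType: 3 }; // like a text node: no querySelector
const fakeDiv = {
  nodeType: 1,
  querySelector: (sel) =>
    sel === "input[type=checkbox]" ? { tag: "input" } : null,
};

console.log(findInAddedNodes([fakeText, fakeDiv], "input[type=checkbox]"));
```

Without the `nodeType` check, the loop would call `fakeText.querySelector(...)` and throw a `TypeError` on the first text node.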

zfcsoftware commented 1 month ago

Hi, this approach works for simple Cloudflare challenges but not in a situation like this. It might be because the captcha is clicked from JavaScript and the targetFilter function alone is not enough. Since the library is not meant only for Cloudflare (e.g. Google login), we do it this way. On Windows, you can use WSL and Docker to start the browser incognito.

https://github.com/zfcsoftware/puppeteer-real-browser/assets/123484092/1f4c19b8-3566-4058-8347-5d25280ab70a

alpgul commented 1 month ago

I have found why headless mode is not working: the user agent inside the iframes does not change, because the program skips the iframes. When I changed the user agent through remote debugging with DevTools, headless mode started working. The only remaining problem is changing the user agent inside iframes without using Puppeteer.

alpgul commented 1 month ago

You can test whether headless mode is detected with this tool: https://hmaker.github.io/selenium-detector/ This discussion explains how the detection works: https://github.com/kaliiiiiiiiii/Selenium-Driverless/discussions/86

alpgul commented 1 month ago

const {
  RequestInterceptionManager,
} = require("puppeteer-intercept-and-modify-requests");
const puppeteer = require("puppeteer-core");
function targetFilter(target) {
  const session = target._session();
  if (session) {
    session.send = new Proxy(session.send, {
      apply(target, thisArg, args) {
        if ("Runtime.enable" === args[0]) {
          return Promise.resolve();
        } else {
          const result = Reflect.apply(target, thisArg, args);
          return result;
        }
      },
    });
  }
  return true;
}
const script = `<script>
Element.prototype._addEventListener = Element.prototype.addEventListener;
Element.prototype.addEventListener = function () {
    let args = [...arguments]
    let temp = args[1];
    args[1] = function () {
        let args2 = [...arguments];
        args2[0] = Object.assign({}, args2[0])
        args2[0].isTrusted = true;
        return temp(...args2);
    }
    return this._addEventListener(...args);
}
const targetSelector = 'input[type=checkbox]';
const observer = new MutationObserver((mutationsList) => {
  for (const mutation of mutationsList) {
    if (mutation.type === 'childList') {
      const addedNodes = Array.from(mutation.addedNodes);
      for (const addedNode of addedNodes) {
        if (addedNode.nodeType === addedNode.ELEMENT_NODE) {
          const node = addedNode?.querySelector(targetSelector);
          if (node) {
            setTimeout(()=>{
              node.parentElement.click();
            },1000);
          }
        }
      }
    }
  }
});

const targetElement = document.documentElement;
const observerOptions = {
  childList: true,
  subtree: true
};
observer.observe(targetElement, observerOptions);
//document.querySelector('script').remove();
</script>`;
async function main() {
  const browser = await puppeteer.launch({
    ignoreDefaultArgs: ["--enable-automation"],
    args: ["--disable-blink-features=AutomationControlled"],
    defaultViewport: null,
    executablePath:
      "C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe",
    headless: "new",
    debuggingPort: 9222,
    targetFilter,
  });
  let page = (await browser.pages())[0];
  await page.setViewport({
    width: 1920,
    height: 1080,
  });
  await page.setUserAgent(
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
  );

  const client = await page.target().createCDPSession();
  const interceptManager = new RequestInterceptionManager(client);
  await interceptManager.intercept({
    urlPattern: `https://challenges.cloudflare.com/*`,
    resourceType: "Document",
    modifyResponse({ body }) {
      return {
        body: body.replaceAll("<head>", "<head>" + script),
      };
    },
  });

  console.log("Connected to browser");
  await page.goto("https://nopecha.com/demo/cloudflare", {
    waitUntil: "domcontentloaded",
  });

  await page.waitForNavigation();
  await page.waitForNavigation();
  await page.waitForNavigation();
  await page.screenshot({ path: "example.png" });
  console.log("Closing browser");
  await browser.close();
}
main();

I wrote the above code to suppress Runtime.enable in headless mode, but this is a temporary workaround, not an optimal solution, and it will disable certain features. I added isTrusted to the script to solve captchas, and I also added checks to the code. If you try it in a different environment, test it with a debugger first, because any error in the injected code stops it from working. The previous code didn't work for this reason. Also, the captcha solve won't work during remote debugging, because DevTools opens and that can be detected as a bot.
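
To make the `targetFilter` trick above easier to follow, here is the same `Proxy` pattern in isolation. This is a minimal sketch that runs in plain Node against a hand-made object; `makeFakeSession` and `blockCommand` are hypothetical names for illustration and are not part of Puppeteer.

```javascript
// Hypothetical stand-in for a CDP session: records every method it is asked
// to send and resolves with an echo of the call.
function makeFakeSession() {
  return {
    sent: [],
    send(method, params) {
      this.sent.push(method);
      return Promise.resolve({ method, params });
    },
  };
}

// Wrap session.send in a Proxy whose `apply` trap drops one command and
// forwards everything else unchanged.
function blockCommand(session, blockedMethod) {
  session.send = new Proxy(session.send, {
    apply(target, thisArg, args) {
      if (args[0] === blockedMethod) {
        // Swallowed: the original send() is never called.
        return Promise.resolve();
      }
      return Reflect.apply(target, thisArg, args);
    },
  });
  return session;
}

async function demo() {
  const session = blockCommand(makeFakeSession(), "Runtime.enable");
  const dropped = await session.send("Runtime.enable");
  const passed = await session.send("Page.enable", {});
  console.log({ dropped, passed, sent: session.sent });
}

demo();
```

Callers of the blocked command get `undefined` back instead of a real response, which is one way to see why suppressing `Runtime.enable` can break features that depend on it.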

zfcsoftware commented 1 month ago

This is how the error occurs. It can take minutes to pass a captcha. [Screenshot, 2024-05-11 02-23-28]

zfcsoftware commented 4 weeks ago

https://stackoverflow.com/questions/68289474/selenium-headless-how-to-bypass-cloudflare-detection-using-selenium It seems that with headless: "new" or headless: true, many window properties are missing and Cloudflare detects this. I searched but couldn't find a method other than headless: false.

alpgul commented 4 weeks ago

https://github.com/hehehai/headless-try You can test the method in the linked project and add it to your own. You can work around this kind of restriction with page.evaluateOnNewDocument (Page.addScriptToEvaluateOnNewDocument in CDP).

zfcsoftware commented 4 weeks ago

This is a great project. Thank you for sharing it. Yes, evaluateOnNewDocument is a method I use often, but I didn't know which values in the window object are problematic.