rebrowser / rebrowser-patches

Collection of patches for puppeteer and playwright to avoid automation detection and leaks. Helps to avoid Cloudflare and DataDome CAPTCHA pages. Easy to patch/unpatch, can be enabled/disabled on demand.
https://rebrowser.net

Empty HTML document when using the "enableDisable" approach #30

Closed by TimRecktenwald 1 week ago

TimRecktenwald commented 2 weeks ago

I encountered an issue where the HTML document of a page is empty for some websites when using the "enableDisable" approach. I was able to consistently reproduce the issue for some websites like YouTube with the small code snippet below. I also included a short video showing the problem. It should be noted that most websites work totally fine, but others (including YouTube and a specific registration page of the Japanese website www.cmoa.jp) exhibit this behavior.

When executing the script as-is, it outputs <html><head></head><body></body></html> to the console. When changing the environment variable in the first line to "0" (i.e., disabling the patches), it logs the entire HTML document normally. I tried both chromeLauncher and Puppeteer's built-in "launch()" method; the choice does not seem to make a difference.
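To make the symptom easy to spot when comparing modes, the empty skeleton described above can be checked for programmatically. This is a minimal sketch only: `looksLikeEmptyDocument` is a hypothetical helper name, not part of Puppeteer or rebrowser-patches, and it assumes the empty result always has exactly the shape quoted above.

```javascript
// Hypothetical helper, not part of puppeteer or rebrowser-patches.
// Returns true when the serialized document is only the bare skeleton
// <html><head></head><body></body></html>, ignoring whitespace and case.
function looksLikeEmptyDocument(html) {
    // Remove all whitespace so line breaks or indentation do not matter.
    const normalized = html.replace(/\s+/g, '').toLowerCase();
    return normalized === '<html><head></head><body></body></html>';
}
```

In the reproduction script, the string returned by `page.evaluate(() => document.documentElement.outerHTML)` could be passed to this function to flag affected pages without reading the console output by hand.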

I want to use the "enableDisable" approach since I have an existing tool to which I want to apply your patches (which I think are awesome!). Using the "alwaysIsolated" approach would involve more changes to circumvent the restrictions, so I have not tested the behavior in that mode yet.

Node version: 18.17.1
Puppeteer version: 23.3.1

Code to reproduce the issue:

process.env.REBROWSER_PATCHES_RUNTIME_FIX_MODE = "enableDisable"

const puppeteer = require('puppeteer');

(async () => {
    // Dynamic import due to chrome-launcher being ESM
    const chromeLauncher = await import('chrome-launcher');

    const { port } = await chromeLauncher.launch({ chromeFlags: ['--disable-search-engine-choice-screen'] });
    const browser = await puppeteer.connect({ browserURL: `http://localhost:${port}`});
    const page = await browser.newPage();

    await page.goto('https://youtube.com');

    // Wait for an additional 3 seconds to ensure that the page has been fully loaded
    await new Promise(r => setTimeout(r, 3000));

    // Serialize the whole document (not just the body) and print it
    const html = await page.evaluate(() => document.documentElement.outerHTML);
    console.log(html);
})();

https://github.com/user-attachments/assets/2eff088a-f74d-4318-9a3e-4a6a6b2054f1

nwebson commented 1 week ago

Thanks for such a detailed report. I will take a look once I have a bit more time.

nwebson commented 1 week ago

📣 I just released a new version of rebrowser-patches with a completely new fix for the Runtime.Enable leak. This fix doesn't lose access to the main context and also works in workers 🎉 It's a complete drop-in replacement, so you don't need to change any of your existing code. Please try your code with the new version.

cvhoang commented 1 week ago

Thanks for this. Will there be a version for Playwright as well?

nwebson commented 1 week ago

@cvhoang yes, probably next week

TimRecktenwald commented 1 week ago

@nwebson The "addBinding" approach solved the issue, thank you very much! :) Maybe you should consider adding a note to the "enableDisable" approach section in the README regarding this bug, though.
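For anyone switching modes while testing, a small guard around the environment variable can catch typos in the mode name. This is a sketch under assumptions: `setRuntimeFixMode` and `KNOWN_FIX_MODES` are hypothetical names, and the list contains only the values mentioned in this thread ("addBinding", "alwaysIsolated", "enableDisable", and "0" to disable), so it may not match what the installed version of the patches accepts.

```javascript
// Mode names taken only from this thread; the real list may differ by version.
const KNOWN_FIX_MODES = ['addBinding', 'alwaysIsolated', 'enableDisable', '0'];

// Hypothetical helper: validates the mode, then writes it to the variable
// used in the reproduction script. `env` defaults to process.env.
function setRuntimeFixMode(mode, env = process.env) {
    if (!KNOWN_FIX_MODES.includes(mode)) {
        throw new Error(`Unknown runtime fix mode: ${mode}`);
    }
    env.REBROWSER_PATCHES_RUNTIME_FIX_MODE = mode;
    return env;
}
```

Calling `setRuntimeFixMode('addBinding')` at the top of the script would take the place of the first line of the reproduction code.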