puppeteer / puppeteer

Node.js API for Chrome
https://pptr.dev
Apache License 2.0

Error: Protocol error (Network.getResponseBody): Request content was evicted from inspector cache from getting response body from large XML page (around 13mb) #6647

Open OwaisSiddiqui opened 3 years ago

OwaisSiddiqui commented 3 years ago

I am trying to get the XML from a large XML page (around 13 MB) using Puppeteer. When I make the request and try to read the response body, it fails with: Error: Protocol error (Network.getResponseBody): Request content was evicted from inspector cache. I also tried the solution from this issue: https://github.com/puppeteer/puppeteer/issues/1599#issuecomment-355473214 but it did not work.

Steps to reproduce

Tell us about your environment:

What steps will reproduce the problem?

const puppeteer = require('puppeteer')

async function main() {
  const browser = await puppeteer.launch({ args: ["--proxy-server='direct://'", '--proxy-bypass-list=*'], headless: false })
  const page = await browser.newPage()
  await page.setUserAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.193 Safari/537.36")
  // Try to raise the DevTools network buffer limits (values are in bytes)
  const client = await page.target().createCDPSession()
  await client.send('Network.enable', {
    maxResourceBufferSize: 1024 * 1024 * 100,
    maxTotalBufferSize: 1024 * 1024 * 200,
    maxPostDataSize: 1024 * 1024 * 200
  })
  const response = await page.goto("view-source:https://rentals.ca/sitemaps/sitemap-on-toronto.xml", { waitUntil: 'networkidle2' })
  let pageXML = ""
  if (response !== null) {
    // Reading the body is the call that issues Network.getResponseBody
    pageXML = await response.text()
    await page.close()
    await browser.close()
  }
}

main()
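The buffer limits passed to Network.enable are plain byte counts, so it can help to spell them out with a small conversion helper. The sketch below is only an illustration: `mib` and `networkBufferOptions` are hypothetical names, not part of Puppeteer, and the sizes simply mirror the ones used in the snippet above.

```javascript
// Hypothetical helper (not part of Puppeteer): convert mebibytes to bytes.
const mib = (n) => n * 1024 * 1024;

// Parameters for the CDP `Network.enable` command; all sizes are in bytes.
const networkBufferOptions = {
  maxResourceBufferSize: mib(100), // per-resource buffer limit
  maxTotalBufferSize: mib(200),    // total buffer limit
  maxPostDataSize: mib(200),       // longest POST body kept in notifications
};

console.log(networkBufferOptions);
```

The object can then be passed as the second argument of `client.send('Network.enable', networkBufferOptions)`.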

What is the expected result?

The expected result is the response XML.

What happens instead?

It throws the error: Error: Protocol error (Network.getResponseBody): Request content was evicted from inspector cache.
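While the cause is unresolved, a script can at least tell this specific failure apart from other errors. The following is a minimal sketch, assuming the error message contains the text quoted above; `isEvictedBodyError` and `readTextOrNull` are hypothetical helper names, and `response` is expected to be the object returned by `page.goto()`.

```javascript
// Hypothetical helper: true when an error looks like the eviction error
// quoted in this issue (matched on the message text only).
function isEvictedBodyError(err) {
  return Boolean(err) &&
    typeof err.message === 'string' &&
    err.message.includes('evicted from inspector cache');
}

// Hypothetical helper: read the body as text, returning null when the body
// was evicted and rethrowing every other error unchanged.
async function readTextOrNull(response) {
  try {
    return await response.text();
  } catch (err) {
    if (isEvictedBodyError(err)) return null;
    throw err;
  }
}
```

Inside an async function this could be used as `const xml = await readTextOrNull(response)`, with a `null` result signalling that the body has to be fetched some other way. It does not prevent the eviction itself.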

hxrain commented 3 years ago

https://github.com/puppeteer/puppeteer/issues/1599

My problem is solved.

OwaisSiddiqui commented 3 years ago

I have already tried that @hxrain, it did not work, still get the same error.

dnm13 commented 2 years ago

+1 having the same issue here also, from #1599 where the accepted solution (probably) is

await page._client.send('Network.enable', {
      maxResourceBufferSize: 1024 * 1024 * 100,
      maxTotalBufferSize: 1024 * 1024 * 200,
    })

I can't get it to work in TypeScript, as _client is a private property. For me, this error was thrown when I tried to download an image:

var view = await page.goto(url, { timeout: 0, waitUntil: 'networkidle2' });
fs.writeFileSync(path, await view.buffer())

It doesn't happen every time, and I can't work out what triggers it.

EDIT: I tried what looks like the same solution from Owais' snippet above, and yes, I still got the same error.

pita238 commented 2 years ago

Hi, I have the same problem with downloading images. Error: Protocol error (Network.getResponseBody): Request content was evicted from inspector cache

I noticed that the problem occurs because there is no "Content-Length" in the headers for this image; it is undefined. All the other images have a "Content-Length" and the code works for them. How can I bypass this? I can see that 548 kB was transferred, but I can't save it because the buffer is undefined.

Page 1 - buffer is undefined
Page 2 - buffer is OK

const res = await page.goto(`https://www.novinarnica.plus/reader/api/get-jpg/86079?ID=86079&page=1&size=mid`, { waitUntil: "networkidle0", timeout: 0 });
try {
  const buffer = await res.buffer();
  console.log(`- done: ${buffer.length} bytes`);
  fs.writeFileSync(fullpath, buffer);
} catch (e) {
  console.error(`- failed: ${e}`);
}
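To check the Content-Length observation on a given response, the header can be inspected before calling `buffer()`. This is a diagnostic sketch only: `declaredContentLength` is a hypothetical helper, it relies on `response.headers()` returning lower-case header names, and whether a missing header actually causes the eviction has not been confirmed in this thread.

```javascript
// Hypothetical helper: return the declared body size in bytes, or null when
// the response has no usable Content-Length header.
function declaredContentLength(response) {
  const raw = response.headers()['content-length'];
  if (raw === undefined) return null;
  const n = Number.parseInt(raw, 10);
  return Number.isNaN(n) ? null : n;
}
```

Logging `declaredContentLength(res)` next to the success or failure message makes it easier to see whether the failing pages are the ones without the header.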

stale[bot] commented 2 years ago

We're marking this issue as unconfirmed because it has not had recent activity and we weren't able to confirm it yet. It will be closed if no further activity occurs within the next 30 days.

ArtHoff commented 2 years ago

Hello, I'm receiving this error too when downloading large web pages. To get around this I used the fix above and it worked just fine.

await page._client.send('Network.enable', {
      maxResourceBufferSize: 1024 * 1024 * 100,
      maxTotalBufferSize: 1024 * 1024 * 200,
    })

However with more recent versions of Puppeteer, this is no longer working and I receive the following error:

page._client.send is not a function

Since page._client.send is no longer available I am wondering whether something has replaced it which I should be using instead?

Thank you for your help with this.

Orlandster commented 2 years ago

@ArtHoff The property _client seems not to exist anymore. Instead get it from createCDPSession.

const client = await page.target().createCDPSession();

await client.send('Network.enable', {
    maxResourceBufferSize: 1024 * 1024 * 100,
    maxTotalBufferSize: 1024 * 1024 * 200,
});

ArtHoff commented 2 years ago

@Orlandster Thank you for your suggestion. Unfortunately, with that change my original problem comes back; it appears that maxResourceBufferSize and maxTotalBufferSize are now ignored:

ProtocolError: Protocol error (Network.getResponseBody): Request content was evicted from inspector cache
    at /myProg/node_modules/puppeteer/lib/cjs/puppeteer/common/Connection.js:278:24

Is there anything else I need to do? This is with puppeteer 15.3.2

howtomakeaturn commented 1 year ago

+1

  const client = await page.target().createCDPSession();

  await client.send('Network.enable', {
    maxResourceBufferSize: 1024 * 1024 * 100,
    maxTotalBufferSize: 1024 * 1024 * 200,
  });

this seems to not work anymore

any solution/workaround for now?

thanks...

vitorlans commented 1 year ago

+1

Hi guys,

Any update on this, or a workaround for now? I updated to the latest Puppeteer version and started facing the same issue.

eskimo220 commented 1 year ago

+1

eskimo220 commented 1 year ago

I have conducted extensive research and finally found a solution. The method is very simple.

_client ---> _client()

await page._client().send("Network.enable", {
  maxResourceBufferSize: 1024 * 1024 * 50,
  maxTotalBufferSize: 1024 * 1024 * 100,
});

The only change you need to make is to add a pair of parentheses after _client. Since version 13, _client in Page has become a function. That's all it takes, and I have verified it successfully in the latest version.

Some individuals previously suggested using page.target().createCDPSession(), which led to significant confusion. Although it doesn't throw an error, it doesn't work as intended.
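For scripts that have to run against both older and newer Puppeteer versions, the two access styles mentioned in this thread can be wrapped in one place. This is a sketch under stated assumptions: `_client` is a private, undocumented member that may change or disappear again, and `getInternalClient` and `enlargeNetworkBuffers` are hypothetical helper names. The buffer sizes are the ones from the comment above.

```javascript
// Hypothetical helper: return the page's internal CDP client, handling both
// shapes described in this thread. `_client` is private API (assumption:
// a function since v13, a plain property before that).
function getInternalClient(page) {
  if (typeof page._client === 'function') return page._client();
  if (page._client && typeof page._client.send === 'function') return page._client;
  throw new Error('page._client is not available in this Puppeteer version');
}

// Hypothetical helper: raise the network buffer limits on that client.
async function enlargeNetworkBuffers(page) {
  await getInternalClient(page).send('Network.enable', {
    maxResourceBufferSize: 1024 * 1024 * 50,
    maxTotalBufferSize: 1024 * 1024 * 100,
  });
}
```

Calling `await enlargeNetworkBuffers(page)` before `page.goto()` keeps the version check out of the rest of the script.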

intrigus-lgtm commented 1 year ago

@eskimo220 you are a legend! This works :tada:

DamonAmber commented 8 months ago

it works! Amazing!