microsoft / playwright

Playwright is a framework for Web Testing and Automation. It allows testing Chromium, Firefox and WebKit with a single API.
https://playwright.dev
Apache License 2.0

[BUG] Memory increases when same context is used #6319

Closed Romain-P closed 9 months ago

Romain-P commented 3 years ago

Describe the bug

I'm watching full-JS apps (e.g. React/Angular websites). I initialize one Playwright instance, one browser, and one page. I keep the page cached and retrieve its content every 2 seconds.

After 1-2 hours, the memory goes crazy. I tried calling reload() on the page every 30 minutes, but it doesn't free the memory. The only way to free the memory is to close the page and create a new one.

What could be the source of this memory leak? I assume reload() frees the JavaScript VM, so it must be a leak internal to the page.
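The workaround described here (close and recreate the page on a fixed cycle, rather than reload()) can be sketched as follows. The Playwright calls are stubbed out: makePage/closePage/readContent are stand-ins for browser.newPage()/page.close()/page.content(), so the recycling logic itself runs without a browser.

```javascript
// Minimal sketch: poll through one page, but close and recreate the page
// every `pollsPerCycle` reads instead of relying on reload().
async function pollWithRecycling({ makePage, closePage, readContent, cycles, pollsPerCycle }) {
  const results = [];
  for (let c = 0; c < cycles; c++) {
    const page = await makePage();           // real code: await browser.newPage()
    for (let i = 0; i < pollsPerCycle; i++) {
      results.push(await readContent(page)); // real code: await page.content()
    }
    await closePage(page);                   // real code: await page.close()
  }                                          // closing, not reload(), frees the heap
  return results;
}
```

With stub pages this creates and closes exactly one page per cycle; in real code the stubs would be replaced by the Playwright calls noted in the comments.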

maximveksler commented 1 year ago

I have a case where I make several requests per minute, and it leaks memory all over the place. I've tried closing pages, contexts, and browsers, catching all the onClose events and re-closing everything together, even closing the main Playwright instance every time. Nothing works: it still leaks memory, and at some point the GC thrashes completely with the CPU on fire.

Please provide and/or document a way to fully release all resources so that we can clear everything out every now and then. Please don't focus only on testing; there are plenty of other use cases where Playwright needs to run for a long time.

[screenshot: jprofiler_R3kyHEskfF]

Which utility was used to plot this graph? I would like to reproduce the experiment.

@rigwild :) ?

gigitalz commented 1 year ago

Which utility was used to plot this graph? I would like to reproduce the experiment.

JProfiler.

This issue is still happening.

gigitalz commented 1 year ago

Any solution? Why after a year there is still no fix? This problem hinders playwright's usability. There's nothing more annoying than wasting time integrating a third-party package to find out later that it's riddled with bugs and incompetent developers.

I'll have to ditch this tool very soon if this is not resolved, so annoying.

dgtlmoon commented 1 year ago

@gigitalz

I'll have to ditch this tool very soon if this is not resolved, so annoying.

you can always ask for your money back :dancers:

gigitalz commented 1 year ago

I resolved my issue by doing the following.

  1. Saved the browser's state to a local file (session, local storage, etc.) after creating the browser/context and performing the actions required to meet my needs: context.StorageState("state.json")
  2. Closed the browser and context and killed all node.exe processes every 30 minutes (this is where the memory leak exists for me). If you don't kill them, a separate node.exe process is created every time, and the previous process remains in memory taking up space.
  3. Created a new browser/context, loaded in the saved state, and navigated back to where I needed to be:

     context, err := browser.NewContext(playwright.BrowserNewContextOptions{
         StorageStatePath: playwright.String("state.json"),
     })

While this won't help with infinite scroll or other scenarios, it might help some of you. A good example where this works fine is creating a session with a QR code (my situation) or after a simple login.

Kind of did the same thing, except I didn't have to save any state, just killed the whole playwright process tree and relooped to create a new spin of the same stuff. Cringe AF.

gigitalz commented 1 year ago

@gigitalz

I'll have to ditch this tool very soon if this is not resolved, so annoying.

you can always ask for your money back 👯

Very funny, I can't because I can't go back in time, moron.

ldexterldesign commented 1 year ago

👋 @gigitalz

Ignore @dgtlmoon - he's very opinionated about open source since he released his "paid hosted service"

The sooner someone with interpersonal skills forks his project the better

Regards

jfp1992 commented 1 year ago

Looks like they're not going to fix this: https://github.com/microsoft/playwright/issues/17736

Edit: there have been other threads that were closed back in 2020: https://github.com/microsoft/playwright/issues/4511 https://github.com/microsoft/playwright/issues/4549

At least give us an option to clear the garbage that's collected. I tried gc.collect() in Python, but this doesn't release it, and it wouldn't clear what's built up in the node process anyway.

Gin-Quin commented 1 year ago

I also encountered this memory leak; it caused the server to crash every hour. I had no choice but to switch to Puppeteer, and not only was the memory stable, the page loading was also faster.

Switching to Puppeteer was the best workaround for me. I used the following Dockerfile:

FROM node:19.6.0-alpine

# Installs latest Chromium package.
RUN apk add --no-cache \
      chromium \
      nss \
      freetype \
      harfbuzz \
      ca-certificates \
      ttf-freefont \
      dumb-init

# Tell Puppeteer to skip installing Chrome. We'll be using the installed package.
ENV PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=true
ENV PUPPETEER_EXECUTABLE_PATH=/usr/bin/chromium-browser

# Copy the files necessary for the build
# ... 

# Install and build
RUN npm install
RUN npm run build

# Expose and run the server
EXPOSE 8080
CMD ["node", "build"]

LeMoussel commented 1 year ago

With Node.js v16.17.1 & Playwright v1.26.1, this problem no longer seems to be an issue. I ran it for about 60 minutes, 2000 pages loaded. Repro:

import { chromium } from 'playwright'

const setup = async () => {
  const browser = await chromium.launch({ headless: false })
  let page = await browser.newPage()

  let j = 0
  while (true) {
    for (let i = 0; i < 20; i++) {
      console.log(j, i)
      await page.goto('https://httpbin.org/delay/1')
    }

    j++;

    console.log('Trying to create a new page, does not fix the leak!')
    await page.close()
    page = await browser.newPage()
  }
}

setup()

[image]

CrazedCoderNate commented 1 year ago

@gigitalz

I have the same issue. In my use case I navigate to 100+ URLs every minute. I reach a gig of memory usage in 15 minutes, and running overnight leads to 15 GB+ of usage. I've tried closing the page, browser, browser context, and Playwright instance. I've even tried nulling the objects so the GC could free up resources. I have been wrestling with this issue for a few weeks now and am at the point of considering dumping it. My current workaround is to force-restart the process every time it reaches a memory limit, but this is an awful, hacky solution. Has anyone found a fix for this issue?

If one of the other language bindings does not have this leak, I would rewrite my program in that language. Currently I am using Playwright for Java.

[image]

Gin-Quin commented 1 year ago

I deactivated JavaScript loading and execution and my memory leak seems to be gone. Or it is still there, but the numbers are so low that it's fine. If JavaScript is not important to you, you might want to deactivate it.

Resources on how to do it:

(deactivating JavaScript execution) https://stackoverflow.com/questions/65958243/disable-javascript-in-playwright

(blocking JavaScript from loading) https://scrapingant.com/blog/block-requests-playwright
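The two approaches linked above can be sketched like this. The `javaScriptEnabled: false` context option and `page.route()` are documented Playwright APIs; `shouldAbort` is a hypothetical helper (pure, and the only part executed here) deciding which resource types to drop.

```javascript
// Decide which resource types to block; 'script' covers JavaScript loading.
function shouldAbort(resourceType) {
  return ['script', 'font', 'media'].includes(resourceType);
}

// Wiring it up (requires `npm i playwright`; left commented so the sketch
// runs without a browser):
// const { chromium } = require('playwright');
// const browser = await chromium.launch();
// const context = await browser.newContext({ javaScriptEnabled: false });
// const page = await context.newPage();
// await page.route('**/*', route =>
//   shouldAbort(route.request().resourceType()) ? route.abort() : route.continue());
```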

CrazedCoderNate commented 1 year ago

I think it was user error. I ended up creating a bare-bones pool where I created browsers and browser contexts in a pool and cleared them each time I navigated to a new URL. Using a profiler, this is what my resource usage looks like: [image] If you have issues with these browsers, I recommend building a bare-bones architecture with multithreading first, then porting over to the larger use case. Cheers.
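One way to read the "bare bones pool" idea above: pre-create a fixed number of slots, hand one out per navigation, and recycle the slot (close + recreate) on return so nothing accumulates. `factory` is a stub standing in for something like `await browser.newContext()`, which keeps the pool logic runnable here.

```javascript
class SlotPool {
  constructor(size, factory) {
    this.factory = factory;
    this.free = Array.from({ length: size }, () => factory());
  }
  acquire() {
    // Hand out a pooled slot, or create a fresh one if the pool ran dry.
    return this.free.length > 0 ? this.free.pop() : this.factory();
  }
  release(slot) {
    slot.close();                   // real code: await context.close()
    this.free.push(this.factory()); // return a fresh slot, never the used one
  }
}
```

Recycling on release is the key design choice: a used context is never reused, so per-context heap growth is bounded by the length of one navigation.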

LeMoussel commented 1 year ago

@CrazedCoderNate: What tool do you use to see this? Do you have some examples for creating a bare bones pool?

CrazedCoderNate commented 1 year ago

@LeMoussel

Sure thing! I used VisualVM. It is free and it works great for what I needed it for!

I uploaded the project here for this bare bones browser pool.

hyytest commented 1 year ago

Why does the browser process still exist after calling page.close()? This is very serious memory consumption; is there any solution?

tzbo commented 1 year ago

Any update?

vonkoff commented 1 year ago

I would love a solution as well. I am using const context = await browser.newContext(); and then const page = await context.newPage(); and still getting this. Trying to do infinite scroll for certain pages. Even after finishing that the memory usage just keeps increasing...

jfp1992 commented 1 year ago

I would love a solution as well. I am using const context = await browser.newContext(); and then const page = await context.newPage(); and still getting this. Trying to do infinite scroll for certain pages. Even after finishing that the memory usage just keeps increasing...

I raised an issue for this, it got closed, then a dev responded somewhere asking me to remake the ticket. I got lazy since I'd already put time into it.

Anyway, make a loop spamming about:blank and watch the memory usage of the Python / Node processes (both increase pretty much in tandem). If you do it for a bigger page, like amazon.com, it goes up much faster.

A solution would be to allow us to clear the memory out. I tried a garbage-collection call in Python but nothing changes.

If this would ruin traces or something I'd understand, but we should still be able to manually clear it, with a disclaimer on the method or something.

vonkoff commented 1 year ago

For anyone doing this in TypeScript/JavaScript, you can use my code. I have my Node app running in a Docker container. To clean up Chromium browsers that are left open, I run the code below whenever I get an error trying to open another browser, and after each time I close a browser in the loop I run over a list of URLs to scrape.

import { exec } from 'child_process';

// Function to execute the killall node command
export function killAllNode() {
  exec('killall node', (error, stdout, stderr) => {
    if (error) {
      console.error(`Error executing killall node: ${error.message}`);
      return;
    }
    if (stderr) {
      console.error(`killall node stderr: ${stderr}`);
      return;
    }
    console.log(`killall node stdout: ${stdout}`);
  });
}

You can see the processes you are running with ps -e and can choose other processes to shut down. Doing this works and keeps up the Express server where I send the requests that run my scraper.

gigitalz commented 1 year ago

still same issue

vonkoff commented 1 year ago

still same issue

Using the code I use in my project?

tzbo commented 12 months ago

I think it's poor design in Playwright. Any programmer will hate a memory problem that is out of their control.

LeMoussel commented 12 months ago

@vonkoff It's not Windows-compliant.

On a Windows machine, to kill a Node.js server when you don't have any other Node processes running, you can tell your machine to kill all processes named node.exe. That looks like this:

taskkill /im node.exe

And if the processes still persist, you can force them to terminate by adding the /f flag:

taskkill /f /im node.exe

vinismarques commented 12 months ago

Besides not being a proper solution, I would just like to make it clear that restarting everything is not feasible in some cases, like when you are working on an infinite-scroll page. If you restart, you are doomed.

I thought I would reiterate this just so that no one thinks there is a good way around this issue. The bug should be fixed.

dgtlmoon commented 12 months ago

I think it's a stupid design in playwright. Any programmer will hate the memory problem and that is not in his control.

@tzbo tough comment, why dont you make something better?

gigitalz commented 12 months ago

@tzbo tough comment, why dont you make something better?

This is the usual dismissive take people come up with when solving a ticket has zero priority for them.

tzbo commented 12 months ago

It's a tough comment, but it's a long-standing ticket: there has been no better solution to this memory problem in two years. Do you think that's good design? Maybe the original intention was to simplify usage, and of course it is simpler than Puppeteer in some scenarios. But I think it should at least expose some memory-release functions, so that I can call them and promise not to use any previous responses afterwards.

context.close()? No, it cannot release the memory; I tried it in a previous version. Besides, I don't want to close the context. So I migrated to Puppeteer. I hope Playwright gets better soon; it supports more browsers.

ldexterldesign commented 12 months ago

👋 @gigitalz,

dgtlmoon isn't well socialised

Ignoring is the only language he speaks

Regards

🥩

lanyuer commented 12 months ago

This is also a very confusing issue for us. As long as we keep opening new pages within the loop, the memory keeps increasing continuously. We have encountered several OOM errors recently.

pavelfeldman commented 12 months ago

We have all the feedback we need for this issue and it is currently pending due to prioritization. I'll disable the comments since they no longer add actionable details to the issue.

pavelfeldman commented 9 months ago

Unbounded heap growth should be mitigated by https://github.com/microsoft/playwright/commit/ffd20f43f8ee1a7a016cd9b29c372e25ec685a62. The heap will still saturate to a certain size (1K handles per object type, ~50-100Mb on average), but will stop growing afterwards.
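A rough model of the mitigation described in that commit: instead of retaining every protocol-object handle forever, keep a bounded buffer per object type (1K in the commit) and dispose the oldest handle when the cap is exceeded. This only illustrates the idea; it is not Playwright's actual internal code.

```javascript
class BoundedHandleStore {
  constructor(cap, dispose) {
    this.cap = cap;         // e.g. 1000 handles per object type
    this.dispose = dispose; // callback that releases the underlying object
    this.handles = [];
  }
  add(handle) {
    this.handles.push(handle);
    while (this.handles.length > this.cap) {
      // Evict the oldest handle: the heap saturates at `cap` instead of growing.
      this.dispose(this.handles.shift());
    }
  }
}
```

This is why the heap "saturates to a certain size but stops growing afterwards": retention is capped, not eliminated.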

dgtlmoon commented 9 months ago

Apart from making demands of people I don't know, like some previous commenters did, I would like to thank Pavel for adding that heap test to npm; that's a super cool idea. And generally to thank the maintainers for their incredible work here, it's a highly complex project!

gauravkhuraana commented 7 months ago

I face this issue in an Azure DevOps agent pipeline. Locally all tests run fine. We have 200+ tests, and 5-6 run in parallel at a time. After 8 minutes of running, the build fails with:

<--- Last few GCs --->

[12856:000002C6125D7660]   111 ms: Scavenge 7.9 (9.1) -> 7.6 (9.8) MB, 0.9 / 0.0 ms  (average mu = 1.000, current mu = 1.000) allocation failure;
[12856:000002C6125D7660]   142 ms: Scavenge 8.4 (9.8) -> 8.0 (10.1) MB, 0.8 / 0.0 ms  (average mu = 1.000, current mu = 1.000) allocation failure;
[12856:000002C6125D7660]   192 ms: Scavenge 8.7 (10.1) -> 8.2 (10.1) MB, 35.7 / 0.0 ms  (average mu = 1.000, current mu = 1.000) allocation failure;

<--- JS stacktrace --->

FATAL ERROR: NewSpace::Rebalance Allocation failed - JavaScript heap out of memory
 1: 00007FF621C4194F
 2: 00007FF621BC6026
 3: 00007FF621BC7D10
 4: 00007FF6226721F4
 5: 00007FF62265D582

ludmilanesvitiy commented 7 months ago

After updating from 1.29.1 to the latest 1.40.0, I started to receive the error: page.goto: The object has been collected to prevent unbounded heap growth. And yes, I know I'm not using page.close() to close the window after each test; that's deliberate, to speed up the tests. With the page closed after each test, the memory error disappeared, but other problems appeared which I do not want to have to solve.

So for now I decided to downgrade to the previous version. Any fix to the lib will be appreciated. Let me know if it's done in some new release. Thanks.

PS: I have about 800 scenarios, each with about 15 separate steps. On average, tests start to fail around the 300th-350th one.

mxschmitt commented 7 months ago

Unfortunately, we need a reproduction case in order to debug issues like this. A small repository would be ideal. I'm going to lock this for now so others can re-file and we can work on the missing scenarios. Thanks!