microsoft / playwright

Playwright is a framework for Web Testing and Automation. It allows testing Chromium, Firefox and WebKit with a single API.
https://playwright.dev
Apache License 2.0

[BUG] Memory increases when same context is used #6319

Closed Romain-P closed 1 year ago

Romain-P commented 3 years ago

Describe the bug

I'm monitoring fully JavaScript-rendered apps (e.g. React/Angular websites). I initialize one Playwright instance, one browser and one page. I keep the page cached and retrieve its content every 2 seconds.

After one or two hours, memory usage grows out of control. I tried calling reload() on the page every 30 minutes, but it doesn't free the memory. The only way to free the memory is to close the page and create a new one.

What could be the source of this memory leak? I suppose reload() frees the JavaScript VM, so the leak must be internal to the page.

mxschmitt commented 3 years ago

Currently the objects for e.g. request/response/route get only flushed when a new context is created. For testing you typically create a new context for each test. I folded a few issues from users who also ran into this into this one, so we can keep better track of it and might find a workaround in the future.

Romain-P commented 3 years ago

I recreate the page every hour; this is the only workaround I have found that keeps memory stable. Why would recreating the context actually help? I mean, calling page.close() and then allocating a new page with browser.newPage() fixes the issue.

mxschmitt commented 3 years ago

browser.newPage() also creates a new context for you internally and closes it once you close the page, so it's basically a helper wrapper that simplifies usage.
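
A minimal sketch of the workaround this implies, assuming a long-running script can afford to lose in-memory page state: recycle the whole context, not just the page, every N navigations. `createRecyclingNavigator` and `maxNavigations` are hypothetical names, and `browser` is assumed to be a Playwright `Browser`; only `browser.newContext()`, `context.newPage()`, `context.close()` and `page.goto()` are used. Other comments in this thread report that this does not release memory in every setup, so treat it as something to try rather than a guaranteed fix.

```javascript
// Sketch: recycle the BrowserContext every `maxNavigations` navigations.
// `browser` is assumed to be a Playwright Browser (e.g. from chromium.launch()).
function createRecyclingNavigator(browser, maxNavigations = 100) {
  let context = null;
  let page = null;
  let count = 0;

  async function open() {
    context = await browser.newContext();
    page = await context.newPage();
    count = 0;
  }

  return {
    // Navigate, replacing the context first once the budget is used up.
    async goto(url) {
      if (!context || count >= maxNavigations) {
        if (context) await context.close(); // also closes the context's pages
        await open();
      }
      count += 1;
      await page.goto(url);
      return page;
    },
    // Release the current context when the job is done.
    async close() {
      if (context) await context.close();
      context = null;
      page = null;
    },
  };
}
```

With the real library, `browser` would come from `await chromium.launch()`, and each `await navigator.goto(url)` returns the page that is currently live.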

VikramTiwari commented 3 years ago

Hey @mxschmitt Are there some other similar quirks that can be used to lower memory/CPU usage? I am starting to look into performance and parallelization of our tests and things like this would be helpful. Thanks!

mxschmitt commented 3 years ago

~~The Playwright test runner gets released soon, stay tuned! It handles all that for you. Headless mostly requires less CPU/memory than headed.~~

See here: https://playwright.dev/docs/intro

LDubya commented 3 years ago

> Currently the objects for e.g. request/response/route get only flushed when a new context is created

I think being able to have the page flush responses when new ones are received would be a useful feature, and along the lines of the way a typical browser handles responses. Maybe have it as a flag we can set?

While headless browsers have their origins in browser testing, and browser testing continues to be a major use case, data science is a rapidly growing field, and scraping is a major selling point for using headless browsers.

When scraping, you may not want to create an entirely new context with each data retrieval, as the cookies may be important for storing complex states or tokens. This leads to difficulties with the current Playwright implementation. Scrapers are primarily needed when lightweight APIs aren't available or practical, and unfortunately in the modern web heavy pages of multiple megabytes of data can be included in each response.

If you're getting HTTP responses every two seconds, you can easily amass over 1 GB of additional leaked memory within an hour if all of those responses are being stored. In my case, after 43 minutes my Playwright Node process crashed at 1.9 GB of total memory size, with the error FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory. Increasing the memory limit will only delay the issue. Being able to prevent a context from caching all responses would be ideal.
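
To make the arithmetic concrete, here is a back-of-the-envelope estimate, assuming every response body is retained. The 600 kB average response size is a hypothetical figure chosen for illustration; the comment above does not state the actual size.

```javascript
// Rough estimate of retained bytes if every response is kept in memory.
// The 600 kB average response size is an assumption for illustration.
function estimateRetainedBytes(intervalSeconds, durationSeconds, avgResponseBytes) {
  const responses = Math.floor(durationSeconds / intervalSeconds);
  return responses * avgResponseBytes;
}

const oneHour = estimateRetainedBytes(2, 3600, 600_000);
console.log((oneHour / 1e9).toFixed(2) + ' GB'); // 1800 responses -> "1.08 GB"
```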

gigitalz commented 3 years ago

I have a case where I make several requests per minute, and it leaks memory all over the place. I've tried closing pages, closing contexts, closing browsers, catching all the onClose events and re-closing everything together, even closing the main Playwright instance every time. Nothing works: it still leaks memory, and at some point the GC thrashes completely, with the CPU on fire.

Please provide and/or document a way to fully release all resources so that we can clear everything out every now and then. Please don't focus only on testing; there are plenty of other use cases where Playwright needs to run for a very long time.

[image: jprofiler_R3kyHEskfF]

Xzeffort commented 3 years ago

[image] When I run a job for a long time, memory usage grows too high. [image]

total Memory 4GB

gigitalz commented 3 years ago

Is there a way to properly dispose the whole thing without terminating the process launching it?

rigwild commented 3 years ago

> I have a case where I make several requests per minute, and it leaks memory all over the place. I've tried closing pages, closing contexts, closing browsers, catching all the onClose events and re-closing everything together, even closing the main Playwright instance every time. Nothing works: it still leaks memory, and at some point the GC thrashes completely, with the CPU on fire.
>
> Please provide and/or document a way to fully release all resources so that we can clear everything out every now and then. Please don't focus only on testing; there are plenty of other use cases where Playwright needs to run for a very long time.
>
> [image: jprofiler_R3kyHEskfF]

I have the exact same issue. Use case: scraping/botting.

Here is an ultra-simple repro:

import { chromium } from 'playwright'

const setup = async () => {
  const browser = await chromium.launch({ headless: false })
  let page = await browser.newPage()

  let j = 0
  while (true) {
    for (let i = 0; i < 20; i++, j++) {
      console.log(j, i)
      await page.goto('https://v3.vuejs.org/guide/introduction.html#declarative-rendering')
    }
    console.log('Trying to create a new context, does not fix the leak!')
    await page.close()
    page = await browser.newPage()
  }
}

setup()

Both this and https://github.com/microsoft/playwright/issues/8775 are probably duplicates. It shows that there is an issue with playwright itself and not with chromium or webkit.

For readers having the same issue, as a temporary workaround you might use something like PM2 with the memory limit config. It will restart the process when the limit is reached. This is far from ideal though.
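
For reference, a sketch of such a PM2 setup. `max_memory_restart` is PM2's option for restarting a process once it exceeds a memory threshold; the app name, script path and the 1G limit are placeholders, not values from this thread.

```javascript
// ecosystem.config.js (sketch; name, script and limit are placeholders)
module.exports = {
  apps: [
    {
      name: 'scraper',
      script: './scraper.js',
      // PM2 restarts the process once its memory exceeds this threshold.
      max_memory_restart: '1G',
    },
  ],
};
```

The process would then be started with `pm2 start ecosystem.config.js`.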

mrmiroslav commented 3 years ago

Hey guys, any ETA on when this is going to be fixed?

LuizFelipeNeves commented 2 years ago

Something that can help, if it hasn't already been done, is optimizing requests by ignoring unnecessary things like images.
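
A minimal sketch of that idea using request interception. `blockResourceTypes` is a hypothetical helper and the default list of types is only an example; `page.route()`, `route.request().resourceType()`, `route.abort()` and `route.continue()` are Playwright APIs, and `page` is assumed to be a Playwright `Page`. This only reduces how much data flows through each page load.

```javascript
// Sketch: abort requests for resource types the job does not need.
// `page` is assumed to be a Playwright Page; the default list is an example.
async function blockResourceTypes(page, blocked = ['image', 'media', 'font']) {
  const blockedSet = new Set(blocked);
  await page.route('**/*', (route) => {
    const type = route.request().resourceType();
    return blockedSet.has(type) ? route.abort() : route.continue();
  });
}
```

It would be called once per page, before the first `page.goto(...)`.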

LDubya commented 2 years ago

@LuizFelipeNeves images can be very important, depending on what you are doing. Automated browsers have lots of different use cases. The project that I used playwright for needed to take screenshots of webpages, images and all. The problem that Playwright is having isn't a request issue, but a memory leak issue. Reducing the size of the requests only makes it so that it takes longer for the memory overflow to occur, it doesn't fix the underlying problem. I think we can all agree that we should be able to instruct playwright to not make an infinite cache of request response data between page loads, while caching the things that are designed to persist between page loads like cookies and local storage. Basically, a setting to behave more like a browser. So much of the issue we're facing with playwright was solved by the early browser developers dating back to Netscape's cookies in 1994; there's no need to reinvent the wheel here.

LuizFelipeNeves commented 2 years ago

@LDubya you're right. In my case I managed to alleviate the situation by closing pages and opening new ones while reusing the context, instead of reloading and watching memory increase.

liuyazhou1991 commented 2 years ago

> The Playwright test runner gets released soon, stay tuned! It handles all that for you. Headless mostly requires less CPU/memory than headed.

@mxschmitt Can you tell us when you expect to solve this problem? I'm going to make a decision. This problem is very important to me.

mxschmitt commented 2 years ago

@liuyazhou1991 It's called Playwright Test; see here: https://playwright.dev/docs/intro

simon-liebehenschel commented 2 years ago

> Currently the objects for e.g. request/response/route get only flushed when a new context is created.

I've tried calling page.close() and context.close() and opening a new context and page after every 100 page.goto(url) calls. From what I see, your statement that memory is cleaned up when a context is closed is not correct. After ~8 hours of running and about 200,000 network requests (with the context and page closed and reopened every 100 requested URLs), Playwright's memory usage is about 10 gigabytes.

In other words, while the bug is not fixed, the workaround is to close not only the browser context but also the whole browser (e.g. await browser.close()) every N requests, depending on how much free RAM you have. I did not check whether browser.close() cleans up memory; it may also be necessary to close and reopen the whole async_playwright context manager to flush memory.

UPDATE

Please note, I use playwright-python, so some terminology may be related only to it.

Sorry, guys, my conclusions above were a bit wrong. I want to add some clarity after more investigation:

My website has lots of pages, and to perform a kind of end-to-end testing I need to perform lots of actions on it. I've been closing and reopening the browser periodically within the same Playwright context manager, and after ~1 hour Playwright was using 8 GB of memory and 31 GB of swap (~39 GB of virtual memory in total).

In other words, I will need to regularly restart not only the browser but also the context manager.

I am talking about Playwright-python, so I am not sure that this issue must be exactly (or only) in this repository. The same memory leak is both on the 'latest' and 'dev' Python releases.

itepifanio commented 2 years ago

I have the same problem as @AIGeneratedUsername. I was trying to use Playwright to monitor the network activity of a page, and this memory leak shows me that Playwright is not a good fit for this use case.

Some maintainers said on the Python repo and on the Java repo that this happens by design. It might be worth adding this to the documentation, since it would save other people time by steering them away from this use case.

simon-liebehenschel commented 2 years ago

> It might be worth adding this to the documentation, since it would save other people time by steering them away from this use case.

I think it may not be too hard to make persistent logging optional (so a user may opt out of any debugging) to reduce memory and CPU overhead.

I would only like to add a note to the documentation about how important it is not to forget to close browser pages. I think this may be a common gotcha:

@itepifanio Double-check that you close unnecessary pages right away after a page is not required (e.g., do not keep open thousands of browser pages).

jvi-infare commented 2 years ago

Hi all,

I believe I had a very similar issue, which I resolved by modifying lib\server\browserContext.js and removing the context listener from the instrumentation map.

I've noticed that when a new context is created, it is inserted into the instrumentation map (lib\server\instrumentation.js) by calling this.instrumentation.addListener(contextDebugger, this); but when the context is closed, removeListener is never called (or at least that's what happens in my case), so the context stays in the map and its memory is never released.

I ran a test for ~12 hours, creating, using and then closing thousands of contexts, and they were all left hanging in this map even though they had all been closed and only the "parent" browser was left open. All the browser tabs were also closed, except for the "parent" one (here I mean Chromium instances). Node.js was consuming gigabytes of RAM, and once I removed all the listeners, the memory was freed.

I could try to provide more details if you think that this might be the case.
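
To illustrate the pattern described above (a generic sketch, not Playwright's actual source): if objects are registered in a long-lived map on creation but never unregistered on close, the map keeps them reachable and the garbage collector cannot reclaim them. The `Instrumentation` and `Context` classes below are hypothetical stand-ins.

```javascript
// Generic illustration of a listener-map leak; NOT Playwright's real code.
class Instrumentation {
  constructor() {
    this.listeners = new Map(); // listener -> context
  }
  addListener(listener, context) {
    this.listeners.set(listener, context);
  }
  removeListener(listener) {
    this.listeners.delete(listener);
  }
}

class Context {
  constructor(instrumentation, { removeOnClose }) {
    this.instrumentation = instrumentation;
    this.removeOnClose = removeOnClose;
    this.listener = {}; // stands in for a per-context debugger/listener
    instrumentation.addListener(this.listener, this);
  }
  close() {
    // Without this call the map still references the closed context.
    if (this.removeOnClose) this.instrumentation.removeListener(this.listener);
  }
}

// Open and immediately close `count` contexts; return how many stay in the map.
function openAndCloseContexts(count, removeOnClose) {
  const instrumentation = new Instrumentation();
  for (let i = 0; i < count; i++) {
    new Context(instrumentation, { removeOnClose }).close();
  }
  return instrumentation.listeners.size;
}

console.log(openAndCloseContexts(1000, false)); // 1000 closed contexts still referenced
console.log(openAndCloseContexts(1000, true)); // 0
```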

dipech commented 2 years ago

I'm also experiencing the same problem. I use Java 17 (Temurin 17.0.2), Playwright 1.21.0. I defined heap size 2 GB.

My code looks like this:

public void parseReportsFromAllPages() {
    parseReportsFromPage(1);
    parseReportsFromPage(2);
    parseReportsFromPage(3);
    parseReportsFromPage(4);
    parseReportsFromPage(5);
}

public void parseReportsFromPage(final int pageNumber) {
    try (final Playwright playwright = Playwright.create()) {
        try (final Browser browser = playwright.chromium().launch()) {
            try (final Page page = browser.newPage()) {
                final List<String> reportsUrls = parseReportsUrlsFromPage(page, pageNumber);
                final List<ReportDto> reports = parseReports(page, reportsUrls);
                reports.forEach(report -> downloadAndSaveReport(page, report));
            }
            log.info("Page instance has been closed");
        }
        log.info("Browser instance has been closed");
    }
    log.info("Playwright instance has been closed");
}

private List<String> parseReportsUrlsFromPage(Page page, int pageNumber) {
    // page.navigate(...);
    // page.waitForLoadState();
    // page.querySelectorAll().mapToStringList();
}

private List<ReportDto> parseReports(Page page, List<String> reportsUrls) {
    // for each report:
    //     page.navigate(...);
    //     query for data and map to ReportDto
}

private void downloadAndSaveReport(Page page, ReportDto report) {
    // final APIResponse response = page.request().get(report.getDownloadLink())
    // Files.write(file.toPath(), response.body());
}

@Getter
@RequiredArgsConstructor
private static class ReportDto {
    private final String downloadLink;
}

Results: I cannot parse all the reports from all the pages. It stops while processing the second page (failing with java.lang.OutOfMemoryError: Java heap space) when trying to download reports.

There are 200 reports per page, and the average report size is 10 MB.

So I can conclude that page.close() / browser.close() / playwright.close() (I use try-with-resource so it should be done automatically) don't release used memory:

Here's Heap Memory Chart over time (image link): https://www.dropbox.com/s/ibn8y5j6iq5d4so/playwright-oom.png?dl=0

roy-k commented 2 years ago

I am using pageshot to save images, but memory usage increased noticeably. Checking the code line by line, I found this: it seems the pageshot method causes a memory leak?

  1. The heap is retained even though I don't use the Buffer:

    [image] [image]
  2. When I delete this code, the Retained Size clears:

    [image] [image]

dgtlmoon commented 2 years ago

I can confirm the same, 1.22.0, python

            browser = browser_type.connect_over_cdp(self.command_executor, timeout=timeout * 1000)

            context = browser.new_context(
                user_agent=request_headers['User-Agent'] if request_headers.get('User-Agent') else 'Mozilla/5.0',
                proxy=self.proxy,
                # This is needed to enable JavaScript execution on GitHub and others
                bypass_csp=True,
                # Should never be needed
                accept_downloads=False
            )

            page = context.new_page()
            response = page.goto(url, timeout=timeout * 1000, wait_until=None)
            page.screenshot(type='jpeg', clip={'x': 1.0, 'y': 1.0, 'width': 1280, 'height': 1024})

            context.close()
            browser.close()

I saw memory usage of up to about 1.5 GB this morning. What is curious is that not all URLs/pages cause this. I'll report more info.

vinismarques commented 2 years ago

Upvoting. I'm also facing this issue with Python and the scenario is similar to OP. Node.js JavaScript Runtime consumed more than 4 GB of RAM before crashing. In my use case, I only need to open one Browser, one Context, and one Page. After that, I should be able to navigate through the website without a problem. But what I'm seeing is that the RAM usage for Node.js JavaScript Runtime keeps growing nonstop. I tried closing the Page and reopening every N requests, but it did nothing to clear the RAM.

This memory leak seriously hurts Playwright's reliability.

roniemartinez commented 2 years ago

Same here on Python. I got almost 6GB memory usage doing page.goto() on a few hundred URLs before crashing (this is just one page).

Ignore this. Although there were some "javascript" errors in the console during the crash, the memory usage is not that high now (I cannot replicate it).

z719893361 commented 2 years ago

Have you solved it?

roniemartinez commented 2 years ago

@z719893361 For me, I did a workaround. But this is something that cannot be replicated with simple code.

Solution was to close the page for each goto(): https://github.com/roniemartinez/dude/pull/174

limestackscode commented 2 years ago

When you await response.body(), Playwright caches the whole response in memory until you close the browser or page, causing Node to use more and more memory. After 24 hours Node was using 13 GB of memory! Example:

const { firefox } = require('playwright')
const path = require('path')
const fs = require('fs')

async function memoryLeak(page){
    page.on('requestfinished', request => {
        if(path.extname(request.url()) == '.ts'){
            if(request.postDataBuffer() == null){
                request.response().then(async (response) => {
                    let header = response.headers()
                    if(header['content-type'] == 'application/octet-stream'){
                        if(header.hasOwnProperty('transfer-encoding')){
                            await response.body()
                        }
                    }
                })
            }
        }
    })
}

(async () => {
    const browser = await firefox.launch({headless: true,firefoxUserPrefs: {'media.volume_scale': '0'}}) 
    const page = await browser.newPage()
    await page.goto('https://www.twitch.tv/relaxingfan')
    memoryLeak(page)
})();

Some way to free cached responses and requests without closing the browser or page would be nice.

mxschmitt commented 2 years ago

@limestackscode Browser.newPage() does create a new context for you internally. You can re-create your context, and then it will release the memory accordingly.

dgtlmoon commented 2 years ago

@mxschmitt yeah, I spun up a simple Python script (https://github.com/dgtlmoon/playwright-python-memleak-test/blob/master/README.md). I know I can see the problem in another app (using Flask), but I can't reproduce it with this script. I'm using newPage() and page.close().

If I leave my other app running, it balloons to gigabytes.

I hope my little repo can get us closer to the answer somehow.

Anyone else: please fork and improve my test :pray: Let's try to figure out where the leak is.

Maybe it's something around the app, like when combined with Flask, or... who knows.

@itepifanio @AIGeneratedUsername are you able to use that repo, or make PRs to it, to try to reproduce your case?

limestackscode commented 2 years ago

@mxschmitt you're right, I edited my comment. But it's weird that in my tests just closing the page doesn't clear the memory; I had to close the page and open a new page.

parigi-n commented 2 years ago

For more reliability outside of a testing context, couldn't we have an option on the browser or context to simply not log this kind of info? Is this a possibility? Closing the context seems to work, but it isn't obvious at all.

Also, for use cases with a lot of network requests, it takes a lot of memory for no real added value.

jfp1992 commented 2 years ago

Also having this issue, we need to run our tests linearly and I would save a lot of execution time keeping one browser session. But over time the memory leak gets too much and the CI environment locks up because it fills up the memory.

Edit: I have done some testing, and with every page load Python's memory increases by ~2-3 MB. This memory usage quickly adds up.

Edit2: The Node.js process also appears to grow by about half as much.
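
A sketch for measuring this kind of per-iteration growth from inside a Node.js process. `measureRssGrowth` is a hypothetical helper; `process.memoryUsage().rss` is the Node.js API for the resident set size. The `action` callback would be something like a `page.goto(...)` call. RSS is noisy, so only the trend over many iterations is meaningful.

```javascript
// Sketch: record RSS after each run of `action` to spot steady growth.
async function measureRssGrowth(action, iterations) {
  const samples = [process.memoryUsage().rss];
  for (let i = 0; i < iterations; i++) {
    await action(i);
    samples.push(process.memoryUsage().rss);
  }
  const totalBytes = samples[samples.length - 1] - samples[0];
  return { samples, totalBytes, bytesPerIteration: totalBytes / iterations };
}
```

Note that this only sees the process it runs in; the browser's own processes would have to be measured separately.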

arpowers commented 2 years ago

@aslushnikov what's the status of the memory leak fix?

In our case, we get memory leaking on a per context basis even if we close context AND the browser itself. No clear way of restoring it to baseline, so we are triggering a health check fail. Super duper hacky trying to run Playwright in production ...

I saw a PR with a fix though, has it been released?

[image: Screen Shot 2022-07-29 at 1 17 19 PM]

MattHorrocksNetwealth commented 2 years ago

We're finding the same thing @arpowers

[image]

We're closing the contexts, then the browser...

foreach (var context in Browser.Contexts)
{
    logger.Info("Closing Context " + context);
    await context.CloseAsync();
}

logger.Info("Closing Browser");
await Browser.CloseAsync();

MattHorrocksNetwealth commented 2 years ago

@mxschmitt

Here's a very simple C# project that exhibits the memory leak, despite all pages, contexts and the browser being closed in the TearDown at the end of each test.

https://github.com/MattHorrocksNetwealth/playwright-leak

Using Visual Studio, select all the tests, then Debug run.

Notice the memory steadily increasing.

[image]

Of course, I might be doing something incorrect here, but it seems to me that all resources should be freed.

Happy to pitch in to get this resolved.

mxschmitt commented 2 years ago

> @mxschmitt
>
> Here's a very simple C# project that exhibits the memory leak, despite all pages, contexts and the browser being closed in the TearDown at the end of each test.
>
> https://github.com/MattHorrocksNetwealth/playwright-leak
>
> Using Visual Studio, select all the tests, then Debug run.
>
> Notice the memory steadily increasing.
>
> [image]
>
> Of course, I might be doing something incorrect here, but it seems to me that all resources should be freed.
>
> Happy to pitch in to get this resolved.

You are not disposing the Playwright object, which would release the 60MB.

MattHorrocksNetwealth commented 2 years ago

Damn, I just noticed that myself :D

What a doughnut.

MattHorrocksNetwealth commented 2 years ago

I'll fix the example tomorrow and let you know how I get on. I'm sure in our main framework we do dispose of the playwright object correctly, and we still lose memory, though.

VikramTiwari commented 2 years ago

Hey @mxschmitt, thanks for looking into this issue.

A quick question on this issue. Would the memory still be cleared if the new page was closed from something that's not PW? For example, an extension that creates and closes page while PW is connected. I understand that PW will see this page object and do all the necessary setup. However, would this page object be cleared from the browser context when it's closed by extension?

Meemaw commented 2 years ago

We're facing the same issues trying to reuse browser context, and just creating a newPage for each request.

We are closing the pages, but Playwright does not seem to dispose of those references, and the memory leak is quite severe on an HTTP server.

This is very easy to reproduce, I could prepare a reproducer if needed.

LuohuaRain commented 2 years ago

> We're facing the same issues trying to reuse browser context, and just creating a newPage for each request.
>
> We are closing the pages, but Playwright does not seem to dispose of those references, and the memory leak is quite severe on an HTTP server.
>
> This is very easy to reproduce, I could prepare a reproducer if needed.

Actually, I provided reproducers last year, but have received no feedback so far.

nickdooley2016 commented 2 years ago

Any solution? Why after a year there is still no fix? This problem hinders playwright's usability. There's nothing more annoying than wasting time integrating a third-party package to find out later that it's riddled with bugs and incompetent developers.

thomasol commented 2 years ago

They have explained you must close the context and dispose of the playwright object. You can't keep the context alive for a long time. A stupid comment like that helps no one.

nickdooley2016 commented 2 years ago

> They have explained you must close the context and dispose of the playwright object. You can't keep the context alive for a long time. A stupid comment like that helps no one.

Do you close your browser every time you navigate to a new URL? I know you don't. This is not a solution to the problem.

vinismarques commented 2 years ago

The need to close the context and dispose of the playwright object might not be possible/desirable when working with RPA or even scraping infinite scroll pages. It adds a considerable amount of code to work around the bug.

rigwild commented 2 years ago

Please, can a maintainer lock this conversation? New comments just keep repeating each other. Everything has already been said. Closing the context does not fix the issue.

Repro provided at comment https://github.com/microsoft/playwright/issues/6319#issuecomment-917705023

nickdooley2016 commented 2 years ago

> Please, can a maintainer lock this conversation? New comments just keep repeating each other. Everything has already been said. Closing the context does not fix the issue.
>
> Repro provided at comment #6319 (comment)

And yet we still have no fix. I'll be digging into the source code later on to see if I can flush it or signal a flush somehow. Obviously developers won't be doing anything about this.

nickdooley2016 commented 2 years ago

I resolved my issue by doing the following.

  1. Save the browser's state (session, local storage, etc.) to a local file after creating the browser/context and performing the actions you need: context.StorageState("state.json")

  2. Close the browser and context, and kill all node.exe processes every 30 minutes (this is where the memory leak exists for me). If you don't kill them, a separate node.exe process is created every time, and the previous process remains in memory taking up space.

  3. Create a new browser/context, load the saved state, and navigate back to where you need to be:

     context, err := browser.NewContext(playwright.BrowserNewContextOptions{
         StorageStatePath: playwright.String("state.json"),
     })

While this won't help with infinite scroll or other scenarios it might help some of you. A good example where this would work fine is creating a session with a QR code (for my situation) or after a simple login.
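
The same steps sketched with the Node.js API, assuming `browser` is a Playwright `Browser` and `context` a `BrowserContext`. `recycleContextWithState` and the `state.json` path are hypothetical; `context.storageState({ path })` and the `storageState` option of `browser.newContext()` are the Node.js counterparts of the Go calls above. This sketch recreates only the context; closing the browser and killing leftover processes, as in step 2, is left out.

```javascript
// Sketch: save state, drop the old context, start a fresh one with that state.
async function recycleContextWithState(browser, context, statePath = 'state.json') {
  // 1. Persist cookies and local storage to disk.
  await context.storageState({ path: statePath });
  // 2. Close the old context together with its pages.
  await context.close();
  // 3. Start fresh, restoring the saved state.
  const fresh = await browser.newContext({ storageState: statePath });
  const page = await fresh.newPage();
  return { context: fresh, page };
}
```

The caller still has to navigate the returned page back to where the job left off, since open pages and in-memory JavaScript state are not part of the storage state.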

dgtlmoon commented 2 years ago

Tip: I incorrectly blamed Playwright for a memory leak in my app. I have a class that wraps Playwright to do a little web-page I/O, and initially it looked to me like page.evaluate() and other calls were causing memory to be used and never recycled. Strangely, however, when I tried the following (from https://github.com/weblyzard/inscriptis/issues/65 ), it also resolved the issue where page.evaluate(...), which returns a very large JSON structure, used a lot of RAM and never returned it to the system:

  self.xpath_data = page.evaluate("async () => {" + self.xpath_element_js + "}")
  import ctypes
  libc = ctypes.CDLL("libc.so.6")
  libc.malloc_trim(0)

My advice here - try to be sure that your own app is not doing something unexpected, be 100% sure that something like LXML's memory leak bug is not lurking around somewhere