microsoft / playwright-dotnet

.NET version of the Playwright testing and automation library.
https://playwright.dev/dotnet/
MIT License

[Bug]: Node and Chromium processes remain despite calling IPlaywright.Dispose() #1749

Closed corygehr closed 2 years ago

corygehr commented 3 years ago

Playwright version

1.14.1

Operating system

Linux

What browsers are you seeing the problem on?

Chromium

Other information

.NET 5; Ubuntu Server 18.04 (Azure Kubernetes).

What happened? / Describe the bug

Despite calling IBrowserContext.DisposeAsync() and IPlaywright.Dispose(), I see zombie processes on my Linux host - one Node process paired with two Chromium instances (see the Log Output).

My service is a .NET console application that runs indefinitely (until terminated by the Kubernetes host) and needs to tear down the IDriver after it finishes processing each request.

Code snippet to reproduce your bug

// Create Driver and Browsers.
var driver = Playwright.CreateAsync().Result;
var browserInstance = await driver.Chromium.LaunchAsync();

// Do work...

// Dispose.
browserInstance.CloseAsync().Wait();
driver.Dispose();

Relevant log output

ps aux

root     23085  0.0  0.0      0     0 ?        Z    01:35   0:00 [chrome] <defunct>
root     23086  0.1  0.0      0     0 ?        Z    01:35   0:02 [chrome] <defunct>
root     24215  0.3  0.0      0     0 ?        Z    Sep17   0:34 [node] <defunct>
root     24334  0.0  0.0      0     0 ?        Z    Sep17   0:00 [chrome] <defunct>
root     24335  0.0  0.0      0     0 ?        Z    Sep17   0:02 [chrome] <defunct>
root     25653  1.2  0.0      0     0 ?        Z    01:16   0:35 [node] <defunct>
root     25770  0.0  0.0      0     0 ?        Z    01:16   0:00 [chrome] <defunct>
root     25771  0.0  0.0      0     0 ?        Z    01:16   0:02 [chrome] <defunct>
root     26958  0.3  0.0      0     0 ?        Z    Sep17   0:35 [node] <defunct>
root     27076  0.0  0.0      0     0 ?        Z    Sep17   0:00 [chrome] <defunct>
root     27077  0.0  0.0      0     0 ?        Z    Sep17   0:02 [chrome] <defunct>
root     28314  0.9  0.0      0     0 ?        Z    00:57   0:36 [node] <defunct>
root     28431  0.0  0.0      0     0 ?        Z    00:57   0:00 [chrome] <defunct>
root     28432  0.0  0.0      0     0 ?        Z    00:57   0:02 [chrome] <defunct>
root     29734  0.3  0.0      0     0 ?        Z    Sep17   0:33 [node] <defunct>
root     29851  0.0  0.0      0     0 ?        Z    Sep17   0:00 [chrome] <defunct>
root     29852  0.0  0.0      0     0 ?        Z    Sep17   0:01 [chrome] <defunct>
root     30989  0.7  0.0      0     0 ?        Z    00:38   0:36 [node] <defunct>
root     31107  0.0  0.0      0     0 ?        Z    00:38   0:00 [chrome] <defunct>
root     31108  0.0  0.0      0     0 ?        Z    00:38   0:02 [chrome] <defunct>

pavelfeldman commented 3 years ago

Can you reproduce this outside K8S?

trebor678 commented 3 years ago

I have a similar issue where node isn't closing and it creates a new process each time it creates a new instance. I'm using Firefox though and that seems to be closing fine.

I'm running this on a standard Windows machine

corygehr commented 3 years ago

I have not tried outside of K8S, though I do occasionally see a stuck Chromium window when running on my local Windows machine (non-headless mode). Will try to test against a Linux VM without K8S.

corygehr commented 3 years ago

Still haven't played with this outside of Kubernetes, but I came across an article discussing this:

https://www.back2code.me/2020/02/zombie-processes-back-in-k8s/

The short version is: adding shareProcessNamespace: true to the spec segment of the deployment seems to fix the issue. However, it doesn't feel like a solid solution - I don't think it's obvious to folks that they need to do it until they notice the problem and start searching, and I don't know about the security implications of adding this flag.
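For readers unsure where the flag goes, here is a minimal sketch of a Deployment with it set. In a Deployment the field sits in the pod template's spec (spec.template.spec), not in the top-level spec. The names, labels, and image below are hypothetical placeholders, not taken from this issue. As I understand the Kubernetes documentation, with a shared process namespace the pod's pause container runs as PID 1 and reaps orphaned zombies, which would explain why the defunct entries go away. The same documentation notes that processes then become visible to every container in the pod, which is part of the security trade-off mentioned above.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: playwright-worker            # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: playwright-worker         # hypothetical label
  template:
    metadata:
      labels:
        app: playwright-worker
    spec:
      # All containers in the pod share one PID namespace.
      shareProcessNamespace: true
      containers:
        - name: worker               # hypothetical container name
          image: example.invalid/playwright-worker:latest   # placeholder image
```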

I also came across a Stack Overflow post with a similar issue:

https://stackoverflow.com/questions/43515360/net-core-process-start-leaving-defunct-child-process-behind

I do notice that Program.cs in Playwright.Core creates a Process but doesn't set the EnableRaisingEvents property to true. I'm not sure that's relevant here, especially since I don't think Program is used when calling Playwright.CreateAsync(), and I haven't dug deeper to see where else the .NET library creates Process objects, but it might be worth investigating. I'm not sure whether something (either in dotnet or in the container) waits for that event to be raised before cleaning up the driver.
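For context, this is roughly what opting in to the Exited event looks like with System.Diagnostics.Process. It is only a sketch of the standard .NET API: ExitWatcher and RunAndAwaitExitAsync are hypothetical names, and nothing in this thread establishes that Playwright's driver process needs this or that it affects the zombie processes.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class ExitWatcher
{
    // Hypothetical helper: starts a child process and completes when it exits.
    public static async Task<int> RunAndAwaitExitAsync(string fileName, string arguments)
    {
        var exited = new TaskCompletionSource<int>(
            TaskCreationOptions.RunContinuationsAsynchronously);

        using var process = new Process
        {
            StartInfo = new ProcessStartInfo(fileName, arguments)
            {
                UseShellExecute = false,
            },
            // Without this, the Exited event below is never raised.
            EnableRaisingEvents = true,
        };

        // Subscribe before Start so a fast-exiting child is not missed.
        process.Exited += (sender, args) => exited.TrySetResult(process.ExitCode);

        process.Start();
        return await exited.Task;
    }
}
```

A call such as `await ExitWatcher.RunAndAwaitExitAsync("dotnet", "--version")` returns the child's exit code once the event fires.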

fr4gles commented 3 years ago

We have the same problem on Windows 10 / Windows Server 2019: node.exe processes remain, detached from the parent process.

Besides that, the node.exe processes are closed automatically once the main process exits.

corygehr commented 3 years ago

I can repro this on my developer machine as well (Windows). node.exe sticks around, along with two chrome.exe processes - this is after updating my code per our offline discussion to properly implement the async methods provided by the Playwright library.

I'm also having a hard time telling whether this issue is actually resolved in my K8s cluster per my comments above. I no longer see defunct processes, but several still appear to be running (I think this might be expected, since processes are now shared across the namespace?).

Meanwhile, they seem to be getting OOMKilled after some time which I suspect is due to going over their memory allocation because of the lingering processes.

kababoom commented 3 years ago

Same issue here.

Node processes are left behind until the main program exits.

Verified on Windows 10, Ubuntu 20 and OSX 11.

I assume this is because StdIOTransport.cs starts _process with playwright.sh, which runs in bash; when we later kill _process, only the bash process is killed.

kababoom commented 3 years ago

First of all, I might be doing something wrong; if so, please let me know.

Without any K8s, looping the front-page sample with close/dispose calls added, like this:

        while (true)
        {
            using var playwright = await Playwright.CreateAsync();
            await using var browser = await playwright.Chromium.LaunchAsync(new() { Headless = false });
            var page = await browser.NewPageAsync();
            await page.GotoAsync("https://playwright.dev/dotnet");
            await page.ScreenshotAsync(new() { Path = "screenshot.png" });

            await browser.CloseAsync();
            await browser.DisposeAsync();
            playwright.Dispose();

            await Task.Delay(2500);
        }

The number of node processes keeps growing until main exits.

The problem might lie elsewhere (node + cli.js not exiting cleanly), but adding a Dispose before the Kill in StdIOTransport.cs fixes it.

        public void Close(string closeReason)
        {
            if (!IsClosed)
            {
                IsClosed = true;
                TransportClosed?.Invoke(this, new() { CloseReason = closeReason });
                _readerCancellationSource?.Cancel();
                try
                {
                    _process?.Dispose(); //This releases the other resources like node.
                    _process?.Kill();
                }
                catch
                {
                }
            }
        }

fr4gles commented 3 years ago

BTW @kababoom

_process?.Dispose(); //This releases the other resources like node.
_process?.Kill();

IMO Kill() should be called before Dispose()

kababoom commented 3 years ago

@fr4gles

It's just to illustrate that Dispose does release the child node processes. How to do it properly is up to the devs.
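For anyone following the ordering discussion, here is a minimal sketch of the kill-then-dispose order suggested above. It is not the fix that was merged into Playwright: ProcessCleanup and KillAndDispose are hypothetical names, and the five-second wait is an arbitrary assumption. It uses the Process.Kill(bool entireProcessTree) overload (available since .NET Core 3.0), which also targets descendants and so would cover the case where only a wrapping bash process gets killed.

```csharp
using System;
using System.ComponentModel;
using System.Diagnostics;

public static class ProcessCleanup
{
    // Hypothetical helper, not part of Playwright: kill first, dispose last.
    public static void KillAndDispose(Process process, int waitMilliseconds = 5000)
    {
        if (process == null)
        {
            return;
        }

        try
        {
            if (!process.HasExited)
            {
                // Terminate the process and its descendants (e.g. bash -> node).
                process.Kill(entireProcessTree: true);
                // Give the process a bounded amount of time to actually exit.
                process.WaitForExit(waitMilliseconds);
            }
        }
        catch (Exception ex) when (ex is InvalidOperationException
                                   || ex is Win32Exception
                                   || ex is AggregateException
                                   || ex is NotSupportedException)
        {
            // The process was never started, has already gone away,
            // or part of the tree could not be terminated.
        }
        finally
        {
            // Release the handle only after the kill attempt; the Process
            // object is no longer associated with the process afterwards.
            process.Dispose();
        }
    }
}
```

Calling Dispose first, as in the illustration above, leaves the later Kill with no process associated with the object, so the kill attempt fails and the empty catch block hides that.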

jasomusc commented 2 years ago

On this version it works - https://github.com/microsoft/playwright-dotnet/pull/1813#partial-pull-merging

It adds process?.Dispose(); as mentioned by @kababoom

andliang commented 2 years ago

I'm also seeing zombie node processes despite calling the Dispose() and DisposeAsync() methods.

AlexSimionCommify commented 1 month ago

Hello, the issue still occurs, but for node processes, in both versions: "Microsoft.Playwright" Version="1.36.0" /> and the latest version. The node.exe processes keep piling up until the tests are all done or I stop them.

Can you please let me know what can be done in this case?