microsoft / playwright-dotnet

.NET version of the Playwright testing and automation library.
https://playwright.dev/dotnet/
MIT License
2.49k stars 237 forks source link

[Feature]: Remove NodeJS runtime from package #1850

Open Liero opened 2 years ago

Liero commented 2 years ago

Feature request

Please, remove NodeJS runtime from a nuget package.

NodeJs could be included in separate nuget package out of convenience. Playwright could use environment variable to determine path to node binaries.

The problem

Including a NodeJS binaries slows down a CI/CD pipeline.

It has to be restored on each build (Azure Hosted pipeline).

Given that I have multiple projects referencing playwright, I have a gigabyte of unnecessary binaries that I have to upload as build artifacts and downloads in my release pipeline. My CI produces 300GB of build artifacts per months with only two developers pushing the code.

Now imagine, that other nuget references would include NodeJS runtime as well! It has to be removed or made optional.

pavelfeldman commented 2 years ago

We consider Node.js runtime to be an implementation detail of Playwright. I'm not sure how deduping Node.js helps addressing the CI/CD challenges. We would still need Node.js on CI to run the tests.

Including a NodeJS binaries slows down a CI/CD pipeline.

My understanding is that we are talking about tens of milliseconds, not even hundreds. Given that the typical test run is minutes, this should be within the error.

It has to be restored on each build (Azure Hosted pipeline).

Could you elaborate on what is getting restored?

Given that I have multiple projects referencing playwright, I have a gigabyte of unnecessary binaries that I have to upload as build artifacts and downloads in my release pipeline. My CI produces 300GB of build artifacts per months with only two developers pushing the code.

Do I understand it right that playwright binaries are a part of the build artifacts?

More information would help us understand and prioritize this request.

Liero commented 2 years ago

We consider Node.js runtime to be an implementation detail of Playwright.

Well, Microsoft.TypeScript.MSBuild package is also dependent on NodeJS and thanks god it does not include yet another NodeJS runtime 🙄

We would still need Node.js on CI to run the tests

There is Node.js installed on all Azure hosted build agents or github actions. Otherwise you would use a docker image with Node, or a separate step to install node as part of you pipeline. Including NodeJS as part of the package is unnecessary in wast majority of cases.

Could you elaborate on what is getting restored?

Nuget package references are being restored on each CI build. If the Microsoft.Playwright package is 200MB instead of ten, it is it slower about ten seconds. Azure Hosted agent, each pipeline run does a fresh restore, so it has to be downloaded from nuget.org each time.

Do I understand it right that playwright binaries are a part of the build artifacts?

Yes. If an asp.net core project references Playwright, then the node runtime is being published with the project -> this means that output (pipeline artifacts) of my CI pipeline includes now node binaries. In my case about 1GB of node binaries since I have multiple projects referencing Playwright. This slows down my CI yet another 60 seconds.

In order to deploy the project, I need to download my build artifacts in CD pipeline. Yet another dozen of seconds.

image

Did I mentioned, that my server where the app is running already has Node runtime, so all those gigabytes are unnecessary?

You could make the path to node.exe configurable and make the node binaries optional. Maybe you could download node using Playwright.CLI just like browsers?

pavelfeldman commented 2 years ago

It isn't hard for us to point to a compatible Node binary, but I don't think it is going to affect your experience. From what you are reporting, we are talking about 'up to the single digit seconds' per several minutes of the test run.

pavelfeldman commented 2 years ago

Overall I think your request is clear, so I'm marking it as collecting feedback to collect votes.

Liero commented 2 years ago

Until recently, there were 4x node runtimes (win32, win64, mac, linux) = cca 250MB, now there are 3 - 197MB. They are all being copied to output directory of each project, that references it directly or indirectly

I have following projects that I build in CI pipeline and some of them deploy in CD pipeline:

image

  1. App1 (uses playwright for PDF rendering)
  2. App1.E2ETests (uses playwright for automated ui tests)
  3. App2 (uses playwright for PDF rendering)
  4. App2.E2ETests

If I publish them I get 4x .playwright folder in the publish output.

Additionally I have following projects:

  1. App1.IntegrationTests - references App1, which copies additional .playwright folder to output directory
  2. App2.IntegrationTests - references App2, which copies additional .playwright folder to output directory

I don't need .playwright for running IntegrationTests, so I remove it after build. Although if I was about to test the PDF rendering, I would need it.

Additionally I have following have .playwright in the output directory as well, but they run only in CI pipeline, so I don't care:

  1. App1.UnitTests - references App1
  2. App2.UnitTests - references App2
apis3445 commented 2 years ago

For example the issue is that I publish playwright test as artifact to run the test after deploy, the node packages are 200 mb so one artifact is always 200 mb that is too much to an artifact image The nodejs can run from pipeline and the 200 mb is not needed. Too playwright install requires a .csproj that in publish artifact is not included, so maybe another approach can be consider for itnegrate with ci/cd as release without .csproj

jgilbert2017 commented 2 years ago

+1 i concur, please remove the node binaries from the package and make the node executable path configurable in code. i would like to use my system installed node. this would speed up complies / nuget restore / deployment and memory footprint during runtime since the o/s would use a shared image.

changhuixu commented 2 years ago

image

Also, how can we only include ONLY ONE Node binary for the OS we know for sure? Currently, it defaults to include binaries for 3 OS.

mxschmitt commented 2 years ago

In the next version only the selected platform gets included see here: https://github.com/microsoft/playwright-dotnet/pull/1955

We'll release it today or tomorrow.

changhuixu commented 2 years ago

Perfect. Thank you @mxschmitt

jgilbert2017 commented 2 years ago

when using <PlaywrightPlatform>none</PlaywrightPlatform> how do you specify the runtime node path? thanks.

sid-6581 commented 2 years ago

Is it just not possible to use an already installed node and not have to bundle this huge dependency with every application? Why can't "install" download the node dependency as well as the browsers?

Liero commented 2 years ago

In the next version only the selected platform gets included see here: #1955

Thanks, better than nothing, but still far from what I was hoping for :(

It's like Microsoft.EntityFrameworkCore.SqlServer package included the actual SQL Server and make it 3x for each platform!

Literally everybody who integrated Playwright into CD pipeline has the same complaint.

304NotModified commented 2 years ago

FYI This issue is nowadays the most upvoted (open) issue. (See https://github.com/microsoft/playwright-dotnet/issues?q=is%3Aissue+is%3Aopen+sort%3Areactions-%2B1-desc)

This could be relevant as its a "P3-collecting-feedback" issue.

joem-msft commented 1 year ago

For now our team is working on a workaround where we're having to:

It would be nice to see this issue being resolved as it's a big blocker for adoption.

wodyjowski commented 1 year ago

I'm voting up on this. I actually work with a solution containing over 34 projects using Playwright. We have 2.3GB of node.exe files 🙂 Node.js can be just checked during build...

damianh commented 1 year ago

l I think your request is clear, so I'm marking it as collecting feedback to collect votes.

Feedback and votes have been collecting for nearly 2 years. I reckon that's been long enough.

PhilKochan commented 11 months ago

Yeah. Two years is a long time. But it took me an HOUR to find this Feature Request! Most devs seeing this waste of time/disk/IO would not look that long! :(

filzrev commented 4 months ago

It seems Playwright 1.41 or later seems to supports following environment variables. (Although it's not documented)

It seems it can override Node.js executable path by setting PLAYWRIGHT_NODEJS_PATH.


Currently it can control bundled files by PlaywrightPlatform settings. I hope to have another option to exclude only node.js runtimes. and include Playwright package folder.

filzrev commented 4 months ago

I hope to have another option to exclude only node.js runtimes. and include Playwright package folder.

As a temporary workaround. I've confirmed that it can remove node runtimes by adding the following target to Directory.Build.props on my apps that use playwright.

  <Target Name="RemoveNodeJsRuntimes" AfterTargets="CopyPlaywrightFilesToOutput">
    <ItemGroup>
      <Content Remove="@(Content)" Condition="$([System.String]::Copy('%(Content.PlaywrightFolder)').Contains('node\'))" />
    </ItemGroup>
  </Target>

[!NOTE] When using above custom target. It need to explicitly set PLAYWRIGHT_NODEJS_PATH environment variable.

Example Code

private static void EnsurePlaywrightNodeJsPath()
{
    if (Environment.GetEnvironmentVariable("PLAYWRIGHT_NODEJS_PATH") != null)
        return;

    // Find node.js installation from PATHs.
    string exeName = RuntimeInformation.IsOSPlatform(OSPlatform.Windows) ? "node.exe" : "node";

    var pathEnv = Environment.GetEnvironmentVariable("PATH");
    if (pathEnv == null)
        throw new Exception("Failed to get `PATH` environment variable");

    var paths = pathEnv.Split(Path.PathSeparator);
    foreach (var path in paths)
    {
        string fullPath = Path.Combine(path, exeName);
        if (File.Exists(fullPath))
        {
            Environment.SetEnvironmentVariable("PLAYWRIGHT_NODEJS_PATH", fullPath);
            return;
        }
    }

    throw new Exception("Failed to find Node.js runtime");
}
304NotModified commented 4 months ago

@filzrev what's the use of string.copy in the .props file?

filzrev commented 4 months ago

@304NotModified There is no particular reason. [System.String]::Copy() is used to call .NET string's Contains method.

If there is a more simpler syntax, please let me know.

djhayman commented 1 month ago

Further to all of the above, in cases where we exclusively want to connect to a remote instance of Playwright using ConnectAsync, the dependency on Node.js is entirely unnecessary. Consider the following:

using var playwright = await Playwright.CreateAsync();
await using var browser = await playwright.Chromium.ConnectAsync(wsEndpoint);

The CreateAsync function indirectly launches a node process. This carries with it certain security considerations and might be outright disallowed in some environments.

But then the ConnectAsync function creates and uses a WebSocket to connect to the remote server, and Node.js is never even used.

A solution would be to make ConnectAsync a static function that forgoes node entirely:

await using var browser = await Playwright.ConnectAsync(wsEndpoint);

Then, Node.js itself could be removed from this library. Users of ConnectAsync get a more lightweight library to use, and users of LaunchAsync could install Node.js separately.

djhayman commented 1 month ago

On closer inspection, I see that even ConnectAsync is actually tunnelled through a local node process via LocalUtils and JsonPipe objects. What an absolute abomination of an API design. I don't see how you can possibly say "Playwright for .NET is the official language port of Playwright" when it is barely more than a thin wrapper around a node app.

kblok commented 1 month ago

@mxschmitt, besides all that rant. The idea of connecting to a remote playwright instance without using node doesn't sound that crazy.

djhayman commented 1 month ago

Ended up switching to PuppeteerSharp library acting as a Chrome DevTools Protocol client to connect to vanilla Chromium running in Docker. The PuppeteerSharp ConnectAsync function is very similar to my example above (and of course much of the remaining Puppeteer API is almost identical to Playwright, which is unsurprising given Playwright's origin):

await using var browser = await Puppeteer.ConnectAsync(new ConnectOptions
{
    BrowserURL = browserUrl,
}).ConfigureAwait(false);

Now we can keep our codebase entirely in .NET with no Node.js to be seen.

Would be great for Playwright to support this model (or at least this model for the client, I'm happy to keep the Node.js version running in Docker as the server) as I'd love to use Microsoft-supplied libraries where possible.