yarnpkg / berry

📦🐈 Active development trunk for Yarn ⚒
https://yarnpkg.com
BSD 2-Clause "Simplified" License
7.45k stars 1.11k forks source link

Benchmark PnP runtime overhead #1817

Open remorses opened 4 years ago

remorses commented 4 years ago

Describe the user story

I saw that the yarn pnp.js file extends some nodejs native functionality like require, path and fs functions

To prevent performance degradation we should make benchmark and stress tests that measure the runtime overhead, for example creating a bundle with webpack of a big project with at least 50 packages

Other performance degradations could be in glob searches

I am also scared that some of these functions reimplementations have time complexity that grows with the workspace packages you have, some of us have workspaces with 200 packages

Describe the solution you'd like

Implement a benchmark to measure the runtime overhead

Describe the drawbacks of your solution

Describe alternatives you've considered

Additional context

It is still not very clear to me why yarn reimplements fs, it would be cool to read more about it

crubier commented 4 years ago

I just did on a very small scale example, and there is already a significant impact of yarn berry that I did not expect.

https://github.com/crubier/yarn-benchmarks

For example, on a simple create-react-app + storybook, build storybook takes 23s with yarn berry, VS 12s with npm.

I think we suffer from this internally, on our big monorepo with many storybooks, the storybooks take a total of 90 min to build.... reducing to 45min would already help a lot...

larixer commented 4 years ago

@crubier Have you tried to check if the impact in Yarn berry due to zip compression, i.e. compessionLevel Yarn berry option

crubier commented 4 years ago

@larixer no I didn't know about this option. Let me try it now..

crubier commented 4 years ago

BTW What is the reason for compressing the cache? Is it only for saving space on the hard drive?

larixer commented 4 years ago

@crubier Yep, to save space on the hard drive or in the Git repo

crubier commented 4 years ago

Putting compessionLevel: 0 improves performance by around 0-10%, not much.

For example storybook goes from 21s-25s to 19s-24s

merceyz commented 4 years ago

build storybook takes 23s with yarn berry, VS 12s with npm.

That's not a good comparison, what you actually want to compare is PnP vs node_modules in V2

crubier commented 4 years ago

@crubier Yep, to save space on the hard drive or in the Git repo

I think the issue raised by @remorses here is a very good idea. I think the runtime performance impact of yarn on real-life benchmark has been very overlooked, and now might be the time to start thinking about it a bit more. On my part for example, I dont care about saving a few GB on my disk if this meant getting faster performance.

An example of the overlooking of runtime performance was https://github.com/yarnpkg/berry/issues/973 , which was mostly resolved thankfully (on our monorepo there is now "only" a delay of 7s before each yarn command, instead of 5minutes...)

What surprised me in this little benchmark is that yarn berry seems to have a 150-200% performance impact on many real life scripts.

crubier commented 4 years ago

That's not a good comparison, what you actually want to compare is PnP vs node_modules in V2

And yes, I include PnP when I say Yarn v2. The point of this issue is mostly about PnP I think.

I could add yarn v1 with pnp and yarn v2 without pnp in the benchmark.

paul-soporan commented 4 years ago

Your install benchmark isn't useful the way it's currently done, since it doesn't take into account if the archives are already in the cache or if the lockfile is already generated. If you need an example of a good install benchmark, here is ours; the results are here.

I could add yarn v1 with pnp

That's wouldn't be useful, as Yarn 1's PnP was just an experiment and not production ready.

yarn v2 without pnp in the benchmark.

Yes, Yarn 2 with the node-modules linker would be useful.

crubier commented 4 years ago

@paul-soporan

Your install benchmark isn't useful the way it's currently done, since it doesn't take into account if the archives are already in the cache or if the lockfile is already generated.

I know about that of course, I ran the benchmark only after installing in each folder (And the benchmark is ran several times in a row without cleanup). So it's a "reinstall" benchmark, not a "first install" benchmark.

merceyz commented 4 years ago

npm and Yarn v1 are irrelevant imo, what we're interested in is the performance overhead of Yarn V2's PnP mode vs node_modules. npm and Yarn v1 has hoisting bugs which can give them unfair advantages (v2 as far as we know, doesn't)

I think the runtime performance impact of yarn on real-life benchmark has been very overlooked

That's a false statement, i've done optimizations on real world usecases, for example:

You can see all the performance related PRs i've done here https://github.com/yarnpkg/berry/pulls?q=is%3Apr+author%3Amerceyz+perf+is%3Aclosed

I know about that of course, I ran the benchmark only after installing in each folder. So it's a "reinstall" benchmark, not a "first install" benchmark.

The only case our benchmarks doesn't cover is runtime overhead of PnP vs node_modules, the rest we cover already https://p.datadoghq.eu/sb/d2wdprp9uki7gfks-c562c42f4dfd0ade4885690fa719c818

crubier commented 4 years ago

@merceyz "overlooked" does not mean "completely forgotten":

In the past 3 months you made at least 2 PRs that each improved by 100% the performance on certain use cases, this is very good and going in the right direction, but it also shows that there are still wide margins for improvement.

As I mentioned I also made a PR 7 months ago ( https://github.com/yarnpkg/berry/pull/998 ) which improved by 2500% the runtime performance on large monorepos. There again, there was large margin for improvement of the runtime performance.

My only point is that we should indeed start measuring this runtime performance improvement because this is now becoming the limiting factor for us internally.

crubier commented 4 years ago

@merceyz

Comparing npm with yarn berry without pnp gives indeed smilar results.

So yes, as expected, we can compare yarn pnp vs yarn node_modules, and we get:

berry
build-storybook ok

real    0m24.025s
user    0m30.199s
sys     0m2.798s

berry-no-pnp
build-storybook ok

real    0m13.496s
user    0m13.646s
sys     0m2.200s
remorses commented 4 years ago

I have just run yarn run --inspect start-storybook on a workspace with 250 packages and made a profile, it turns out traverseDependencyTree takes 72% of the time

traverseDependencyTree's time complexity increases with number of dependencies, this is not a good idea, we should find another way

Schermata 2020-09-09 alle 15 18 40

paul-soporan commented 4 years ago

@remorses Out of curiosity, where does traverseDependencyTree come from? We don't have such a function inside the PnP hook and it doesn't appear when I profile https://github.com/crubier/yarn-benchmarks/tree/master/single-berry. Is it from the fsevents patch? Is it from another dependency?

remorses commented 4 years ago

It comes from .yarn/unplugged/fsevents-patch-5aa22a6c1e/node_modules/fsevents/vfs.js

remorses commented 4 years ago

It is in plugin-compat in the source code

https://github.com/yarnpkg/berry/blob/9e271d5db0bcd89dde9edef69194eabd04e162d1/packages/plugin-compat/extra/fsevents/vfs.js#L26

larixer commented 4 years ago

@remorses Indeed this algorithm is very inefficient because of seen.delete, it can run for ages on some dependency graphs. It can be rewritten in much more efficient manner

arcanis commented 4 years ago

Regarding the initial topic, yep, we currently track install times but not run times. It would be a good idea to have a dashboard for runtimes as well 👍 Also, we should put the dashboard link somewhere on the website 🤔

crubier commented 4 years ago

I updated my benchmark repo to start performing benchmark on monorepos at runtime.

https://github.com/crubier/yarn-benchmarks

Here are some quick results:

That's a x4 performance penalty by doing the same thing, but in a monorepo (without importing dependencies from the monorepo, just by being in the monorepo), and using PnP (because it doesn't work without it anyway, probably storybooks fault again.)

crubier commented 4 years ago

This seems to match with what @remorses says, the 4x performance hit seems to match the 72% of the time spent in traverseDependencyTree .

So the good news is that this particular problem seems fixable! I think I can help, but the file in question lacks documentation so it's hard to know where to start

paul-soporan commented 4 years ago

@larixer Do you intend to open a PR improving the performance of traverseDependencyTree from the fsevents patch? (Asking you since you're the one that discovered the problem of that algorithm)

larixer commented 4 years ago

@paul-soporan To be fair I'm busy now with nohoist so I don't plan to hack traverseDependencyTree anytime soon. I think removing this line seen.delete should be enough, but someone needs to double check that of course.

larixer commented 4 years ago

@crubier

Storybook with Yarn v2 + Node Module resolution + Monorepo with 200 packages = FAIL to build 🔴

This worries me, a bit :) NM_DEBUG_LEVEL=1 yarn on that Monorepo throws or not? NM_DEBUG_LEVEL=1 performs self-checking if produced node_modules install tree is correct NM_DEBUG_LEVEL=2 performs self-checking and also outputs the reasons why packages were not hoisted to the top node_modules

crubier commented 4 years ago
➜ NM_DEBUG_LEVEL=1 yarn
➤ YN0000: ┌ Resolution step
➤ YN0002: │ multi@workspace:. doesn't provide @testing-library/dom@>=5 requested by @testing-library/user-event@npm:7.2.1
➤ YN0002: │ multi@workspace:. doesn't provide webpack@>=2 requested by babel-loader@npm:8.1.0
➤ YN0002: │ react-color@npm:2.18.1 doesn't provide react@* requested by @icons/material@npm:0.2.4
➤ YN0002: │ @mdx-js/loader@npm:1.6.16 doesn't provide react@^16.13.1 requested by @mdx-js/react@npm:1.6.16
➤ YN0002: │ @storybook/addon-essentials@npm:6.0.21 [3c62c] doesn't provide @babel/core@^7.0.0-0 requested by @storybook/addon-docs@npm:6.0.21
➤ YN0002: │ @storybook/preset-create-react-app@npm:3.1.4 [3c62c] doesn't provide typescript@>= 3.x requested by react-docgen-typescript-plugin@npm:0.5.2
➤ YN0002: │ react-docgen-typescript-loader@npm:3.7.2 [cbc10] doesn't provide webpack@^3.0.0 || ^4.0.0 requested by @webpack-contrib/schema-utils@npm:1.0.0-beta.0
➤ YN0002: │ @storybook/react@npm:6.0.21 [3c62c] doesn't provide typescript@>= 3.x requested by react-docgen-typescript-plugin@npm:0.5.2
➤ YN0002: │ foobar@workspace:packages/single-yarn doesn't provide @testing-library/dom@>=5 requested by @testing-library/user-event@npm:7.2.1
➤ YN0002: │ foobar@workspace:packages/single-yarn doesn't provide webpack@>=2 requested by babel-loader@npm:8.1.0
➤ YN0000: └ Completed in 2.16s
➤ YN0000: ┌ Fetch step
➤ YN0000: └ Completed in 1.49s
➤ YN0000: ┌ Link step
hoist: 954.502ms
➤ YN0000: └ Completed in 7.16s
➤ YN0000: Done with warnings in 11.25s

Then

➜ cd package/the-package-to-build-storybook-on
➜ yarn build-storybook                                                                                     
info @storybook/react v6.0.21
info 
info clean outputDir..
info => Copying static files from: public
info => Copying prebuild dll's..
info => Building manager..
info => Loading manager config..
info => Loading presets
info => Compiling manager..
info => manager built (5.03 s)
info => Building preview..
info => Loading preview config..
info => Loading presets
info => Loading config/preview file in "./.storybook".
info => Loading config/preview file in "./.storybook".
info => Adding stories defined in ".storybook/main.js".
ERR! Failed to resolve a `react-scripts` package.
info => Using default Webpack setup.
info => Compiling preview..
ERR! => Failed to build the preview
ERR! ./src/stories/assets/code-brackets.svg 1:0
ERR! Module parse failed: Unexpected token (1:0)
ERR! You may need an appropriate loader to handle this file type, currently no loaders are configured to process this file. See https://webpack.js.org/concepts#loaders
ERR! > <svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="48" height="48" version="1.1" viewBox="0 0 48 48"><title>illustration/code-brackets</title><g id="illustration/code-brackets" fill="none" fill-rule="evenodd" stroke="none" stroke-width="1"><path id="Combined-Shape" fill="#87E6E5" d="M11.4139325,12 C11.7605938,12 12,12.5059743 12,13.3779712 L12,17.4951758 L6.43502246,23.3839989 C5.85499251,23.9978337 5.85499251,25.0021663 6.43502246,25.6160011 L12,31.5048242 L12,35.6220288 C12,36.4939606 11.7605228,37 11.4139325,37 C11.2725831,37 11.1134406,36.9158987 10.9453839,36.7379973 L0.435022463,25.6160011 C-0.145007488,25.0021663 -0.145007488,23.9978337 0.435022463,23.3839989 L10.9453839,12.2620027 C11.1134051,12.0841663 11.2725831,12 11.4139325,12 Z M36.5860675,12 C36.7274169,12 36.8865594,12.0841013 37.0546161,12.2620027 L47.5649775,23.3839989 C48.1450075,23.9978337 48.1450075,25.0021663 47.5649775,25.6160011 L37.0546161,36.7379973 C36.8865949,36.9158337 36.7274169,37 36.5860675,37 C36.2394062,37 36,36.4940257 36,35.6220288 L36,31.5048242 L41.5649775,25.6160011 C42.1450075,25.0021663 42.1450075,23.9978337 41.5649775,23.3839989 L36,17.4951758 L36,13.3779712 C36,12.5060394 36.2394772,12 36.5860675,12 Z"/><rect id="Rectangle-7-Copy-5" width="35.57" height="4" x="5.009" y="22.662" fill="#A0DB77" rx="2" transform="translate(22.793959, 24.662305) rotate(-75.000000) translate(-22.793959, -24.662305)"/></g></svg>
ERR!  @ ./src/stories/Introduction.stories.mdx 9:0-46 204:9-13
ERR!  @ ./src sync ^\.(?:(?:^|\/|(?:(?:(?!(?:^|\/)\.).)*?)\/)(?!\.)(?=.)[^\/]*?\.stories\.mdx)$
ERR!  @ ./.storybook/generated-stories-entry.js
... And 1000s of similar errors
crubier commented 4 years ago

The output of NM_DEBUG_LEVEL=2 yarn is 6700 lines long so I will spare you this. You can check it on my repo in the folder multi-yarn-nm

merceyz commented 4 years ago

Storybook doesn't always work in monorepos, it depends on the hoisting layout you get. PR here https://github.com/storybookjs/storybook/pull/11753

crubier commented 4 years ago

@larixer indeed removing seen.delete seems to make the situation better without breaking anything as far as I know.

After removing seen.delete() the situation is:

So to recap: removing seen.delete() makes the situation better for monorepos, by removing a large part of the monorepo-induced overhead spent in traverseDependencyTree as hown by @remorses. But pnp (monorepo or not) still has a ~2x performance penality as compared to node module)

larixer commented 4 years ago

@crubier Looks like we have another bottleneck then. seen.delete affects only large dependency tree, but there is no effect on small dependency trees which is expectable

crubier commented 4 years ago

I opened a PR for the seen.delete part of the problem @larixer , I'll run tests locally, is there anything else I can do to gain confidence that it does not break things ?

larixer commented 4 years ago

@crubier I think it should be all fine, This method just registers all the locations, so visiting graph nodes exactly once should be enough. You should also run yarn version check -i on your PR and select patch for plugin-compat and for yarnpkg-cli and possibly add an entry to CHANGELOG.md

larixer commented 4 years ago

@crubier One observation that bothers me is that from the same set of source files the storybook produces output files that significantly differ in size! Compare output files for node_modules and for pnp

merceyz commented 4 years ago

I'm curious of why fsevents is involved in a build, where file watching shouldn't happen

crubier commented 4 years ago

@larixer

One observation that bothers me is that from the same set of source files the storybook produces output files that significantly differ in size! Compare output files for node_modules and for pnp

This is something I have noticed on my internal monorepo as well.

Here is the webpack bundle analyzer output:

For yarn-node-modules

Screenshot 2020-09-09 at 21 07 46

For yarn-pnp

Screenshot 2020-09-09 at 21 07 25
crubier commented 4 years ago

The problem seems to be that yarn/cache ends up in the bundle, while only yarn/$$virtual should actually end up there.

@merceyz since you seem to know a bit about storybook maybe you have an idea? Again this is default create-react-app + default storybook

larixer commented 4 years ago

@crubier I see a problem with Storybook/Webpack not been able to delegate to sb_dll/*_dl.js files under PnP and thus recompiling all the stuff it needs from node_modules again and again. To check this you can create .storybook/yarn-preset.js:

async function yarn2Config(config, options) {
  const newConfig = {
    ...(config || {})
  };
  newConfig.optimization = {minimize: false, splitChunks: false, runtimeChunk: false};
  newConfig.mode = 'development';
  return newConfig;
}

module.exports = { managerWebpack: yarn2Config, webpack: yarn2Config, webpackFinal: yarn2Config };

basically turn off all optimizations, switch to development mode and generate single bundle. You need to modify .storybook/main.js and add "presets": [require.resolve("./yarn-preset")] Run with this preset both NM and Yarn PnP, then diff produced bundles. You will see in the diff something like this:

-/***/ "./node_modules/@emotion/core/dist/core.browser.esm.js":
-/*!************************************************************************************************!*\
-  !*** delegated ./@emotion/core/dist/core.browser.esm.js from dll-reference storybook_docs_dll ***!
-  \************************************************************************************************/
+/***/ "./.yarn/$$virtual/@emotion-core-virtual-283d648f44/0/cache/@emotion-core-npm-10.0.35-565218757c-5
+/*!*****************************************************************************************************
+  !*** ./.yarn/$$virtual/@emotion-core-virtual-283d648f44/0/cache/@emotion-core-npm-10.0.35-565218757c-5
+  \*****************************************************************************************************

e.g. NM bundle delegates many functions to DLLs, but PNP bundle compiles all of them into bundle

larixer commented 4 years ago

@larixer Oh, the storybook is distributing prebuilt DLLs:

info => Copying prebuild dll's..

inside DLLs you can see:

 !*** /Users/shilman/projects/baseline/storybook/node_modules/@storybook/semver/internal/debug.js ***!

This is a suboptimal decision, because this way the DLLs are dependent on hoisting layout, so they will be reused to a less degree by the package manager that is different from the one used to build the DLLs, even if package manager is the same, but only the version is different the hoisting layout will be different. And of course node_modules prebuilt DLLs won't be used by PnP at all.

They should at least have an option to build dll on a user side. But the functionality like this will not be added anytime soon I think.

You can try disabling DLL usage via --no-dll and configure Webpack to use some compilation caching plugin like hard-source-webpack-plugin instead. I think this way the effect will be more or less the same as with DLLs, but more reliable

crubier commented 4 years ago

@larixer I just tried, and using --no-dll succeeds in making both yarn-nm and yarn-pnp similar in terms of bundle size, but probably not as you (or at least I) expected.

It does not make the yarn-pnp bundle smaller (it does not change anything for pnp basically), but by it make the yarn-nm bundle larger... So it's not really a solution..

arcanis commented 4 years ago

It's an original caching, I wonder if it could be made compatible with PnP, perhaps some with extra Webpack config 🤔 Probably worth opening an issue on the Storybook repository rather than here, since it's not actionable on our side (let's keep this issue here about building a runtime perf dashboard).

cc @gaetanmaisse

larixer commented 4 years ago

@crubier --no-dll places both NM and PnP into the same conditions, after that their perfs are the same for me. The "solution" would be to use build caching that is not hardcoded to node_modules, for example hard-source-webpack-plugin or Webpack 5 or ... some other caching solution that works both for PnP and for NM

crubier commented 4 years ago

@arcanis to go back to the idea of a runtime benchmark, here are a few requirements for a good benchmark in my opinion, which I started to implement on my repo https://github.com/crubier/yarn-benchmarks :

  1. Testing several package configurations
    1. Tests done on a single package
    2. Tests done on single packages inside of a monorepo (large monorepo with 200 packages is good)
    3. Tests done on all packages of a monorepo (lerna style stuff)
  2. Testing several use cases
    1. Run yarn on a simple command (like yarn echo)
    2. Run yarn on a complex command involving only one package of a monorepo (like yarn build)
    3. Run yarn on a complex command involving many packages of a monorepo (like yarn jest at the root, with many jest "projects")
    4. Run yarn on one complex command per package of a monorepo (like yarn lerna run build )

The tests above should allow to benchmark different kinds of overheads that we have seen:

missing1984 commented 4 years ago

Besides the pnp resolution, I would also love to see a benchmark for the patched fs layer