[Docs] Explain PnP a bit more

borekb commented 4 years ago

I'm reading the Plug'n'Play docs again after a while to learn (once more 🙈) what PnP really is. The page has these sections:

1. The node_modules problem
2. Fixing node_modules
3. Caveats and work-in-progress

Nos. 1 & 2 read nicely and are a great introduction to why PnP exists and how things are much better with it. No. 3 is also a good overview of the current compatibility.

However, I couldn't find answers to a couple of questions I had about PnP:

- How does it actually work? Where are the files stored?
- Is Yarn overriding require? Or how exactly is it "telling Node.js where to find dependencies"?
- If the actual files are stored outside of the current project directory, does it affect things like Docker build?
- Are there some objective disadvantages of PnP? Things to be careful about when adopting this installation strategy?

Plus there are some minor things, like what status = Transparent means in the compatibility table. But given that some projects run into real-world problems with PnP, the page feels a bit "too positively" written.

arcanis commented 4 years ago

It could be interesting to have someone other than me write this page, as I tend to have almost too much context on how things work to see problems as anything other than temporary blips 🤔

Generally speaking:

How does it actually work? Where are the files stored?

In the Yarn cache (by default <project>/.yarn/cache; it can be configured to be shared, although that's not very useful on OSX due to cloning mechanisms). The files are kept compressed, which is why the stock CRA app is 60MB with PnP and 240MB with classic installs.
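Because packages stay zipped in the cache, a file inside one is addressed by a path that runs through the archive, e.g. /project/.yarn/cache/left-pad.zip/node_modules/left-pad/index.js. The TypeScript below is a minimal sketch of that idea, not Yarn's code: `splitZipPath`, `readFileSketch`, the simplified archive name `left-pad.zip`, and the in-memory `cache` object (standing in for real zip files on disk) are all hypothetical.

```typescript
// Hypothetical sketch, not Yarn's implementation: split a path that points
// inside a zip archive into the archive path and the entry path within it.
interface ZipPathParts {
  archive: string; // path to the .zip file on disk
  inner: string;   // entry path inside the archive ("" for the archive root)
}

function splitZipPath(p: string): ZipPathParts | null {
  const marker = ".zip/";
  const idx = p.indexOf(marker);
  if (idx !== -1) {
    return {
      archive: p.slice(0, idx + 4),      // keep the ".zip" suffix
      inner: p.slice(idx + marker.length),
    };
  }
  // A path ending exactly at the archive refers to its root.
  if (p.length >= 4 && p.slice(-4) === ".zip") {
    return { archive: p, inner: "" };
  }
  return null; // ordinary path: the real filesystem would handle it
}

// Hypothetical in-memory stand-in for the cache: archive -> (entry -> contents).
type Archive = Record<string, string>;
const cache: Record<string, Archive> = {
  "/project/.yarn/cache/left-pad.zip": {
    "node_modules/left-pad/package.json": '{"name":"left-pad","main":"index.js"}',
    "node_modules/left-pad/index.js": "module.exports = function leftPad() {};",
  },
};

// A readFile that transparently looks inside archives.
function readFileSketch(p: string): string {
  const parts = splitZipPath(p);
  if (parts === null) {
    throw new Error("not inside an archive (would fall through to the real fs): " + p);
  }
  const archive = cache[parts.archive];
  if (archive === undefined || !Object.prototype.hasOwnProperty.call(archive, parts.inner)) {
    throw new Error("ENOENT: " + p);
  }
  return archive[parts.inner];
}
```

This is why a package calling readFile on one of its own files still works: the fs layer recognizes the archive segment and serves the entry from the zip.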

Is Yarn overriding require? Or how exactly is it "telling Node.js where to find dependencies"?

Yes, we currently override Module._resolveFilename and Module._load so that Node resolves dependencies through our resolution map instead of its own node_modules lookup. We also add a layer on top of fs that supports reading inside Zip archives (otherwise a package calling readFile on one of its own files wouldn't work).
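To make the resolution part concrete: each package only sees the dependencies it declared, and the generated .pnp.js file carries the map that says where each of them lives. Below is a heavily simplified TypeScript sketch of that lookup, assuming the requiring file's package is the one with the longest matching location prefix. `ResolutionMap`, `findOwner`, `resolveSketch`, and the locator keys are hypothetical names, not the real PnP API, and the sketch ignores subpath requests, builtins, and peer dependencies.

```typescript
// Hypothetical, simplified PnP-style resolution map (not the real .pnp.js format).
interface PackageInfo {
  location: string;                     // package root, possibly inside a .zip
  dependencies: Record<string, string>; // declared name -> locator key
}

type ResolutionMap = Record<string, PackageInfo>;

const map: ResolutionMap = {
  "app@workspace": {
    location: "/project/",
    dependencies: { react: "react@17.0.2" },
  },
  "react@17.0.2": {
    location: "/project/.yarn/cache/react.zip/node_modules/react/",
    dependencies: { "loose-envify": "loose-envify@1.4.0" },
  },
  "loose-envify@1.4.0": {
    location: "/project/.yarn/cache/loose-envify.zip/node_modules/loose-envify/",
    dependencies: {},
  },
};

// Which package owns the requiring file? Pick the longest matching location.
function findOwner(m: ResolutionMap, issuerPath: string): string | null {
  let best: string | null = null;
  let bestLength = -1;
  for (const key in m) {
    const location = m[key].location;
    if (issuerPath.indexOf(location) === 0 && location.length > bestLength) {
      best = key;
      bestLength = location.length;
    }
  }
  return best;
}

// Strict resolution of a bare package name: only declared dependencies are
// visible, so nothing leaks in through hoisting.
function resolveSketch(m: ResolutionMap, request: string, issuerPath: string): string {
  const owner = findOwner(m, issuerPath);
  if (owner === null) {
    throw new Error("issuer is outside the dependency tree: " + issuerPath);
  }
  const deps = m[owner].dependencies;
  if (!Object.prototype.hasOwnProperty.call(deps, request)) {
    throw new Error(owner + " tried to require " + request + " without declaring it");
  }
  return m[deps[request]].location;
}
```

Here, requiring loose-envify from /project/src/index.js throws even though react depends on it; with a hoisted node_modules layout that require would usually have worked by accident. An override of Module._resolveFilename would do something along these lines before the usual file and extension probing.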

In the long term we plan to implement the Loader spec once Node has finalized it, although it's not clear whether it'll be an option for CommonJS (and there's still the Zip layer, of course).

If the actual files are stored outside of the current project directory, does it affect things like Docker build?

As long as the cache, the sources, and the .pnp.js files are stored at the same relative locations as the last time you ran yarn install, there will be no problem. For example, since the default is to put the cache in the project folder, you can copy your project into another directory and run your application without reinstalling it.
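Copying works because package locations in the map are stored relative to the project and only turned into absolute paths at runtime (in Yarn's case, as far as I understand, relative to where the .pnp.js file sits). A minimal TypeScript sketch, assuming POSIX paths; `relativeLocations`, `joinRoot`, and `absoluteLocation` are hypothetical names, and the real file format differs.

```typescript
// Hypothetical map with locations relative to the project root.
const relativeLocations: Record<string, string> = {
  "app@workspace": "./",
  "react@17.0.2": "./.yarn/cache/react.zip/node_modules/react/",
};

// Minimal POSIX join for "<root>" + "./<relative>" (no ".." handling).
function joinRoot(root: string, rel: string): string {
  const base = root.replace(/\/+$/, "");   // drop trailing slashes
  const tail = rel.replace(/^\.\//, "");   // drop leading "./"
  return base + "/" + tail;
}

// The root is wherever the project lives now, not where `yarn install`
// last ran, so moving the whole folder keeps every location valid.
function absoluteLocation(projectRoot: string, locator: string): string {
  if (!Object.prototype.hasOwnProperty.call(relativeLocations, locator)) {
    throw new Error("unknown locator: " + locator);
  }
  return joinRoot(projectRoot, relativeLocations[locator]);
}
```

absoluteLocation("/Users/me/project", "react@17.0.2") and absoluteLocation("/app", "react@17.0.2") both point into the cache that was copied along with the project, which is what makes copying the folder into a Docker image work.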

Are there some objective disadvantages of PnP? Things to be careful about when adopting this installation strategy?

node_modules is a fairly simple strategy to model. You "see it", and you can try it out by creating folders and files, so it's easy to understand. By contrast, PnP is a bit more magical for newcomers, even though the principles aren't that crazy.

The current implementation is very strict, so if one of your transitive dependencies doesn't play well, you need to fix it yourself using packageExtensions. It can be fairly frustrating to do nothing wrong and still have to fix other people's code. To offset that, we're planning to introduce a PnP-loose mode where invalid imports will still be resolved as if hoisting were in place, but warnings will be printed to the terminal (at least you'll see all the problematic packages in a single pass, rather than fixing them one by one without knowing what the next issue will be).
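A hypothetical TypeScript sketch of the behaviours described above: strict mode rejects an undeclared dependency, the planned loose mode falls back to what hoisting would have exposed and records a warning, and a packageExtensions-style patch makes the package declare what it actually uses. All names are illustrative. In particular, the real packageExtensions setting takes version ranges that Yarn resolves at install time, while this sketch maps names straight to locations.

```typescript
// Hypothetical model: a package's declared dependencies, name -> location.
interface Pkg {
  dependencies: Record<string, string>;
}

// What a hoisted node_modules would have exposed at the top level.
type HoistedFallback = Record<string, string>;

type Mode = "strict" | "loose";

function has(obj: Record<string, string>, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(obj, key);
}

// packageExtensions-style patch: return a copy that also declares extraDeps.
function extendPackage(pkg: Pkg, extraDeps: Record<string, string>): Pkg {
  const dependencies: Record<string, string> = {};
  for (const name in pkg.dependencies) {
    if (has(pkg.dependencies, name)) dependencies[name] = pkg.dependencies[name];
  }
  for (const name in extraDeps) {
    if (has(extraDeps, name)) dependencies[name] = extraDeps[name];
  }
  return { dependencies };
}

// Strict: undeclared requests fail immediately.
// Loose: fall back to the hoisted location but record a warning, so every
// offending package shows up in a single run.
function resolveWithMode(
  issuer: string,
  pkg: Pkg,
  request: string,
  mode: Mode,
  hoisted: HoistedFallback,
  warnings: string[]
): string {
  if (has(pkg.dependencies, request)) {
    return pkg.dependencies[request];
  }
  const message = issuer + " requires " + request + " without declaring it";
  if (mode === "loose" && has(hoisted, request)) {
    warnings.push(message);
    return hoisted[request];
  }
  throw new Error(message);
}
```

The design point is that loose mode trades an immediate failure for a collected list of warnings, while an extension fixes the root cause so even strict mode succeeds.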

There's a small chance our overrides conflict with the environment (either Node or something else), but I'm not overly worried: for example, VSCode does the same thing, and the Electron packages have been adding an ASAR layer on top of fs for years, with a much less conforming implementation than the one we use for Zip archives.

Practically speaking, there may be a memory usage cost, as the whole resolution map is kept in the process. It's both a pro and a con, though, since it means we don't have to make as many stat calls as the regular node_modules resolution.
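To illustrate the stat-call side of that trade-off: for require('lodash') from /project/src/utils, the classic algorithm walks up the directory tree and may probe a node_modules folder at every level, whereas a PnP-style lookup is one read from the in-memory map. A simplified TypeScript sketch, assuming POSIX paths and ignoring NODE_PATH, global folders, and the extension/index probing that happens inside each candidate; `pnpDependencies` and `pnpLookup` are hypothetical stand-ins for the loaded map.

```typescript
// Directories the classic algorithm would try, nearest first.
function nodeModulesCandidates(fromDir: string, name: string): string[] {
  const parts = fromDir.split("/").filter((s) => s !== "");
  const candidates: string[] = [];
  for (let i = parts.length; i >= 0; i--) {
    // Like Node, never look for node_modules/node_modules.
    if (i > 0 && parts[i - 1] === "node_modules") continue;
    const prefix = i === 0 ? "" : "/" + parts.slice(0, i).join("/");
    candidates.push(prefix + "/node_modules/" + name);
  }
  return candidates;
}

// Hypothetical in-memory map: dependencies declared by the requiring package.
const pnpDependencies: Record<string, string> = {
  lodash: "/project/.yarn/cache/lodash.zip/node_modules/lodash/",
};

// PnP-style: a single lookup, no filesystem access needed to find the package.
function pnpLookup(name: string): string | null {
  return Object.prototype.hasOwnProperty.call(pnpDependencies, name)
    ? pnpDependencies[name]
    : null;
}
```

From /project/src/utils that is up to four directories to probe for lodash versus one map read; the flip side is that the map stays in memory for the lifetime of the process.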

Design-wise, PnP doesn't play amazingly well with packages that work across multiple dependency trees. For example, while we eventually figured out how to make it work, supporting Vue CLI proved challenging: it spawns a process in a temporary directory (with its own dependencies), then runs another install somewhere else, then requires files from that other location. Fortunately, this is extremely uncommon.

And of course there's always the off chance that I get hit by a bus or burn out, but I try very hard to look both ways before crossing, and one of my goals is to spread knowledge as much as possible (cf. this thread, I guess!) and expand the contributor community to offset this kind of problem.

Plus there are some minor things like what does status = Transparent mean in the compatibility table

Native means that the specified project added support in their tools for PnP.

Transparent means that we added support in Yarn for the specified project.

The end result is the same, although "Native" is preferred over "Transparent".

borekb commented 4 years ago

That's an awesome explanation, thank you!

lensbart commented 3 years ago

Thanks for the elaborate explanation Maël. I am however still unsure about how to best use Yarn PnP with Docker:

if you copy your project into another directory, you'll be able to run your application without having to reinstall it

One of the reasons I recently started integrating Docker into my project was that I ran into issues with yarn install on Elastic Beanstalk (AWS). Specifically, I want to avoid any issues caused by differences between operating systems.

I currently have my Dockerfile set up as follows:

```Dockerfile
FROM node:14
WORKDIR /app

COPY [".yarnrc.yml", "package.json", "yarn.lock", "/app/"]
COPY .yarn/ /app/.yarn/
RUN yarn install
```

This however copies the entire cache directory to the Docker container, and I assume yarn install doesn’t do much, except for printing a message to the console that everything is up-to-date already.

Alternatively, one could copy only those files necessary to be able to run yarn install, and have the installation happen on the container itself:

```Dockerfile
FROM node:14
WORKDIR /app

COPY [".yarnrc.yml", "package.json", "yarn.lock", "/app/"]
COPY .yarn/releases/yarn-berry.cjs /app/.yarn/releases/yarn-berry.cjs
COPY .yarn/plugins /app/.yarn/plugins
RUN yarn install
```

This way, I presume any pre/postinstall scripts of my dependencies are executed as well, and all of this happens in a Linux environment (Docker) rather than macOS (my computer).

Is there a “canonical” approach to writing a Dockerfile? Perhaps this could be a useful addition to the docs?

In any case, thanks a lot for your very valuable open source work.

lensbart commented 3 years ago

Never mind my question above. As I've since learned, with zero-installs you can include all dependencies in the repository, so there's no need to run yarn install anymore.

yarnpkg / berry

[Docs] Explain PnP a bit more #850