Feature Request: mechanism to "vendor" a workspace module's node_modules

bryanlarsen commented 7 years ago

Workspaces are awesome, but they're inconvenient for deployment.

I presume that the intention of workspaces is that for deployment, you're supposed to push all your packages to an NPM registry, and then yarn install inside a copy of your module outside the workspace will work.

However, there are a bunch of reasons you might not want to push. If it's an open source module, then bumping the version releases the module to the wild. But often you want to test in a production staging environment before releasing the module. If it's a private module then you have to worry about authentication tokens and/or private registries.

The other issue is that there's no yarn.lock for the modules, only a single global yarn.lock.

To elaborate further, take the example from the blog post, jest. And let's assume that jest-diff is an app that can be deployed to a server.

Suppose I download jest-diff via npm pack. I can run yarn install inside of that package, but I'll be installing without the benefit of a yarn.lock, and I need to have access to jest-get-type. Easy enough since it's been published, but publishing has the drawbacks listed above.

What I'd like to be able to do, is in a fresh checkout of jest, cd packages/jest-diff, and then yarn install --vendored. This would then fully populate packages/jest-diff/node_modules. It would copy packages/jest-get-type into packages/jest-diff/node_modules/jest-get-type. It would not create a root node_modules nor a packages/jest-get-type/node_modules, although of course it would create packages/jest-diff/node_modules/jest-get-type/node_modules if it was necessary. It would use the yarn.lock from the repo root for resolutions.

I could then take a tarball of the packages/jest-diff directory and put it on a server to run it there, or run a docker build from the packages/jest-diff directory without having to send my entire workspace as a docker context. I could also check in packages/jest-diff as a deployment branch into git to make sure that subsequent deployments are completely deterministic and don't require a network.

idris commented 7 years ago

There's one issue here: Ideally you want the yarn install to happen in the environment where you are deploying. For example, if you're using Docker you want the yarn install to happen inside of your Dockerfile, rather than running it locally and using a tarball in your Dockerfile.

So really what you want is something like yarn flatten-workspace-dependencies which will build all of your intra-workspace dependencies the same way they'd get built for a npm publish (including package.json and transpiled (with Babel, etc) JS files, but NOT node_modules).

The result of that operation should be that the output directory doesn't reference any files outside of the current project directory. Now, you can copy that to a Docker container and run yarn install normally.

bryanlarsen commented 7 years ago

@idris, I believe that's different than my original request. In my original request I wanted symlinks to the parent workspace node_modules folder, in yours you probably want a copy. Unless I misunderstand you. Perhaps it's better to open a new issue and link back to this one.

tamagokun commented 7 years ago

Currently looking to do this same thing.

I have a handful of tiny modules that I want to deploy separately via AWS lambda. By using yarn workspaces, I can create shared local modules that other modules can depend on. This works out really well, until it comes time to deploy anything.

Lambda expects you to upload your entire project with all dependencies as a zip file. There's really no way to accomplish this with workspaces unless I opt to upload the entire node_modules folder from workspace root. It would be great to have some kind of yarn pack --withDependencies or yarn install --vendored where it will grab the dependencies of the specific workspace module from the root node_modules folder, and copy it over into the module's node_modules folder. This makes for really easy deployment.

meysholdt commented 6 years ago

I ran into the same challenge: deploy multiple packages from a yarn workspace into multiple docker images with the packages having dependencies on one-another. I would very much like to not use a npm registry because I'm running this in a Jenkins multibrach build: Packages are being build for each git branch and thus there are many builds of the same package in the same version. This doesn't play well with pushing them to a singleton registry.

connectdotz commented 6 years ago

@bryanlarsen one of the main reasons to use monorepo is that you can make changes across your packages without requiring them to be published, provided you have the whole workspaces checkout.

If you primarily work off a particular package, like in your example use case, maybe consider not use monorepo and just set them up as separate standalone packages. You can then use yarn link to test them from your local repo before publishing.

If you want to collect all your dependencies into a self-sufficient pack for deployment, one would think that is what many existing bundlers are for, such as webpack/browserify/metro. Bundling is not trivial, locating modules is hardly the only challenge you will face. Scanning yarn.lock or traversing the node_modules yourself might not be the shortest route...

If your dependent library made a monorepo-incompatible assumption of where the module/artifact are, you should probably file an issue to have them fix it. If they could not fix it right away, you can use the new feature nohoist to exclude yarn from hoisting the particular modules to the root.

I didn't quite get @meysholdt docker use case, maybe an example repo would make it more clear...

meysholdt commented 6 years ago

let's say I have a monorepo:

root
|-node_modules
|-package1
|  |-node_modules
|  |-package.json
|  |-Dockerfile
|-package2
|  |-node_modules
|  |-package.json
|  |-Dockerfile

With both Dockerfiles doing something like this:

COPY ./ /myapp/

to copy all contents from folder package1 (or package2) into the docker image. I was hoping this would allow me to create a runnable Docker image. However, due to hoisting, yarn installs many dependencies into root/node_modules. I could also include them into the docker image, but this seems unclean since I think root/node_modules includes dependencies from other packages.

The nohoist-feature you (@connectdotz ) mention seems to go to the right direction, but what I was looking for to disable hoisting for all dependencies and thus not to have the root/node_modules at all.

I'm aware this doesn't follow the advice from @idris:

There's one issue here: Ideally you want the yarn install to happen in the environment where you are deploying. For example, if you're using Docker you want the yarn install to happen inside of your Dockerfile, rather than running it locally and using a tarball in your Dockerfile.

But for that to work the missing piece is a way to make dependencies from the same workspace available to the docker engine: In this example, let's assume package1 depends on package2. When running yarn install as part of docker build (i.e. inside the Dockerfile) for package1, the dependency to package2 can't be resolved unless you push package2 to a npm repo first.

connectdotz commented 6 years ago

ok, so looks like there is a common thread emerging from these discussions (including #5428): to be able to publish or deploy individual packages with their dependencies resolved.

As I mentioned earlier, this is the area of package bundling, which is not trivial and already have many existing solutions. But what is the real experience? How does it really work in workspaces and how painful is it? I decided to give it a try, you can see the sample repo here: workspaces-webpack-docker

there are 2 docker containers there:

one is for the whole repo, thus built from source with yarn, just like any normal monorepo development env;
the other is built from an individual package (w2) bundle, created by webpack before docker build. This bundle is self-sufficient thus can be deployed to docker, lambda or your client's desktop.

The webpack config is pretty trivial, the Dockerfiles are pretty simple, both with huge community support for more. This is just a quick throw-together sample, I am sure there is much room for improvement/optimization, but overall I am happy with the ease of use.

Also realized that even with local node_modules all populated or package-level yarn.lock file, I still can't run my app without more tooling like transpiling es6, which is already handled by webpack+babel with many useful examples to work with.

To conclude: for most use cases, I think it is better off to just use the package bundlers, like webpack, to bundle your package if you need to deploy it as a standalone unit for testing or production use. IMHO, it probably doesn't make sense for yarn to venture into this area. @evocateur, lerna owner, stated similar opinion. As this simple experiment showed that module resolution is hardly the only thing needed to be done in deploying individual, self-sufficient packages... It is best to leave the bundling to the bundler...

bryanlarsen commented 6 years ago

@connectdotz,

So what we're basically doing right now is (basically) shipping the whole monorepo and it's deps for every component. That docker image is now 6GB. :) It's sort of like deploying your primary container. It works, but eventually it won't scale.

We're now almost completely switched over to node 8.5+ with --experimental-modules, which means no more transpiling. Yay! I'd need a damn good reason to bring it back.

The other big problem is code sharing with clients. Each client directory is a git subrepo inside of our monorepo. So far we've only had one customer who actually tried building the code, but it should be theoretically possible for all of them.

inversion commented 6 years ago

We're also following @bryanlarsen 's approach - several of our dependencies don't work with webpack (due to dynamic imports which can't be resolved at build time and/or legacy structuring) and we'd like to avoid transpilation on the server.

What would be ideal for us would be the ability to find out which dependencies are needed for a particular workspace package . Currently yarn list just returns all the dependencies for the whole monorepo even if run from a workspace package directory (#5174). Given that information we could deploy the monorepo structure, but with any unneeded dependencies (including packages) pruned out.

connectdotz commented 6 years ago

@bryanlarsen, if your monorepo is too big and you only need a single package anyway, why not just bundle that package then deploy it to docker, similar to publish only a single package? you can certainly run webpack without transpiler. Did I miss something?

@inversion, your issue seems to be with webpack, you might be better off to resolve that, or explore deploying full repo, private registry, among others... even if you can get a flat package node_modules like you have suggested, keep in mind that it will contain all modules: used, unused, devDependencies etc. If you already worry about the repo being too big (I assume you do otherwise why not just take the whole repo to docker), wouldn't you want to only include the ones that actually used at runtime? How about minification, that could save your footprint by half sometimes... Optimization like these, among others, is what a good bundler aims to offer, where I don't see yarn will ever venture into.

Hopefully, we all realized by now that a flat module tree is probably not the only challenge for these use cases. Not that I think webpack is the perfect solution, but it does offer useful features you would need, with a fraction of the time, thus you should maybe consider it as a preferred option.

Having said that, I understand we all sometimes have to work around problems, even with less than ideal solutions... While nohoist is not designed to solve publish/deploy, you can indeed use it to stop hoisting the whole package if you so desire:

// in root package.json

"workspaces": {
    "packages": ["packages/*"],
    "nohoist": ["w2/**"]
  }

This will put every dependency of w2 in w2/node_modules. Note, it will not generate package-level yarn.lock, and the linked modules/packages (such as w1, utils) will remain to be symlinks under w2/node_modules. Good luck.

netanelgilad commented 6 years ago

@connectdotz So here is my concern with bundling as a solution. Let's say in your example project that w1 is a node module that has some data file as part of the package, that is read using fs.readFile. Now when that module is installed as a regular file, the data file is installed along with the package, and the package can depend on the file being there when it does something like fs.readFile(path.resolve(__dirname, "./data.file")). But once w2 is bundling w1 along with it, with webpack for example, it would bundle only the source files, missing the data file. Now it becomes a concern of w1 to know that it may be consumed through a bundler and change it's behavior accordingly, or a concern of w2 to know how to properly bundle w1.

I know this example may be contrived a bit, but it feels like a smell to me. I want w1 to work as a library without knowing that it's part of a bigger monorepo, and I want w2 to work like a microservice node package without knowing it's part of a bigger monorepo. And I want to use the workspaces feature to easily develop them along side each other without having to yarn link on every developer machine, etc...

Am I making sense? 🤔

tamagokun commented 6 years ago

I've managed to hack together a tool that fulfills my needs. It uses npm-remote-ls to resolve dependencies, which isn't great, but it works for me, so I thought i'd publish it for others to use:

https://github.com/ripeworks/workspace-pack

You specify the name of the folder of your local package, it will resolve all the local and remote deps, then jam it all into a .zip file for you.

Maybe this can become a thing? Oh, also, you'll probably need a recent node.js version to run it, since i'm not transpiling it (yet)

connectdotz commented 6 years ago

I know this example may be contrived a bit, but it feels like a smell to me. I want w1 to work as a library without knowing that it's part of a bigger monorepo, and I want w2 to work like a microservice node package without knowing it's part of a bigger monorepo.

@netanelgilad yes you are absolutely right. Having non-code assets in a js library is a well-known portability killer, with or without workspaces. That's why it is an anti-pattern that one should try to avoid. If this is really your use case, maybe consider fixing you asset access first; or maybe you just want to point out sometimes there are issues with bundling, which I completely agree, but that doesn't necessarily mean that yarn should just step in and fix them...

Let's get back to your actual use case, from your earlier comment:

So for us, we don't want to package the whole monorepo into the resulting docker container. We are using docker in production and those images should be as light as possible...So when we package a microservice, we want the image to contain the files of that microservice and any other dependencies as proper dependencies - downloaded from our private registry, and built for the arch of the docker image.

For production use, all your packages should have already been published to the private registry, right? Couldn't you just deploy the microservice package (without any dependency) to docker then do a clean yarn install from there?

meysholdt commented 6 years ago

For production use, all your packages should have already been published to the private registry, right?

not when building feature branches and deploying them to testing environments. In this scenario I don't want to deploy the packages to a central private npm repository because that would mix the packages from all feature branches. And launching a dedicated npm registry for every feature branch seems cumbersome.

netanelgilad commented 6 years ago

@netanelgilad yes you are absolutely right. Having non-code assets in a js library is a well-known portability killer, with or without workspaces. That's why it is an anti-pattern that one should try to avoid. If this is really your use case, maybe consider fixing you asset access first; or maybe you just want to point out sometimes there are issues with bundling, which I completely agree, but that doesn't necessarily mean that yarn should just step in and fix them...

I agree with you that yarn shouldn't have to solve problems related to non-code assets. My example was mainly directed at the issue of one package having to bundle another package (and the problems that come with it) instead of properly installing the package (which is why we use package managers for production installations, and not just bundle everything into an exe like file). But you are right, this isn't really my use case, just a smell :)

For production use, all your packages should have already been published to the private registry, right? Couldn't you just deploy the microservice package (without any dependency) to docker then do a clean yarn install from there?

So here is the crux of the problem. I want to run yarn install inside the container after I deploy the microservice package to docker, but I want to know that I'm getting the same dependencies I developed & tested the microservice with. Which means I need a yarn.lock file. And that raises the need for a yarn.lock per workspace (again, this doesn't necessarily need to be a yarn.lock per workspace solution, but I need to be able to get/generate one so I can copy it into the docker image alongside my microservice package). I think someone mentioned that using the yarn.lock of the whole monorepo would also work, it would just mean that yarn will ignore all the entries in the yarn.lock file that do not correlate to the package.json. If that is the case, I could work with that as a solution (though it feels a bit workaround-y).

wenisman commented 6 years ago

@netanelgilad you can run a yarn workspace [package name] generate-lock-entry > yarn.lock but you need to do this in your package directory.

Then you need to publish this, and that means committing and bumping npm versions on your ci-cd pipeline.

netanelgilad commented 6 years ago

Thanks for the advice @wenisman, but I gave it a try and the generated lock entry doesn't contain all the exact versions of the dependencies of the workspace. It just mentioned the workspace and the ranges of the dependencies already found in the package.json. Maybe I missed something?

But I did try the option of just using the yarn.lock of the whole monorepo and it seems to work fine (haven't vigorously tested it, but on the face of it, it looks good). It installed only the dependencies that were required by the specific workspace's package.json and with the correct version. So I guess that could be my solution for now.

wenisman commented 6 years ago

@netanelgilad I think there is a big raised that the generate lock doesn't produce the same result as the yarn install. However it's good to see that the workspace yarn.lock is working for you.

We are struggling with the same problem, we have a mono repo and simply wish to create some docker images of some of our components. The individual libraries are fine to publish to npm but our microservices we want to deploy. So we have copied the code, remove the node_modules of the service and then have yarn install inside a docker image.

It's a bit of work and so this vendoring of packages would help in this regard

fubhy commented 6 years ago

We've got the same problem. The --focus option that was introduced lately does not currently solve the problem adequately for us because some of our local dependencies are private npm modules that we don't want to publish on npm. And even if we did that, we would still not want to publish them for feature branches or pull requests for instance.

The problem would be solved for us by allowing the --focus option to copy local packages from the local filesystem instead of attempting to retrieve them from npm.

vjleblanc commented 6 years ago

We also have the same problem. Particularly for packaging AWS Lambda functions. The workspace feature has simplified our process dramatically, but this remains our last issue. Our current solution involves using a local verdaccio container running w/ Docker. We target it to stage local packages during the build process. This works well, but is a lot of overhead.

I would also love to see an option for --focus or another mode --vendor (maybe?) that would allow workspace siblings to be installed from the local file system.

As @idris pointed out there is a pitfall to this approach. Particularly if you are installing libraries that have native extensions and your local environment differs from your deployment environment. We have solved this problem by running the installation inside of a Docker container that mimics the deployment environment.

fubhy commented 6 years ago

That's a good, although hefty workaround @vjleblanc. I might consider that for us too in the meantime. Thanks!

The pitfall that @idris mentioned does also not apply to us because we build and test the in multi stage docker builds running in the exact same configuration as the final production containers that inherit the "build artifacts" from the previous build stages within the CI and are then deployed on dockerhub with all the native extensions functioning properly.

sgronblo commented 6 years ago

I don't suppose anyone is actively working on this? I am also desperately trying to figure out how to do something like yarn install --focus --production --modules-folder dist/node_modules for creating a Lambda deploy zip.

acmcelwee commented 6 years ago

@sgronblo, FWIW, I've found webpack to be the current best way to package code for lambda deployment zips. The tree shaking and minification that you get becomes essential when trying to stay under the 50MB zip size limit, once you have a couple dependencies involved.

sgronblo commented 6 years ago

@acmcelwee Thanks. I ended up setting up webpack and I think it seems to solve the problem as expected, even when using yarn workspaces. It's just a shame the output bundle is kind of a mess to look at.

vjleblanc commented 6 years ago

I tried the webpack approach this past week. I am pretty pleased with it. It decreased the size of lambda packages significantly (10x in my test cases). For those concerned about the readability of the packed code, I've found that setting the mode to none allows for a non-minified, though still very large, human readable file. Definitely going to be looking at it more.

creatyvtype commented 5 years ago

Definitely need this!!!!

meysholdt commented 5 years ago

I solution that works for me is:

configure a yarn-offline-mirror
run yarn install on your yarn-workspace. This will store all your dependencies in the yarn-offline-mirror.
run yarn build && yarn pack on every package in your workspace. This will build the package and store it into a .tgz file.
run yarn generate-lock-entry for every .tgz file and append the lock-entry to your yarn.lock
move the .tgz files into the yarn-offline-mirror.

now, to create a clean install of one of your packages:

create a package.json that has a dependency only on the needed package and nothing else
run yarn install --offline --prod --frozen-lockfile on that package.json using your enhanced yarn.lock and your yarn-offline-mirror.

Enjoy your installation without devDependencies and without storing packages in npm-registries. This works great for git-multibranch-builds.

gytisgreitai commented 5 years ago

How do you solve package conflicts if using webpack? E.g. I have a package that requires mime@1.4.1 and another one requires mime@^2.0.3. Their interfaces are incompatible, and webpack picks up the 2.0.3 which breaks the package that wants 1.4.1 ?

migueloller commented 5 years ago

@connectdotz,

While bundling before copying to Docker might seem like it would solve issues there's the problem of native dependencies that have a post install build step that relies on the current environment. In addition to that, features like babel-preset-env's selective transpiling will only transpile what's necessary for the current runtime. For this reason, running the entire build process in the Docker image itself is desireable.

The other issue is that of symlinks. Commands like COPY in a Dockerfile will not follow symlinks, for good reason. This means that unless there's a bundling step (or a manual de-linking and replacing with copies, which is what we have to do right now) it isn't possible to just copy no-hoisted node_modules in each package.

The suggestion to have a --vendor flag that generates a local lockfile based on the global one (to keep versions pinned) and replaces symlinks by copies would work perfectly. From there, a Dockerfile could COPY all of the package vendored dependencies and run transpilation and bunding steps in the Dockerfile, completely separate from the Yarn workflow.

EDIT: One more thing. While the lockfile from the root of the repository could be copied, that would invalidate Docker cached layers when dependencies of other packages in the repository change.

derN3rd commented 4 months ago

Hey everyone,

Did someone manage to find a workaround or solution for this already?

We a medium sized mono repo with 14 packages, where 5 of them are entry points as a node app. Currently we have huge Dockerfiles to manually copy over all of the needed folders and files from each module into the build step, which is quite messy to update if we add new packages to our repo.

I would be thankful for every hint as well as suggestions for alternatives, as we also cannot use webpack (due to complicated imports in our TS projects).

yarnpkg / yarn

Feature Request: mechanism to "vendor" a workspace module's node_modules #4521