microsoft / rushstack

Monorepo for tools developed by the Rush Stack community
https://rushstack.io/

[rush] Provide a lightweight "rush deploy" mode that does not require any symlinks #2045

Open dpolivy opened 4 years ago

dpolivy commented 4 years ago


Is this a feature or a bug?

Please describe the actual behavior.

rush deploy is great, but it is fundamentally incompatible with deployment targets that don't support symlinks, such as Azure App Service's Run from Package. In this scenario, zipping up the deployment directory creates copies of all the files referenced by symlinks, so only the direct dependencies of each app can be resolved. Dependencies of those dependencies cannot be resolved, because that resolution depends on the symlinked location of each module. As a result, apps packaged this way cannot run without symlinks. Unfortunately, the Run from Package feature does not support TAR files (which do support symlinks).

What is the expected behavior?

I would expect rush deploy to be able to produce output that could be deployed without relying on the symlinks. One thought here is that the common\temp\node_modules\.pnpm\node_modules\ directory could be symlinked/located in the root, which might solve the problem, although in a ZIP of the deploy directory this would lead to quite a bit of duplication and bloat. If there were a way to just have all of the required node modules for production (including local projects) stored in a flat structure in the root of the deploy folder, that would be ideal, as then there would be only a single copy of each module in the deployment.

If this is a bug, please provide the tool version, Node.js version, and OS.

octogonz commented 4 years ago

I would expect rush deploy to be able to produce output that could be deployed without relying on the symlinks.

This is technically impossible without changing to a fundamentally different installation strategy and reintroducing the phantom dependency and doppelganger problems that are solved by symlinks. The right way to approach this problem is to figure out how to get Azure App Service to create the symlinks.

The rush deploy command already supports a "linkCreation": "script" setting that can defer the symlink creation until after the archive is unpacked. For example, a simple workaround might be for the application itself to invoke the create-links.js script automatically during initial startup.
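
As a rough illustration of that startup workaround, the sketch below runs the generated create-links.js once before the app boots. It assumes the script sits in the root of the unpacked deployment folder and accepts a `create` argument; the `.links-created` marker file and the `ensureLinksCreated` helper name are hypothetical. It can only help where the unpacked folder is writable and the host allows symlinks to be created.

```javascript
'use strict';

const { spawnSync } = require('child_process');
const fs = require('fs');
const path = require('path');

/**
 * Runs create-links.js in `deployRoot` the first time the app starts.
 * Returns true if the script was run, false if a previous run was detected.
 */
function ensureLinksCreated(deployRoot) {
  // Assumption: the script lives in the root of the deployed folder.
  const script = path.join(deployRoot, 'create-links.js');
  // Hypothetical marker file used to avoid re-running on every startup.
  const marker = path.join(deployRoot, '.links-created');

  if (fs.existsSync(marker)) {
    return false;
  }

  // Assumption: "create" is the argument that tells the script to make the links.
  const result = spawnSync(process.execPath, [script, 'create'], {
    cwd: deployRoot,
    stdio: 'inherit'
  });
  if (result.status !== 0) {
    throw new Error('create-links.js failed with exit code ' + result.status);
  }

  fs.writeFileSync(marker, new Date().toISOString());
  return true;
}
```

An application entry point could call `ensureLinksCreated(__dirname)` before its first `require()` of a dependency.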

But ideally we should figure out a best practice for Azure App Service and document it. We could also consider proposing an improvement for the Azure App Service feature to better support symlinks. (For example, the .zip files created by Rush's --create-archive parameter are already capable of storing symlinks, although many unzipping tools do not support that.)

@dpolivy Are you able to do some research into these options?

dpolivy commented 4 years ago

The challenge here is that the Run from Package deployment doesn't actually unpack the files. It mounts the ZIP directly to the filesystem and runs it from there. There are a few benefits of doing it this way which you can read about in the documentation. So there's no way to fix up the links after the fact with this approach. It might be possible if I switch back to the traditional deployment model, but that is not an ideal solution.

(For example, the .zip files created by Rush's --create-archive parameter are already capable of storing symlinks, although many unzipping tools do not support that.)

This is interesting, I did not realize that. I am currently using Azure DevOps ArchiveFile task, which uses 7zip v16, which does not preserve links. I'll take a look at this and see if it solves the problem, though. The real question is whether whatever tool Azure is using to mount the archive supports them as well. If I need additional directories included into the zip, is there any easy way to do so (can the deploy and --create-archive steps be run separately so I can modify the deploy directory contents first)?

This is technically impossible without changing to a fundamentally different installation strategy and reintroducing the phantom dependency and doppelganger problems that are solved by symlinks.

While I understand the philosophy here, when packaging for a production deployment, is it as important to keep these boundaries intact? If we solve the problems in the development scenario, would it be ok to build an optimized production "build artifact" that works without symlinks even if it potentially allowed for phantom dependencies?

I'm definitely interested in solving this problem so I can move on to other work, so happy to try and dig in where I can. As I'm new to rush and pnpm, etc, I'm not as familiar with the internals and nuances, so appreciate any guidance and assistance you can provide. Also, are there specific Microsoft folks associated with rush who might be able to assist and/or engage the internal App Service teams for discussion?

dpolivy commented 4 years ago

(For example, the .zip files created by Rush's --create-archive parameter are already capable of storing symlinks, although many unzipping tools do not support that.)

I was able to give this a try, and unfortunately, it seems that the links are not handled properly when the ZIP is mounted into App Service. The links just show up as files with the content being the target of the link. So seems like this approach isn't going to work ☹

octogonz commented 4 years ago

I see. You could probably work around this for many cases by using webpack to bundle your app, so that there are no node_modules dependencies.

We could probably also come up with a way to use a different installation plan (e.g. run pnpm install --shamefully-flatten in the deployment folder), but that would be deploying something different from what you tested during development.

Fundamentally however, these limitations of the "Run from Package" feature seem maybe too restrictive for a professional deployment strategy that we could generally support. Do they have any other options you could use instead?

dpolivy commented 4 years ago

I haven't used webpack, so I'm not familiar with how it works on server-side code, especially when using modules with native node components (sharp, edge.js).

I wasn't aware of shamefully-hoist (the new name for shamefully-flatten) as it's not linked in the pnpm install doc page. It might help, but not if it's only putting the dependencies in common/temp/node_modules instead of an actual root node_modules or the app node_modules directories.

Would using regular npm be a better option, since it doesn't necessarily need to symlink the same way pnpm does? I guess I don't really understand why building a production release that is the equivalent of an npm ci flat node_modules install is such a bad thing? I get that there are issues in "correctness", but if those are maintained throughout development, isn't it safe to structure things a little differently in production (that seems to be the rationale behind shamefully-hoist) if the deployment model doesn't support symlinks? I don't use containers, but I believe there are some other scenarios where symlinks are not desired.

Fundamentally however, these limitations of the "Run from Package" feature seem maybe too restrictive for a professional deployment strategy that we could generally support. Do they have any other options you could use instead?

Yes, there is also Deploy from ZIP. This would potentially offer the ability to fix the symlinks after the files are unzipped in the source. We used to use this deployment method, but found it would take 30-40 minutes to complete given the number of files we had in node_modules. When we switched to Run from Package, the deployment was complete within 2 minutes. So a pretty significant time savings for us, in addition to the other benefits of running from package (see docs):

- Eliminates file lock conflicts between deployment and runtime.
- Ensures only full-deployed apps are running at any time.
- Can be deployed to a production app (with restart).
- Improves the performance of Azure Resource Manager deployments.
- May reduce cold-start times, particularly for JavaScript functions with large npm package trees.

I'm willing to revisit this for the sake of experimentation, but my desired goal would be to run from package.

There is also one other similar scenario, whereby an app would need to have a fully self-contained node_modules directory under it -- Azure WebJobs. When run, the app directory itself is copied to a random directory in a temporary (local) filesystem on the machine running the job, and therefore the app directory must itself contain all dependent modules (and their dependencies). If this is a separate task, I don't want to conflate that here, but it's another problem I need to address.

octogonz commented 4 years ago

I haven't used webpack, so I'm not familiar with how it works on server-side code, especially when using modules with native node components (sharp, edge.js).

Some Node.js tools are packed into a single bundle. For example, yarn comes as a single .js file. But bundling will run into trouble with native dependencies and with any libraries that probe around in the node_modules folder without using require(). So Webpack is an approach that only works in certain well-behaved cases.

Would using regular npm be a better option, since it doesn't necessarily need to symlink the same way pnpm does?

pnpm --shamefully-hoist is functionally equivalent to NPM. If we build this into Rush, I'd approach it using PNPM so we can use pnpmfile.js and other PNPM-specific features that will make this easier.

I get that there are issues in "correctness", but if those are maintained throughout development, isn't it safe to structure things a little differently in production (that seems to be the rationale behind shamefully-hoist) if the deployment model doesn't support symlinks?

Read about doppelgangers for example. When these problems arise, they don't have easy solutions. For small installation scenarios (which may very well include deployments), if those problems don't arise, then everything just works fine and nobody understands what we're talking about regarding "correctness". :-) Whereas if you eventually do encounter those problems, they can be thorny.
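
To make the doppelganger problem concrete, the following sketch scans a node_modules tree and reports every name@version pair that is installed in more than one folder. It is intended for symlink-free (NPM-style) trees; the function names are made up for this example, and it counts physical folders without resolving symlinks.

```javascript
'use strict';

const fs = require('fs');
const path = require('path');

/** Recursively records every package folder found under a node_modules folder. */
function collectPackages(nodeModulesDir, found = new Map()) {
  if (!fs.existsSync(nodeModulesDir)) {
    return found;
  }
  for (const entry of fs.readdirSync(nodeModulesDir)) {
    if (entry.startsWith('.')) {
      continue; // skip .bin and other bookkeeping folders
    }
    const entryPath = path.join(nodeModulesDir, entry);
    // Scoped packages live one level deeper: node_modules/@scope/name
    const packageDirs = entry.startsWith('@')
      ? fs.readdirSync(entryPath).map((name) => path.join(entryPath, name))
      : [entryPath];

    for (const packageDir of packageDirs) {
      const manifestPath = path.join(packageDir, 'package.json');
      if (!fs.existsSync(manifestPath)) {
        continue;
      }
      const { name, version } = JSON.parse(fs.readFileSync(manifestPath, 'utf8'));
      const key = `${name}@${version}`;
      if (!found.has(key)) {
        found.set(key, []);
      }
      found.get(key).push(packageDir);
      // Descend into the package's own nested node_modules, if any.
      collectPackages(path.join(packageDir, 'node_modules'), found);
    }
  }
  return found;
}

/** Returns [key, folders] pairs for packages installed in more than one place. */
function findDoppelgangers(nodeModulesDir) {
  return [...collectPackages(nodeModulesDir)].filter(([, dirs]) => dirs.length > 1);
}
```

Running `findDoppelgangers('./node_modules')` on a small deployment will often return an empty list, which matches the point above that many small installations never notice the problem.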

We used to use this deployment method, but found it would take 30-40 minutes to complete given the number of files we had in node_modules. When we switched to Run from Package, the deployment was complete within 2 minutes. So a pretty significant time savings for us

The 30-40 mins time makes sense if you zipped up the entire monorepo installation footprint. However rush deploy is supposed to carve out a relatively small subset of files needed by the deployed app (excluding devDependencies in particular). I'm curious to hear if this timing is much smaller for a zip file created by rush deploy.

octogonz commented 4 years ago

Based on our chat, I think you've provided reasonable technical motivation for us to consider a rush deploy mode that works like npm ci.

Suggested Spec

The resulting common/deploy output should be equivalent to something like this:

  1. Build all the projects in your branch and publish them to a temporary NPM registry
  2. Copy the deployed app into common/deploy
  3. Run pnpm install --shamefully-hoist to install all the dependencies (from the temporary NPM registry), using the older node_modules model that avoids symlinks

We wouldn't actually implement it using a temporary NPM registry -- the above is just a behavioral spec.

Suggested Implementation

But our solution could be close to that... 🤔 For example, maybe we could create a temporary pnpmfile.js that redirects PNPM to look in local folders instead of the NPM registry, for local Rush projects.

As mentioned before, this mode has some downsides:

Thus we would NOT make this the default or recommended mode. It would be an optional mode for scenarios like yours.

octogonz commented 4 years ago

How's that sound? The next step would be to fiddle around with pnpmfile.js manually, and see if we can produce a common/deploy folder that way. If it works, then we could look at making this a Rush feature.

dpolivy commented 4 years ago

I have a little good news/bad news to report.

The 30-40 mins time makes sense if you zipped up the entire monorepo installation footprint. However rush deploy is supposed to carve out a relatively small subset of files needed by the deployed app (excluding devDependencies in particular). I'm curious to hear if this timing is much smaller for a zip file created by rush deploy.

Good news: with the modified rush deploy output (including my legacy non-Node app files), the ZIP Deploy method seems to take about 10-12 minutes, at least on the very small sample size of 2 runs I've done so far. So that's a nice improvement. However, ultimately, I am unable to create the symlinks after the files have been copied. Upon some further research, it seems symlinks are not supported in Azure App Service, at all.

Ref: #1, #2, #3

Sadly, it seems this is just not at all possible with Azure Web Apps. Some of the answers reference the ability to follow an existing symlink, but it's unclear to me if that is just a system-generated one or if there is any path to creating one by a site owner. I've tried using the archive generated by rush deploy --create-archive, but it just results in files with targets as the text instead of actual links. I will look into opening a support ticket to see if I can glean more information.

I'll also start fiddling around as you suggest above to see if I can hack something together as a PoC.

octogonz commented 4 years ago

I'll also start fiddling around as you suggest above to see if I can hack something together as a PoC.

I would suggest setting "useWorkspaces": true in your rush.json file, since that will become the default installation model in the next major release of Rush. And the pnpmfile.js fixups will be somewhat different (and actually easier) in that model.

octogonz commented 4 years ago

@dpolivy I experimented with this idea a bit myself. I was able to get PNPM to remap the workspace: specifier using a pnpmfile.js like this (testing with the rush-example monorepo with "useWorkspaces": true):

'use strict';

module.exports = {
  hooks: {
    readPackage
  }
};

/**
 * This hook is invoked during installation before a package's dependencies
 * are selected.
 * The `packageJson` parameter is the deserialized package.json
 * contents for the package that is about to be installed.
 * The `context` parameter provides a log() function.
 * The return value is the updated object.
 */
function readPackage(packageJson, context) {
  console.log('TRACE: ' + packageJson.name);

  function fixup(dependencyTable) {
    if (!dependencyTable) {
      return;
    }

    for (const dependencyName of Object.keys(dependencyTable)) {
      const versionSpecifier = dependencyTable[dependencyName];
      if (/^workspace:/.test(versionSpecifier)) {
        let newSpecifier = '';
        switch (dependencyName) {
          case 'my-controls':
            newSpecifier = 'file:../../libraries/my-controls/';
            break;
          case 'my-toolchain':
            newSpecifier = 'file:../../tools/my-toolchain/';
            break;
          default:
            throw new Error('Unknown workspace reference to "' + dependencyName + '" for "'
              +  packageJson.name + '"');
        }
        dependencyTable[dependencyName] = newSpecifier;
      }
    }
  }

  fixup(packageJson.dependencies);
  fixup(packageJson.devDependencies);
  fixup(packageJson.optionalDependencies);
  fixup(packageJson.peerDependencies);

  return packageJson;
}
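
The hard-coded switch above could plausibly be generalized by driving the hook from a lookup table. The sketch below is only an illustration of that idea: `createReadPackageHook` and the `projectFolders` table are hypothetical names, and the folder paths are the same example values used above. A real Rush feature would presumably build the table from the projects listed in rush.json.

```javascript
'use strict';

const DEPENDENCY_FIELDS = [
  'dependencies',
  'devDependencies',
  'optionalDependencies',
  'peerDependencies'
];

/**
 * Builds a readPackage hook that rewrites "workspace:" specifiers to "file:"
 * specifiers, using a table that maps package names to local folders.
 */
function createReadPackageHook(projectFolders) {
  return function readPackage(packageJson, context) {
    for (const field of DEPENDENCY_FIELDS) {
      const dependencyTable = packageJson[field];
      if (!dependencyTable) {
        continue;
      }
      for (const [dependencyName, versionSpecifier] of Object.entries(dependencyTable)) {
        if (!/^workspace:/.test(versionSpecifier)) {
          continue; // ordinary registry dependency, leave it alone
        }
        const folder = projectFolders[dependencyName];
        if (!folder) {
          throw new Error('Unknown workspace reference to "' + dependencyName
            + '" for "' + packageJson.name + '"');
        }
        dependencyTable[dependencyName] = 'file:' + folder;
      }
    }
    return packageJson;
  };
}

// Hypothetical table; the paths mirror the hard-coded cases above.
const readPackage = createReadPackageHook({
  'my-controls': '../../libraries/my-controls/',
  'my-toolchain': '../../tools/my-toolchain/'
});

module.exports = { hooks: { readPackage } };
```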

And I used this command line for installing:

pnpm install --prod --shamefully-hoist --package-import-method=copy --no-lockfile --prefer-offline

However, even with --package-import-method=copy, PNPM still seems to create symlinks in the node_modules folder.

@zkochan is there any way to make PNPM install without symlinks, i.e. the installation model used by Yarn classic and NPM? If not, we might need to use Yarn for this.

zkochan commented 4 years ago

@zkochan is there any way to make PNPM install without symlinks, i.e. the installation model used by Yarn classic and NPM? If not, we might need to use Yarn for this.

no, the whole point of pnpm is its unique node_modules structure that is made possible by symlinks. So we only support symlinks. We will never support a flat node_modules without symlinks. We might support Yarn's Plug'n'Play, which doesn't require symlinks because it overrides Node's resolution algorithm.

Would be nice if Node supported something like "fake symlinks". pnpm would create just some text files instead of a symlink and Node would use them to resolve the real locations of packages. Maybe we can create an issue at NodeJS.

octogonz commented 4 years ago

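@zkochan I am trying to understand two things here:

- why does --shamefully-hoist require symlinks at all -- isn't it essentially reproducing NPM's algorithm?
- what is the purpose of --package-import-method=copy? The docs make it sound like it avoids creating symlinks

Thanks!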

zkochan commented 4 years ago

why does --shamefully-hoist require symlinks at all -- isn't it essentially reproducing NPM's algorithm?

it is not reproducing npm's algorithm. It is reproducing npm's flat node_modules, using symlinks.

what is the purpose of --package-import-method=copy? The docs make it sound like it avoids creating symlinks

it has nothing to do with symlinks. It uses copying instead of hard linking.

After giving it more thought. We don't even need changes in NodeJS. We may try overriding the implementation of fs.readlink to make it understand the "fake symlinks".
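
To give a feel for the "fake symlinks" idea, here is a small sketch of a path resolver. It assumes a made-up convention that is not part of pnpm or Node.js: wherever a path segment that should be a folder is actually a regular text file, the file's content is treated as the link target, relative to the folder containing that file. The `resolveFakeSymlinks` name is hypothetical, and only code that calls the helper explicitly would benefit from it.

```javascript
'use strict';

const fs = require('fs');
const path = require('path');

/**
 * Resolves `inputPath`, treating any intermediate segment that is a regular
 * file as a "fake symlink" whose content is the target path.
 * (The target itself is not re-scanned for further fake links.)
 */
function resolveFakeSymlinks(inputPath) {
  const absolute = path.resolve(inputPath);
  const root = path.parse(absolute).root;
  const segments = absolute.slice(root.length).split(path.sep).filter(Boolean);

  let current = root;
  for (let i = 0; i < segments.length; i++) {
    const candidate = path.join(current, segments[i]);
    const isLast = i === segments.length - 1;

    if (!isLast && fs.existsSync(candidate) && fs.statSync(candidate).isFile()) {
      // A file where a folder was expected: follow the stored target.
      const target = fs.readFileSync(candidate, 'utf8').trim();
      current = path.resolve(current, target);
    } else {
      current = candidate;
    }
  }
  return current;
}
```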

octogonz commented 4 years ago

Would be nice if Node supported something like "fake symlinks". pnpm would create just some text files instead of a symlink and Node would use them to resolve the real locations of packages. Maybe we can create an issue at NodeJS.

@zkochan This is a fascinating idea. However it seems to require hooking every core API that interacts with file paths, not just require(). For example, any of fs.copyFile(), child_process.exec(), etc might be invoked to open a path that passes through a virtual symlink. These APIs do not call fs.readlink() internally, but instead wrap core OS APIs that internally traverse the filesystem. Also if Node.js spawns child processes, then the monkey patch would somehow need to be enabled for them as well.

dpolivy commented 4 years ago

@octogonz It seems there is a new @pnpm/make-dedicated-lockfile tool that can generate a lockfile for a specific subset of a workspace. That, combined with pnpm PnP should allow rush deploy to generate a deployable structure without symlinks for folks like me who need that. Do you think that rush deploy could be updated to support this approach?

See https://github.com/pnpm/pnpm/issues/2198#issuecomment-710882357 for more details.

octogonz commented 4 years ago

If PNPM provides the technology to solve this, certainly we would incorporate that into rush deploy.

I tried @pnpm/make-dedicated-lockfile but wasn't able to get it working. The CLI is a thin wrapper around this API:

make-dedicated-lockfile/src/index.ts

export default async function (lockfileDir: string, projectDir: string) {
  const lockfile = await readWantedLockfile(lockfileDir, { ignoreIncompatible: false })
  if (!lockfile) {
    throw new Error('no lockfile found')

I tried calling it in the Rush Stack repo with lockFileDir="<repo>/common/temp" and projectDir="<repo>/apps/api-extractor" and it deleted the node_modules folder and then printed this error:

> (node:8472) UnhandledPromiseRejectionWarning: Error: Cannot resolve workspace protocol of dependency "@microsoft/api-extractor-model" because this dependency is not installed. Try running "pnpm install".
    at makePublishDependency (C:\Users\Owner\AppData\Roaming\nvm\v12.18.4\node_modules\@pnpm\make-dedicated-lockfile\node_modules\@pnpm\exportable-manifest\lib\index.js:64:19)
    at async C:\Users\Owner\AppData\Roaming\nvm\v12.18.4\node_modules\@pnpm\make-dedicated-lockfile\node_modules\@pnpm\exportable-manifest\lib\index.js:53:9
    at async Promise.all (index 0)

There are no docs in any of this code, but it seems that maybe:

BTW @dpolivy you might also want to first verify that (1) your project actually works with Plug'n'Play -- many do not, and (2) your target runtime supports Plug'n'Play, for example some way to invoke .pnp.js before the app boots up.

octogonz commented 4 years ago

@dpolivy the .zip file format does have a spec for storing symlinks. So it might be worthwhile at least to create a ticket asking for Azure App Service Run from Package to support these symlinks when mounting a .zip file.

dpolivy commented 4 years ago

@octogonz Thanks for giving it a shot. I'm not entirely sure how it's supposed to work, but maybe @zkochan can offer some suggestions on how to utilize it in this scenario?

And yes, I did test my app with PnP when I was using rush, and it seemed to work OK. It is possible on App Service to specify the command line for invoking your node app, which allows one to insert the parameter to get it to load the .pnp.js file. The challenge I had, which I think make-dedicated-lockfile is intended to solve, is that the .pnp.js I used originally was for the entire repo, when I'd much prefer it to be specific to each "deployed app" (project). As far as filing a feature request on App Service, I did pass that along but I'm not holding my breath waiting for it to happen...

octogonz commented 3 years ago

@hbo-iecheruo and I encountered this same problem today with AWS Lambda services. Unlike with Azure App Service's Run from Package, the .zip file gets extracted rather than being mounted as a read-only disk volume. But there is no lifecycle step where symlinks can be created, so the requirements are exactly the same.

In https://github.com/pnpm/pnpm/issues/2198#issuecomment-710882357 the conclusion for PNPM was:

So to summarize.

  1. In order to deploy a project from a workspace use @pnpm/make-dedicated-lockfile

  2. If the environment that you are deploying to doesn't work with symlinks well, or it does not support symlinks, then use Plug'n'Play, which is shipped with pnpm v5.9. Create the next .npmrc in the root of your project:

    node-linker=pnp
    symlink=false

But we can add a few observations:

I'd like to propose that rush deploy should support a special symlink-free mode for these "lightweight" deployment scenarios. We could impose some simplifying restrictions, for example maybe deploymentProjectNames cannot specify multiple projects.

How would it work?

rush deploy could do a PNPM Plug'n'Play installation, even though the monorepo is not using Plug'n'Play.

Hypothetically, suppose you did these steps manually:

  1. If your target project depends on other local workspace projects, first publish their NPM packages to a private Verdaccio registry
  2. Then copy the target project into the common/deploy folder, along with the monorepo's pnpmfile.cjs
  3. Then you run pnpm install in that folder, doing a Plug'n'Play installation without any symlinks.
  4. Zip up the result

Step 3 could use npm install or yarn install equivalently, but choosing PNPM has the advantage of supporting PNPM-specific features such as pnpmfile.cjs.

The actual implementation would not really need Verdaccio. Instead it would simply rewire PNPM somehow to install directly from the local folders, producing the same outcome.

dpolivy commented 3 years ago

As you know, I fully support this 👍

I'd like to propose that rush deploy should support a special symlink-free mode for these "lightweight" deployment scenarios. We could impose some simplifying restrictions, for example maybe deploymentProjectNames cannot specify multiple projects.

One of my scenarios in a monorepo is that I have multiple Node.js apps that share common modules, but ultimately get packaged and deployed separately. And also some Node.js apps that get packaged and deployed together. If you do add this functionality, it would be great if these scenarios were both supported.

ghost commented 3 years ago

any updates here?

jimmythomson commented 2 years ago

I'm keen to use Rush on a new project, but am currently stuck if I can't create an asset without using symlinks as the application is being deployed as an AWS Lambda (we'll almost definitely hit the same scenario with Azure). I'm happy to test this as soon as anything is ready to go.

jamsch commented 2 years ago

No idea if this works with Rush, but pnpm 6.25.0 now has a new configuration option node-linker=hoisted which can be added to .npmrc.

Last time I tried using React Native & @rnx-kit/metro-resolver-symlinks with Rush there were still issues, but hopefully this may resolve it.
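
Whichever install mode is used, it may be worth checking that the deployment folder really contains no symlinks before zipping it. The sketch below is one way to do that; `findSymlinks` is a hypothetical helper name, and the folder to scan (for example common/deploy) is up to the caller.

```javascript
'use strict';

const fs = require('fs');
const path = require('path');

/**
 * Recursively lists every symbolic link under `rootDir`.
 * Links are reported but never followed.
 */
function findSymlinks(rootDir, results = []) {
  for (const entry of fs.readdirSync(rootDir, { withFileTypes: true })) {
    const entryPath = path.join(rootDir, entry.name);
    if (entry.isSymbolicLink()) {
      results.push(entryPath);
    } else if (entry.isDirectory()) {
      findSymlinks(entryPath, results);
    }
  }
  return results;
}
```

A build step could call `findSymlinks('common/deploy')` and fail the pipeline if the returned list is not empty.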

antoine-coulon commented 2 years ago

Any update about that subject?

Currently facing the same issues about WebApp deployments on Azure as the RUN_FROM_PACKAGE can only be done without symlinks, unfortunately.

Being able to provide a flat and ugly hoisted node_modules should do the job to bypass systems not supporting symlinks (putting aside all doppelgangers and drawbacks of not using symlinks that we'll have to assume at some point).

There might be a workaround using directly pnpm but it would be cool to have that option directly from Rush.

UROjQ6r80p commented 11 months ago

I was able to deploy aws lambda service from pnpm monorepo. https://github.com/UROjQ6r80p/pnpm-aws-monorepo/

Also another user mentioned that possibility here: https://github.com/pnpm/pnpm/issues/6259#issuecomment-1712158649 I did not find any information from AWS about that, nothing in here about symlinks: https://docs.aws.amazon.com/lambda/latest/dg/lambda-releases.html

https://pnpm.io/npmrc states: Some serverless providers (for instance, AWS Lambda) don't support symlinks

Did I overlook anything? Am I right in understanding that this was not possible on AWS Lambda before?

on Linux:

No node-linker=hoisted, default pnpm config used.

No unnecessary modules from other packages bloating your lambda.

image

Lambda:

image

on Windows (zip tools do not support symlinks on Windows)