google / wireit

Wireit upgrades your npm/pnpm/yarn scripts to make them smarter and more efficient.
Apache License 2.0
6.09k stars 108 forks source link

[Proposal] Dependency Files as Inputs With Symlinked Packages #944

Open ObliviousHarmony opened 1 year ago

ObliviousHarmony commented 1 year ago

Problem

Wireit does not provide a way to easily include the files of symlinked dependencies in fingerprinting. Unlike normal dependencies, changes to these will not be reflected in a lock file and will not change the fingerprint. This can be somewhat addressed by using the "dependencies" feature in Wireit, however, this is not always the case:

Proposed Solution

Provide a new dependencyFiles option that treats the files included in dependencies and transitive dependencies as fingerprinting inputs. By default this should include dependencies, devDependencies, and peerDependencies since changes to any of these might result in changes to outputs. This option should have these settings:

Performance Considerations

While it would be easy to simply include all files, in the case of a symlink, there will be many files that we have no business treating as inputs. There are some steps we can take to fingerprint the smallest number of files possible:

Benefits

Ultimately the goal here is to provide a way for wireit to be used with an individual package in isolation. By treating dependency files as inputs we can rely on wireit to handle caching and watching for a package regardless of whether or not its dependencies are using it too.

In my particular use-case, I am using pnpm and want to rely on --filter to provide a consistent syntax for developers to run commands in our monorepo. Using Wireit's "dependencies" means having various side-effects when using --filter on scripts that use it. Including dependency outputs as package inputs means that each one can be built independently and we can rely on our own tooling to handle hierarchical tasks.

This doesn't seem incompatible with the existing "dependencies" functionality either since downstream file changes would happen prior to the generation of a new fingerprint upstream. It will also play nicely with --watch since downstream changes would automatically cascade.

Alternatives Considered

Add node_modules to "files"

You can technically already get something kind of like this by pairing the "allowUsuallyExcludedPaths" setting with explicit node_modules globs. The caveat here is that you'll need to be very explicit in order to minimize unnecessary inputs. By relying on package.json files to read dependencies we will avoid the headaches and pitfalls associated with manually maintaining a list like this. You can avoid these pitfalls with a custom tool but I believe this feature makes more sense to implement directly in Wireit.

ObliviousHarmony commented 1 year ago

If this is something that would be a good fit for Wireit I wouldn't mind taking a crack at implementing it!

aomarks commented 1 year ago

So am I understanding correctly?

You have package A, which depends on package B. Package A uses wireit, but package B does not. You want A to re-run whenever B changes.

I think the first solution I would reach for in this situation is to just include the relevant files of B in the "files" array for A. To keep things organized, you could create a "files-only" target and use that as a dependency:

{
  "name": "a",
  "wireit": {
    "build": {
      "files": [
        "lib/**/*.js"
      ],
      "dependencies": [
        "b:files"
      ]
    },
    "b:files": {
      "files": [
        "../b/lib/**/*.js"
      ]
    }
  }
}

Would something like that be a good solution for your use case?

ObliviousHarmony commented 1 year ago

The case of a package using wireit while another does not was just another one that I noticed when I was writing this proposal. In my specific use-case I have a number of packages with this kind of config:

{
  "scripts": {
    "build": "pnpm --filter=\"...$npm_package_name\" run build:project",
    "build:project": "wireit"
  },
  "wireit": {
    "build:project": {
      "command": "tsc --project tsconfig.json",
      "files": [
        "tsconfig.json",
        "src/**/*.{js,jsx,ts,tsx}",
        "typings/**/*.ts"
      ],
      "output": [
        "dist/**"
      ]
    }
  }
}

When I run pnpm build for this package it will run build in all of the dependent packages first (with concurrency) and then the top-level package. wireit will make sure that any changed packages are updated, but without additional configuration, these changes won't cascade. Although all of these packages do use wireit, if we used the "dependencies" feature, the build script would double-cascade when using --filter.

Would something like that be a good solution for your use case?

This is an interesting solution and I think it would work if also paired with one for https://github.com/google/wireit/issues/23 that automatically reads workspace dependencies. Assuming that each dependency had a wireit-only "files" script like this, a configuration that depended on all of its workspace "files" would have the same result. This could then be depended on by the "build" script like in your example and cascade file changes but not actual scripts.

ObliviousHarmony commented 1 year ago

For the use-case of a non-wireit dependency, however, you'd also need to define all of the transitive dependencies too and honestly it sounds like a bit of a mess. Maintaining lists of dependencies by hand separately from the typical NPM dependencies seems a little error-prone and likely to result in accidental drift.

ObliviousHarmony commented 1 year ago

One thing I do like about this proposal though is that it feels more intuitive. Package lock files are included in the fingerprint because we are acknowledging that changes in dependencies should invalidate the cache. Symlinked dependencies aren't included in that fingerprint but should also invalidate the cache. Since only the output files have any bearing on the current package, including those in the fingerprint just seems natural.

ObliviousHarmony commented 1 year ago

I implemented this in our monorepo using a.pnpmfile.cjs hook file. It grabs all of the workspace dependencies for each project and constructs a config that looks like this:

"wireit": {
    "dependencyOutputs": {
        "allowUsuallyExcludedPaths": true,
        "files": [
            "node_modules/@woocommerce/eslint-plugin/configs",
            "node_modules/@woocommerce/eslint-plugin/rules",
            "node_modules/@woocommerce/eslint-plugin/index.js",
            "package.json"
        ]
    }
}

It's definitely not as pretty as a native solution in Wireit could be but It does feel intuitive when paired with a different solution for task orchestration. We can use Wireit alongside PNPM's --filter option without having to worry about anything and it supports Wireit's full feature set. This ends up working out better in that we don't have to try and get folks to remember some commands can use --filter while others cannot.