Proposal: changes to output and fingerprinting

aomarks commented 2 years ago

This is a draft proposal for a set of changes that together provide a similar but slightly different solution to the problems that https://github.com/google/wireit/pull/238 is trying to solve, hopefully in a more intuitive way.

Output propagation

Automatically treat the output files of a dependency as the input files of the dependent. This was sort of the case before, but only implicitly because fingerprints were always inherited.
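To make the idea concrete, here is a minimal sketch in Python of how a script's effective inputs could be derived. It is illustrative only: the `SCRIPTS` dict and the `effective_input_globs` helper are hypothetical names, not wireit's actual implementation, and it assumes only direct dependencies contribute output.

```python
# Hypothetical in-memory model of a wireit config: script name -> settings.
SCRIPTS = {
    "tsc": {
        "files": ["src/**/*.ts"],
        "output": ["lib/**"],
        "dependencies": [],
    },
    "bundle": {
        "files": ["rollup.config.json"],
        "output": ["dist/bundle.js"],
        "dependencies": ["tsc"],
    },
}


def effective_input_globs(name, scripts):
    """Return the script's own "files" globs plus the "output" globs of
    each direct dependency (assumption: direct dependencies only)."""
    script = scripts[name]
    globs = list(script.get("files", []))
    for dep in script.get("dependencies", []):
        # A dependency with undefined output contributes no globs here.
        globs.extend(scripts[dep].get("output") or [])
    return globs


print(effective_input_globs("bundle", SCRIPTS))
# ['rollup.config.json', 'lib/**']
```

With this model, "bundle" never has to list `lib/**` itself; it picks it up from "tsc".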

Fingerprint inheritance

Stop automatically inheriting fingerprints through dependencies, unless the dependency has undefined output files.

This means that scripts will less frequently re-run unnecessarily, because if a script re-runs but didn't produce any different output files, then the scripts that depend on it won't always also need to re-run (e.g. imagine changing a strictness flag in your tsconfig.json that didn't affect the .js or .d.ts output files).

This does make it even more important to correctly specify your output files. One upside, though: previously, if you missed some output files, you might only notice the problem after a cache restore. Now, an incorrect output configuration could become obvious sooner, because you would notice that a dependent script didn't re-run.
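A rough sketch of the proposed rule, in Python for brevity. Everything here is an assumption made for illustration: the `fingerprint` function, the use of SHA-256 over a JSON document, and the simplification that scripts list exact paths instead of globs. It is not how wireit actually computes fingerprints.

```python
import hashlib
import json


def fingerprint(name, scripts, file_contents):
    """Hash a script's command, its input file contents, and its dependencies.

    file_contents maps path -> content. Scripts list exact paths, not globs.
    """
    script = scripts[name]
    data = {"command": script.get("command"), "files": {}, "dependencies": {}}
    for path in sorted(script.get("files", [])):
        data["files"][path] = file_contents.get(path)
    for dep in script.get("dependencies", []):
        output = scripts[dep].get("output")
        if output is None:
            # Undefined output: fall back to inheriting the dependency's fingerprint.
            data["dependencies"][dep] = fingerprint(dep, scripts, file_contents)
        else:
            # Defined output: only the content of the output files matters.
            for path in sorted(output):
                data["files"][path] = file_contents.get(path)
    encoded = json.dumps(data, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


scripts = {
    "tsc": {
        "command": "tsc",
        "files": ["tsconfig.json", "src/a.ts"],
        "output": ["lib/a.js"],
    },
    "bundle": {
        "command": "rollup",
        "files": ["rollup.config.json"],
        "dependencies": ["tsc"],
    },
}
before = {
    "tsconfig.json": '{"strict": false}',
    "src/a.ts": "export const a = 1;",
    "lib/a.js": "export const a = 1;",
    "rollup.config.json": "{}",
}
# Flip a strictness flag; the emitted lib/a.js stays byte-for-byte identical.
after = {**before, "tsconfig.json": '{"strict": true}'}

tsc_changed = fingerprint("tsc", scripts, before) != fingerprint("tsc", scripts, after)
bundle_changed = fingerprint("bundle", scripts, before) != fingerprint("bundle", scripts, after)
print(tsc_changed, bundle_changed)  # True False
```

So "tsc" re-runs because its own input changed, but "bundle" does not, because the output it consumes is unchanged. If "tsc" had no "output" defined, the fallback branch would make "bundle" re-run as well.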

Refining relevant output

Sometimes you have a dependency, but only consume a subset of the files that the script produces, or even none at all. E.g. tsc only needs the .d.ts files of its dependencies, while rollup only needs the .js files, but both are produced by the same script. A new outputInvalidates: false flag (name pending) can be used to prevent the outputs of the dependency from automatically becoming the inputs of the dependent. When this is set, it is then your responsibility to specify exactly which subset of the dependency's output files is relevant in the files array.

Output slices

A pattern that may prove to be useful, and a best practice we can recommend, is to define "output slices", which are scripts that live next to a producer of some output, and provide a name for some subset of that producer's output.

This could provide a convenient way for consumers to reference slices of output, and also encourages users to keep closely-related glob patterns together, instead of being spread out across the consumer configurations.

Example:

{
  "wireit": {
    "tsc": {
      "command": "tsc",
      "files": ["tsconfig.json", "src/**/*.ts"],
      "dependencies": [
        "../other-package:tsc!dts"
      ],
      "output": ["lib/**", ".tsbuildinfo"]
    },
    "tsc!js": {
      "dependencies": [
        {
          "script": "tsc",
          "outputInvalidates": false
        }
      ],
      "files": ["lib/**/*.js"]
    },
    "tsc!dts": {
      "dependencies": [
        {
          "script": "tsc",
          "outputInvalidates": false
        }
      ],
      "files": ["lib/**/*.d.ts"]
    },
    "bundle": {
      "command": "rollup",
      "files": ["rollup.config.json"],
      "output": ["dist/bundle.js"],
      "dependencies": [
        "tsc!js",
        "../other-package:tsc!js"
      ]
    }
  }
}

Output slice sugar

We could potentially formalize the above pattern, and allow naming slices of output directly in the output configuration:

{
  "wireit": {
    "tsc": {
      "command": "tsc",
      "files": ["tsconfig.json", "src/**/*.ts"],
      "dependencies": [
        "../other-package:tsc!dts"
      ],
      "output": {
        "js": [
          "lib/**/*.js"
        ],
        "dts": [
          "lib/**/*.d.ts"
        ],
        "buildinfo": [
          ".tsbuildinfo"
        ]
      }
    },
    "bundle": {
      "command": "rollup",
      "files": ["rollup.config.json"],
      "output": ["dist/bundle.js"],
      "dependencies": [
        "tsc!js",
        "../other-package:tsc!js"
      ]
    }
  }
}
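One way to think about the sugar is as a mechanical expansion into the explicit slice scripts from the "Output slices" example. A minimal sketch of that expansion in Python follows; `desugar_output_slices` is a hypothetical helper written for illustration, and the choice to make the producer's output the union of all slices is an assumption.

```python
import copy


def desugar_output_slices(scripts):
    """Expand a dict-valued "output" into a plain list of output globs
    plus one "<name>!<slice>" script per named slice."""
    result = {}
    for name, script in scripts.items():
        script = copy.deepcopy(script)
        output = script.get("output")
        if isinstance(output, dict):
            # Assumption: the producer's output is the union of all slices.
            script["output"] = [g for globs in output.values() for g in globs]
            result[name] = script
            for slice_name, globs in output.items():
                result[f"{name}!{slice_name}"] = {
                    "dependencies": [
                        {"script": name, "outputInvalidates": False}
                    ],
                    "files": list(globs),
                }
        else:
            result[name] = script
    return result


sugared = {
    "tsc": {
        "command": "tsc",
        "files": ["tsconfig.json", "src/**/*.ts"],
        "output": {
            "js": ["lib/**/*.js"],
            "dts": ["lib/**/*.d.ts"],
            "buildinfo": [".tsbuildinfo"],
        },
    },
}
expanded = desugar_output_slices(sugared)
print(sorted(expanded))
# ['tsc', 'tsc!buildinfo', 'tsc!dts', 'tsc!js']
print(expanded["tsc!dts"]["files"])
# ['lib/**/*.d.ts']
```

Consumers such as "bundle" can then keep referring to "tsc!js" exactly as in the hand-written version.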

Server example

Another use case to think about is a web server. It depends on an actual implementation, and also some assets that it serves. If the implementation changes, then the server must be restarted. But if the assets change, it does not need to be restarted (because it reads the assets from disk on every request).

Here's how you might write that:

{
  "serve": {
    "command": "node my-server.js",
    "persistent": true,
    "dependencies": [
      "build:server",
      {
        "script": "build:site",
        "outputInvalidates": false
      }
    ]
  },
  "build:server": {
    "command": "tsc",
    "files": ["src/**/*.ts"],
    "output": ["lib/**"]
  },
  "build:site": {
    "command": "eleventy",
    "files": ["site/**"],
    "output": ["_site/**"]
  }
}
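The intended behavior for this config can be sketched as a small decision function. This is an illustration in Python, not wireit code: `needs_restart` is a hypothetical name, and `fnmatchcase` from the standard library stands in for real glob matching (its `*` also matches `/`, which differs from typical glob semantics but is enough for these patterns).

```python
from fnmatch import fnmatchcase

# Same shape as the JSON config above, as a Python dict.
SCRIPTS = {
    "serve": {
        "command": "node my-server.js",
        "persistent": True,
        "dependencies": [
            "build:server",
            {"script": "build:site", "outputInvalidates": False},
        ],
    },
    "build:server": {"command": "tsc", "files": ["src/**/*.ts"], "output": ["lib/**"]},
    "build:site": {"command": "eleventy", "files": ["site/**"], "output": ["_site/**"]},
}


def needs_restart(name, changed_path, scripts):
    """True if changed_path is an input of the script: one of its own
    "files", or an output of a dependency that still invalidates it."""
    script = scripts[name]
    globs = list(script.get("files", []))
    for dep in script.get("dependencies", []):
        if isinstance(dep, str):
            dep = {"script": dep}
        # Assumption: the proposed flag defaults to true when omitted.
        if dep.get("outputInvalidates", True):
            globs.extend(scripts[dep["script"]].get("output", []))
    return any(fnmatchcase(changed_path, glob) for glob in globs)


print(needs_restart("serve", "lib/server.js", SCRIPTS))     # True
print(needs_restart("serve", "_site/index.html", SCRIPTS))  # False
```

A change to the compiled server restarts "serve", while a change to the generated site does not, even though "build:site" still runs before the server starts.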

augustjk commented 2 years ago

I do like this approach. The syntax feels more complex but conceptually what it's trying to do makes more sense. While I like the sugar of the output slice syntax, it definitely feels like superuser territory. I'm assuming then specifying an output slice with a ! in the dependency implies outputInvalidates: false?

Regarding the name outputInvalidates, it doesn't feel as appropriate in the server example. More like outputTriggersRerun? Not super happy with that either. I'll try to think on that more.

aomarks commented 2 years ago

> I do like this approach. The syntax feels more complex but conceptually what it's trying to do makes more sense. While I like the sugar of the output slice syntax, it definitely feels like superuser territory. I'm assuming then specifying an output slice with a ! in the dependency implies outputInvalidates: false?

Yeah. Or you could think of it as a way to automatically produce the exact configuration shown above it under "Output slices".

rictic commented 2 years ago

Do outputs flow down transitively? I.e. if A depends on B, which depends on C, does A's fingerprint include the output files from just B, or from both B and C?

I believe that they should, just based on common usage patterns I see.
