[rush] Rush API for interacting with the Build Cache

elliot-nelson commented 2 months ago

Summary

I would like Rush to expose an API for interacting directly with the Rush Build Cache, as if it were a <string, string> cache map.

Details

Use Case

Imagine a situation where you have many settings in a file to validate; this validation happens to be very expensive (30+ seconds), unless you've pre-computed a special bit of data which lets you do the validation in only a few milliseconds.

There's a way you could abuse the Rush build cache to solve this problem today: imagine a new Rush project, empty, except for a package.json with a build script that performed the very expensive operation, taking the file ./input and producing the file ./temp/output. Now imagine a bash script which loops through a hundred configs; on each loop, overwrite the local contents of ./input and run rush build --to ., then read the contents of ./temp/output.

This is exactly the job I am considering building to solve a problem we have internally, but I'd prefer something more like this:

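// Proposed API sketch: RushCacheLayer and getCacheLayer() do not exist in Rush today.
import { RushConfiguration, RushCacheLayer } from 'rush-sdk';

const rushConfig = RushConfiguration(...,...);
const cache = rushConfig.getCacheLayer();

for (const config of configs) {
  // Look up the pre-computed data for this config in the shared cache.
  let key = cache.getString(`my-process:${config.name}`);
  if (!key) {
    // Cache miss: pay the 30+ second cost once, then store the result.
    key = veryExpensiveOperation(config);
    cache.setString(`my-process:${config.name}`, key);
  }
  // With the pre-computed data, validation takes only milliseconds.
  performSimpleValidation(key, config);
}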

Proposed requirements

I don't have any specific API in mind, but some requirements I think are:

Zero-config initialization. I want to say "give me a cache instance", and the thing returned should use the same authentication, cache endpoint, and cache read/write permissions as if I had run rush build in this context.
Low-level string-to-string API. Basic gets and sets; leave anything fancier (such as serializing/deserializing a JSON object) up to the caller.
Scope the entries (in the cache). I think no real build cache entries should be accessible via this method. This would prevent certain very fancy use cases, but it protects the caller from accidentally poking or overwriting cache results for real projects in the monorepo, which seems like the safest bet here. (But perhaps it's worth exploring this more.)
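
As a rough illustration of the second and third requirements, here is a minimal TypeScript sketch of a string-to-string cache whose keys are confined to their own namespace. Everything in it is hypothetical: `StringStore`, `ScopedStringCache`, `getOrCompute`, and the `user-cache/` prefix are names invented for this example, and the backing store is a plain in-memory `Map` standing in for whatever Rush's real cache provider would be.

```typescript
// Hypothetical sketch only: none of these names exist in Rush.

// Minimal backing-store shape; a Map<string, string> satisfies it.
interface StringStore {
  get(key: string): string | undefined;
  set(key: string, value: string): void;
}

class ScopedStringCache {
  private readonly _store: StringStore;
  private readonly _prefix: string;

  public constructor(store: StringStore, scope: string) {
    if (scope.length === 0 || scope.indexOf('/') !== -1) {
      throw new Error('scope must be non-empty and must not contain "/"');
    }
    this._store = store;
    // Assumed reserved namespace, so caller keys can never match a real build cache entry.
    this._prefix = `user-cache/${scope}/`;
  }

  public getString(key: string): string | undefined {
    return this._store.get(this._prefix + key);
  }

  public setString(key: string, value: string): void {
    this._store.set(this._prefix + key, value);
  }
}

// Read-through helper mirroring the loop in the use case: compute only on a miss.
function getOrCompute(cache: ScopedStringCache, key: string, compute: () => string): string {
  const cached: string | undefined = cache.getString(key);
  if (cached !== undefined) {
    return cached;
  }
  const value: string = compute();
  cache.setString(key, value);
  return value;
}

// Usage: the backing store already holds an unrelated "real" entry under the same raw key.
const backingStore: Map<string, string> = new Map([['config-a', 'real build output']]);
const scopedCache: ScopedStringCache = new ScopedStringCache(backingStore, 'my-process');
const precomputed: string = getOrCompute(scopedCache, 'config-a', () => 'expensive result');
```

Because every key is rewritten under the prefix, the caller's `config-a` and the pre-existing `config-a` entry never collide, which is the protection the scoping requirement asks for. Serialization stays with the caller, as proposed.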

chengcyber commented 2 months ago

Here is another scenario where the build cache could help: restoring intermediate state when a build does not hit the cache.

Many tools, such as webpack, eslint, and tsc, support their own "incremental" caches. When a build misses the Rush.js build cache, Rush could still speed it up by restoring the tool's incremental cache file.

For example, suppose running "eslint" for an application project takes 10 seconds. The Rush.js build cache works well when no code has changed, but even a small change to the project's package.json file invalidates the entire cache entry, and the build takes another 10 seconds. If we could get the ".eslintcache" file back, "eslint" could probably finish faster by reusing information from the previous run.

dmichon-msft commented 2 months ago

I like the concept, though for maximum compatibility I would prefer that the underlying data type be Buffer (or Buffer[], which works with writev). The APIs for the providers currently operate on Buffer. The get API would have to be async, though set could be synchronous and just put the work on a background queue with a flushAsync() that gets called as part of cleaning up the cache.
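
One possible shape for that suggestion is sketched below in TypeScript, under several assumptions. `AsyncByteProvider`, `WriteBehindCache`, and `InMemoryProvider` are hypothetical names, not Rush's actual provider interfaces. The sketch types values as `Uint8Array` (Node's `Buffer` is a subclass, so a `Buffer` can be passed as-is). The "background queue" is reduced to its simplest form: `set` only buffers the write in memory, and nothing reaches the provider until `flushAsync()` is called.

```typescript
// Hypothetical sketch only: these are not Rush's real provider interfaces.

interface AsyncByteProvider {
  tryGetAsync(key: string): Promise<Uint8Array | undefined>;
  setAsync(key: string, value: Uint8Array): Promise<void>;
}

class WriteBehindCache {
  private readonly _provider: AsyncByteProvider;
  // Writes accepted by set() but not yet sent to the provider.
  private readonly _pending: Map<string, Uint8Array> = new Map();

  public constructor(provider: AsyncByteProvider) {
    this._provider = provider;
  }

  // Reads are async because the provider may need a network round trip.
  public async getAsync(key: string): Promise<Uint8Array | undefined> {
    const pending: Uint8Array | undefined = this._pending.get(key);
    if (pending !== undefined) {
      return pending; // Serve not-yet-flushed writes from memory.
    }
    return this._provider.tryGetAsync(key);
  }

  // Synchronous: only queues the write.
  public set(key: string, value: Uint8Array): void {
    this._pending.set(key, value);
  }

  // Sends all queued writes; intended to run during cache cleanup.
  // Simplification: if a write fails, the snapshot taken here is not re-queued.
  public async flushAsync(): Promise<void> {
    const entries: [string, Uint8Array][] = Array.from(this._pending.entries());
    this._pending.clear();
    await Promise.all(entries.map(([key, value]) => this._provider.setAsync(key, value)));
  }
}

// In-memory stand-in for a real cache provider, for demonstration only.
class InMemoryProvider implements AsyncByteProvider {
  public readonly data: Map<string, Uint8Array> = new Map();

  public async tryGetAsync(key: string): Promise<Uint8Array | undefined> {
    return this.data.get(key);
  }

  public async setAsync(key: string, value: Uint8Array): Promise<void> {
    this.data.set(key, value);
  }
}

// Usage: queue a write, read it back, then flush before shutting down.
async function exampleAsync(): Promise<number> {
  const cache: WriteBehindCache = new WriteBehindCache(new InMemoryProvider());
  cache.set('my-process:config-a', new Uint8Array([1, 2, 3]));
  const value: Uint8Array | undefined = await cache.getAsync('my-process:config-a');
  await cache.flushAsync();
  return value ? value.length : 0;
}
```

A production version would more likely start uploads eagerly and have `flushAsync()` wait for the in-flight ones, but that requires deciding how write failures are reported, which this sketch leaves open.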

Regarding @chengcyber's suggestion, this would necessitate some way for Rush to walk back the version history (presumably by grabbing git rev-parse HEAD and/or git merge-base HEAD main).
