ojkelly / yarn.build

Build 🛠 and Bundle 📦 your local workspaces. Like Bazel, Buck, Pants and Please but for Yarn Berry. Build any language, mix javascript, typescript, golang and more in one polyglot repo. Ship your bundles to AWS Lambda, Docker, or any nodejs runtime.
https://yarn.BUILD
MIT License
325 stars 28 forks source link

fix: improve rebuild detection #232

Closed zinserjan closed 2 years ago

zinserjan commented 2 years ago

This PR tries to resolve the issues in #231 and improves the algorithm when a project needs to be rebuild.

It's still a ~WIP~ and tries to solve the following problems:

The reason to rebuild a project within the monorepo depends on the following conditions:

  1. changed input files of the project (src files)
  2. expected output files doesn't exist, yet
  3. one of it's dependencies (recursive) needs a rebuild

1. input files of the project changed

The first problem I had was the correct detection of the input files. The main problem for me was the issue, that the default configuration is still in use, even when you set it explicitly via yarn.build in package.json.

That's why I refactored the whole input detection:

  1. no input configuration in package.json -> use default configuration / try to determine it via different mechanisms 1.1 use default path ./ 1.2 check if default tsconfig.json or custom tsconfig exists and add include to the already existing input path config and use exclude to exclude these files from input cache detection
  2. explicit input configuration via package.json 2.1 use the provided pattern, files and folders of the package.json input configuration 2.2. if a custom tsconfig is expliclity configured via yarn.build/tsconfig in package.json, enhance the input config with files from include and use exclude to exclude (like 1.2)

2. expected output files doesn't exist, yet

Since #226 every output file, folder or pattern needs to exist, otherwise a rebuild is necessary. I don't remember what didn't work there, but it was quite a lot and lead to many issues for me.

That's why I had to improve this also:

  1. no output configuration in package.json -> use default configuration / try to determine it via different mechanisms 1.1 theres no longer any default path, because this doesn't makes any sense 1.2 check if one of the following fields exist in package.json and add it as output path: bin, directories.bin, files, main 1.3 check if default tsconfig.json or custom tsconfig exists and add outDir to the already existing output path config 1.4 fallback: if we didn't find any paths yet, use build as default
  2. explicit output configuration via package.json 2.1 use the provided pattern, files and folders of the package.json output configuration 2.2 if a custom tsconfig is expliclity configured via yarn.build/tsconfig in package.json enhance the output config with outDir

3. one of it's dependencies (recursive) needs a rebuild

We have to make sure that the right projects are marked for rebuild, when we need to build something. That's why I modified the build state of yarn.build.json:

  1. Property haveCheckedForRerun is no longer necessary. To be honest I didn't understand the reason for this variable, because the we have to check every project anyway and the local state is already cached in checkIfRunIsRequiredCache.
  2. Property rerun is set to true, whenever a build is open/necessary for this project. 2.1 successful builds sets rerun to false, failed builds set it to true again 2.2 when we build only a single project, we need to set rerurn to true for all dependent projects (recursively). Otherwise we can't catch the necessary build in the future, which was necessary due to one of the dependencies.
  3. Whenever we build a project we have to make sure that we determined the timestamps, otherwise we have to build it again the next time

Summary

This seems to work pretty good for my project and for this workspace, too. I've tested this on MacOS and Windows. But of course this needs some intensive testing :)

zinserjan commented 2 years ago

@ojkelly I think I'm done. At least for now.

Regarding the reading of tsconfig. Initially, I tried to use the typescript package for reading tsconfig, but I wasn't able to mark typescript as an optional dependency. Either the build failed or the compiled plugin wasn't able to resolve typescript (maybe due to pnp). Including typescript into the bundle increased the plugin from around 350KB to ~1MB. That's why I used get-tsconfig as an alternative way to read the tsconfig. The only downside is that the resolved output format differs from typescript. Typescript parses and resolves the tsconfig completely and transforms include and exclude into a single filenames property which contains all files to compile. With get-tsconfig we have to resolve these files ourselves, but that isn't really a problem due to the added support of globs.

@jeffrson can you give this a try? Just replace the existing version of yarn.build with one of the files of this PR in .yarn/plugins/@ojkelly.

jeffrson commented 2 years ago

@zinserjan So far this looks very promising - great work! Two points after a couple of tests:

zinserjan commented 2 years ago

previously I had 8 slots for building (corresponding to 8 CPU cores), now there are 16. I'm not sure, why - taking hyperthreading into account? Does not seem faster - I think it's better to keep the number for cores

Nothing changed here in this PR. That was introduced in #224, but isn't released, yet.

for some reason our sources for typescript are located inside src, but tsconfig.json contains "outDir": "." (I'm not happy with it, but we currently need it like this), that's why I had ""output": "." in yarn.build settings inside package.json. Unfortunately this seems to defeat detection of changes: project is not rebuilt anymore at all. AFAICT, it worked previously (which might be due to not so sophisticated change detection though). Do you think, this could work as well? It would be hard to maintain a list of subfolders of src in package.json

I guess it rebuilts before due to the changed timestamps (the latest generated file triggered the rebuilt). The problem here is that I reduce the input files with the output files, but this is of course a problem when input and output are the same. I don't think that we can handle this case automatically. Ignoring output when it's the same as input means that we cannot cache, using both doesn't rebuild. I think the best solution is a proper configuration of your input and output. Add the following to your package json:

  "yarn.build": {
    "input": ["src", "!**/*.js", "!**/*.d.ts"]
    "output": ["src/**/*.js", "src/**/*.d.ts"]
  },

Note: add the d.ts patterns only if they exists, otherwise you have no caching, cause every output pattern must exist.

zinserjan commented 2 years ago

It looks good to me. My own tests are still working. I have created an example for the 'upload only' case.

Setting null as output skips the output checks and only rebuilds when files changed. Explicit input and output file configuration will no longer be touched from anything else.