vercel / turborepo

Build system optimized for JavaScript and TypeScript, written in Rust
https://turbo.build/repo/docs
MIT License
26.22k stars 1.81k forks source link

Turbo caches incomplete results. #6754

Closed laat closed 10 months ago

laat commented 10 months ago

Verify canary release

Link to code that reproduces this issue

https://github.com/laat/turbo-repo-cache-repro

What package manager are you using / does the bug impact?

pnpm

What operating system are you using?

Mac

Which canary version will you have in your reproduction?

1.11.1

Describe the Bug

when running concurrent linting and build, if the linting fails the build is killed and its outputs are cached

Expected Behavior

it should not store partial/incomplete outputs

To Reproduce

I created a repro

clone it and:

  1. pnpm install
  2. pnpm turbo build lint (expected to fail on lint, before build completes)
  3. pnpm turbo build

Expected result after final step:

apps/docs/dist is created with content.

Actual result:

apps/docs/dist does not exist, but has this output with successful exit code:

docs:build: cache hit, replaying logs d7849613a3a4e98d
docs:build:
docs:build: > docs@1.0.0 build /Users/laat/git/turborepo-repro/apps/docs
docs:build: > node build.js
docs:build:

 Tasks:    1 successful, 1 total
Cached:    1 cached, 1 total
  Time:    58ms >>> FULL TURBO

Additional context

works as expected in 1.10.16

chris-olszewski commented 10 months ago

I've been testing your repro and have been unable to get the result you describe. A few pieces of information that might help:

laat commented 10 months ago
$ pnpm turbo daemon clean
Done

$ git clean -xfd
Removing node_modules/

$ pnpm install
Scope: all 3 workspace projects
Lockfile is up to date, resolution step is skipped
Packages: +2
++
Progress: resolved 2, reused 2, downloaded 0, added 2, done

devDependencies:
+ turbo 1.11.1

The dependency was already listed in devDependencies.
If you want to make it a prod dependency, then move it manually.

Done in 309ms
$ pnpm turbo build lint
• Packages in scope: docs, web
• Running build, lint in 2 packages
• Remote caching disabled
docs:build: cache miss, executing 2f2ac4c8cf9904ef
web:lint: cache miss, executing 07a271d5d104cda2
docs:build: 
docs:build: > docs@1.0.0 build /Users/laat/git/turborepo-repro/apps/docs
docs:build: > node build.js
docs:build: 
web:lint: 
web:lint: > web@1.0.0 lint /Users/laat/git/turborepo-repro/apps/web
web:lint: > node lint.js
web:lint: 
web:lint: Linting failed
web:lint:  ELIFECYCLE  Command failed with exit code 1.
web:lint:  WARN   Local package.json exists, but node_modules missing, did you mean to install?
web:lint: ERROR: command finished with error: command (/Users/laat/git/turborepo-repro/apps/web) /Users/laat/Library/pnpm/pnpm run lint exited (1)
web#lint: command (/Users/laat/git/turborepo-repro/apps/web) /Users/laat/Library/pnpm/pnpm run lint exited (1)

 Tasks:    1 successful, 2 total
Cached:    0 cached, 2 total
  Time:    899ms 
Failed:    web#lint

 ERROR  run failed: command  exited (1)
$ pnpm turbo build
• Packages in scope: docs, web
• Running build in 2 packages
• Remote caching disabled
docs:build: cache hit, replaying logs 2f2ac4c8cf9904ef
docs:build: 
docs:build: > docs@1.0.0 build /Users/laat/git/turborepo-repro/apps/docs
docs:build: > node build.js
docs:build: 

 Tasks:    1 successful, 1 total
Cached:    1 cached, 1 total
  Time:    55ms >>> FULL TURBO

Cache files: https://github.com/laat/turbo-repo-cache-repro/tree/cache/node_modules/.cache/turbo

laat commented 10 months ago

and a CI job with the same behavior: https://github.com/laat/turbo-repo-cache-repro/pull/1

chris-olszewski commented 10 months ago

Can you quick check if using --go-fallback with your turbo commands results in the correct behavior?

laat commented 10 months ago

https://github.com/laat/turbo-repo-cache-repro/pull/2

--go-fallback works

laat commented 10 months ago

Could it be the way pnpm handles SIGINT? https://github.com/pnpm/pnpm/issues/7374

pnpm seems to exit with exit-code 0 when it's receives SIGINT

chris-olszewski commented 10 months ago

It could be?

It's definiteness something to do with signal handling since it appears that when we're shutting down the docs#build task by sending it a SIGINT for some reason it's taking the path of a process that exited with 0. Either way, this is a bug on our part for not handling it.

chris-olszewski commented 10 months ago

Okay, I've been able to reproduce the behavior you're seeing by modifying turbo with the following change:

visitor.rs

@@ -814,7 +814,7 @@ impl ExecContext {
         };

         match exit_status {
-            ChildExit::Finished(Some(0)) => {
+            ChildExit::Finished(Some(0)) | ChildExit::Finished(None) => {

So I'm guessing that somehow when we're sending the SIGINT to the process running pnpm run build for docs we're getting back an exit code of 0. Will compare our Go/Rust implementations of sending these signals to see if there are any minor differences that could cause this.

I'm still unsure what could be different from my machines and your machine/Github Actions where I can't get the same behavior out of the box.

laat commented 10 months ago

My guess is that we are entering different listeners in pnpm somehow. It's rather strange.

There are two listeners on subprocess termination in pnpm:

https://github.com/pnpm/npm-lifecycle/blob/e3766d18f13b75f31ab162dbef8e7167cda3afe7/index.js#L279-L290

and

https://github.com/pnpm/npm-lifecycle/blob/e3766d18f13b75f31ab162dbef8e7167cda3afe7/index.js#L331-L333

where both the github action and me seem to enter the exit listener, while you are entering the close listener. I do not have access the machine where this issue occurs until monday to debug further.

laat commented 10 months ago

what version of pnpm do you use?

I'm seeing the error with pnpm >= 8.10, but not in pnpm@8.9.2

laat commented 10 months ago

It's this change https://github.com/pnpm/npm-lifecycle/pull/41 that causes pnpm to return exit code 0 on SIGINT. released in pnpm@8.10

chris-olszewski commented 10 months ago

Ah yup, you're right. I have corepack enabled so I was using pnpm@8.9.0 from the package.json.