vercel / turborepo

Build system optimized for JavaScript and TypeScript, written in Rust
https://turbo.build/repo/docs
MIT License

Tasks should run `dependsOn` before hashing inputs #8051

Open JavaScriptBach opened 6 months ago

JavaScriptBach commented 6 months ago

Verify canary release

Link to code that reproduces this issue

https://github.com/JavaScriptBach/turbo-caching-bug

What package manager are you using / does the bug impact?

Yarn v2/v3/v4 (node_modules linker only)

What operating system are you using?

Linux

Which canary version will you have in your reproduction?

1.13.3-canary.4

Describe the Bug

If I have a task A whose input is a generated file produced by another task B, I have to run task A with Turbo twice before it gets cached.

What I think is happening

I think it's because on the first invocation:

  1. Turbo hashes the input, but the generated file doesn't exist yet.
  2. Turbo runs the dependency (the task listed in `dependsOn`), which produces the generated file.
  3. Turbo runs the original task and stores it under the now-stale hash.

On second invocation:

  1. Turbo sees that the generated file now exists, so the hash is different from before, therefore it runs the original task again.
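
To make the suspected ordering concrete, here is a minimal toy model of it in Python. It is not Turbo's actual implementation: the hashing scheme, the in-memory "filesystem", and the file name `my-test.js` are assumptions made for illustration (`my-codegen.txt` is the generated file from the repro).

```python
import hashlib

def hash_inputs(files, names):
    """Hash the declared inputs that exist right now; missing files contribute nothing."""
    h = hashlib.sha256()
    for name in sorted(names):
        if name in files:
            h.update(name.encode() + b"\0" + files[name].encode() + b"\0")
    return h.hexdigest()[:16]

def run_once(files, cache):
    """One toy invocation in the suspected order: hash first, run the dependency second."""
    # Step 1: hash the inputs. On a clean checkout the generated file is absent.
    key = hash_inputs(files, ["my-test.js", "my-codegen.txt"])
    # Step 2: the dependency runs (or is restored from cache) and writes the file.
    files["my-codegen.txt"] = "generated"
    # Step 3: the task result is looked up and stored under the key from step 1.
    hit = key in cache
    cache.add(key)
    return key, hit

files = {"my-test.js": "test source"}  # hypothetical hand-written input
cache = set()
runs = [run_once(files, cache) for _ in range(3)]
for key, hit in runs:
    print(key, "hit" if hit else "miss")
```

In this model the first two runs miss with two different keys and only the third run hits, because the first key was computed while the generated file was still missing.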

Why I think it's a bug

To get correct caching behavior, I currently have to exclude all generated files from the inputs. This is unintuitive because the generated files are conceptually inputs to my task. Furthermore, I've already told Turbo that my task depends on the codegen task.

It would be nice for Turbo to handle this, perhaps by running all dependsOn tasks before hashing the inputs to the original task?
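
As a rough sketch of the ordering proposed here, the toy model below runs the dependency before hashing. It only illustrates the idea and says nothing about how Turbo would implement it; the hashing scheme and the file name `my-test.js` are assumptions, and `my-codegen.txt` is the generated file from the repro.

```python
import hashlib

def hash_inputs(files, names):
    """Hash the declared inputs that exist right now; missing files contribute nothing."""
    h = hashlib.sha256()
    for name in sorted(names):
        if name in files:
            h.update(name.encode() + b"\0" + files[name].encode() + b"\0")
    return h.hexdigest()[:16]

def run_once_deps_first(files, cache):
    """One toy invocation in the proposed order: dependency first, then hash."""
    # The dependency runs (or is restored from cache) before anything is hashed.
    files["my-codegen.txt"] = "generated"
    # The key now reflects the generated file, even on a clean checkout.
    key = hash_inputs(files, ["my-test.js", "my-codegen.txt"])
    hit = key in cache
    cache.add(key)
    return key, hit

files = {"my-test.js": "test source"}  # hypothetical hand-written input
cache = set()
first = run_once_deps_first(files, cache)
second = run_once_deps_first(files, cache)
print(first, second)
```

Here the first run is a miss and the second is a hit under the same key. One likely cost of this ordering is that a task's hash could no longer be computed before its dependencies have run.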

Expected Behavior

Everything is cached after running Turbo once.

To Reproduce

See the linked repo.

Additional context

No response

peplin commented 6 months ago

Here's an example of 3 sequential turbo runs from the linked repository. You can see how it takes 3 runs to get to FULL TURBO:

$ turbo my-test --summarize
• Running my-test
• Remote caching disabled
my-codegen: cache miss, executing fb830814bce3d882
my-codegen:
my-test: cache miss, executing 2ef6da69cc2b018d
my-test:

  Tasks:    2 successful, 2 total
 Cached:    0 cached, 2 total
   Time:    373ms
Summary:    /Users/peplin/dev/turbo-caching-bug/.turbo/runs/2fejXqPcSUhLBGZsQgZ3Flk77ml.json

$ node_modules/.bin/turbo my-test --summarize
• Running my-test
• Remote caching disabled
my-codegen: cache hit, replaying logs fb830814bce3d882
my-codegen:
my-test: cache miss, executing fcb34ccdb1cf4377
my-test:

  Tasks:    2 successful, 2 total
 Cached:    1 cached, 2 total
   Time:    242ms
Summary:    /Users/peplin/dev/turbo-caching-bug/.turbo/runs/2fejY73vyb4eLqszBlFWviHymzE.json

$ node_modules/.bin/turbo my-test --summarize
• Running my-test
• Remote caching disabled
my-codegen: cache hit (outputs already on disk), replaying logs fb830814bce3d882
my-codegen:
my-test: cache hit, replaying logs fcb34ccdb1cf4377
my-test:

  Tasks:    2 successful, 2 total
 Cached:    2 cached, 2 total
   Time:    73ms >>> FULL TURBO
Summary:    /Users/peplin/dev/turbo-caching-bug/.turbo/runs/2fejYAIPKlkQD5auUDsWA7FLt2M.json

Here are the 3 summary files:

1.json 2.json 3.json

1.json does not include my-codegen.txt in its inputs, so the my-test hash differs from the one in run 2. I would expect the first run to be a cache miss, but with a hash that matches the second run.
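
A comparison like this can be scripted. The sketch below assumes each run summary is a JSON object with a `tasks` array whose entries carry a `taskId` and an `inputs` map from file path to file hash; check those field names against your own summary files. The task id `my-pkg#my-test` and the hash values are hypothetical.

```python
import json

def load_summary(path):
    """Load a run summary file written by `turbo ... --summarize`."""
    with open(path) as f:
        return json.load(f)

def task_inputs(summary, task_id):
    """Return the inputs map (path -> hash) for one task; the layout is an assumption."""
    for task in summary.get("tasks", []):
        if task.get("taskId") == task_id:
            return task.get("inputs") or {}
    raise KeyError(task_id)

def diff_inputs(before, after, task_id):
    """List input files that were added, removed, or changed between two runs."""
    a = task_inputs(before, task_id)
    b = task_inputs(after, task_id)
    return {
        "added": sorted(set(b) - set(a)),
        "removed": sorted(set(a) - set(b)),
        "changed": sorted(k for k in set(a) & set(b) if a[k] != b[k]),
    }

# Synthetic stand-ins for two summaries; the values are made up.
run1 = {"tasks": [{"taskId": "my-pkg#my-test", "inputs": {"package.json": "aaa"}}]}
run2 = {"tasks": [{"taskId": "my-pkg#my-test",
                   "inputs": {"package.json": "aaa", "my-codegen.txt": "bbb"}}]}
result = diff_inputs(run1, run2, "my-pkg#my-test")
print(result)
```

With real files, pass the results of `load_summary(...)` for two summaries under `.turbo/runs/` instead of the synthetic dictionaries.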

weyert commented 6 months ago

Yeah, I think I have a similar problem, but in my case the codegen task is in the same package, e.g. using:

{
  "$schema": "https://turbo.build/schema.json",
  "extends": ["//"],
  "pipeline": {
    "generate": {
      "outputMode": "new-only",
      "inputs": [
        "src/**/*.yml"
      ],
      "outputs": ["src/**/*", "!src/**/*.yml"],
      "cache": true
    },
    "build": {
      "outputMode": "new-only",
      "inputs": [
        "!src/**/*.yml"
      ],
      "outputs": ["lib/**"],
      "dependsOn": ["generate"],
      "cache": true
    }
  }
}

mattico commented 2 months ago

I also ran into this issue and made a repro of my own before I found yours: https://github.com/mattico/turborepo-repro

I can verify this issue still exists with version 2.0.15-canary.4.

Leksat commented 1 month ago

Just ran into this issue. Confirming that 2.1.2-canary.0 is affected.