
dependentTasksOutputFiles doesn't work when using Distributed Task Execution #22745

Open · jonapgar-groupby opened 5 months ago

jonapgar-groupby commented 5 months ago

Current Behavior

dependentTasksOutputFiles doesn't work when using Distributed Task Execution.

I suspect this has something to do with the timing of fetching outputs from dependent tasks from the remote cache.
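
For context, a target typically opts into this behaviour through its inputs. A minimal sketch of what that looks like in project.json (target names and the glob are illustrative, not necessarily the reproduction repo's exact config):

```json
{
  "targets": {
    "second": {
      "executor": "nx:run-commands",
      "options": { "command": "cat output.txt" },
      "dependsOn": ["first"],
      "inputs": [
        { "dependentTasksOutputFiles": "**/output.txt" }
      ]
    }
  }
}
```

The expectation is that second's hash changes whenever the files produced by its dependent tasks change, even when those outputs are restored from the remote cache on a different agent.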

Expected Behavior

dependentTasksOutputFiles works when using DTE

GitHub Repo

jonapgar-groupby/nx-dte-input-caching-bug

Steps to Reproduce

  1. Create a fork or pull request for the above repository.
  2. Observe the resulting CI actions https://github.com/jonapgar-groupby/nx-dte-input-caching-bug/actions/workflows/ci.yml

For the first run, the output will include something like this (note the cat output.txt sections for the first and second targets respectively):

> nx run org:first 

> echo "${GITHUB_RUN_ID}-${GITHUB_RUN_ATTEMPT}" > output.txt

> cat output.txt

8804502729-1

 NX   Processing task org:second

 NX   Nx Cloud: Downloaded 16592876817398947242

 NX   Extracting artifacts

from: /home/runner/work/nx-dte-input-caching-bug/nx-dte-input-caching-bug/.nx/cache/16592876817398947242/outputs
to: /home/runner/work/nx-dte-input-caching-bug/nx-dte-input-caching-bug

 NX   Finished extracting artifacts

from: /home/runner/work/nx-dte-input-caching-bug/nx-dte-input-caching-bug/.nx/cache/16592876817398947242/outputs
to: /home/runner/work/nx-dte-input-caching-bug/nx-dte-input-caching-bug

> nx run org:second 

> cat output.txt

8804502729-1

Now, re-run the same action using the GitHub UI. Observe in the logs that, for the first target, the printed output.txt now ends with -2 (for attempt number 2), while for the second target it still shows -1.

> nx run org:first 

> echo "${GITHUB_RUN_ID}-${GITHUB_RUN_ATTEMPT}" > output.txt

> cat output.txt

8804502729-2

 NX   Processing task org:second

 NX   Nx Cloud: Downloaded 16592876817398947242

 NX   Extracting artifacts

from: /home/runner/work/nx-dte-input-caching-bug/nx-dte-input-caching-bug/.nx/cache/16592876817398947242/outputs
to: /home/runner/work/nx-dte-input-caching-bug/nx-dte-input-caching-bug

 NX   Finished extracting artifacts

from: /home/runner/work/nx-dte-input-caching-bug/nx-dte-input-caching-bug/.nx/cache/16592876817398947242/outputs
to: /home/runner/work/nx-dte-input-caching-bug/nx-dte-input-caching-bug

> nx run org:second 

> cat output.txt

8804502729-1

You can view similar errant runs with DTE here:

https://github.com/jonapgar-groupby/nx-dte-input-caching-bug/actions/runs/8804727389/attempts/1
https://github.com/jonapgar-groupby/nx-dte-input-caching-bug/actions/runs/8804727389/attempts/2

Here are the runs with DTE disabled, where it works correctly:

https://github.com/jonapgar-groupby/nx-dte-input-caching-bug/actions/runs/8804727396/attempts/1
https://github.com/jonapgar-groupby/nx-dte-input-caching-bug/actions/runs/8804727396/attempts/2

Nx Report

Node   : 20.11.1
OS     : linux-arm64
npm    : 10.2.4

nx                 : 18.2.2

Failure Logs

No response

Package Manager Version

No response

Operating System

Additional Information

I could probably help fix this if the nx-cloud npm package was not obfuscated 😒

StalkAltan commented 5 months ago

Hi @jonapgar-groupby, can you provide a reproduction or describe the way in which this isn't working as expected? Otherwise it's difficult to know what problem we should expect.

Thank you in advance!

zackarydev commented 5 months ago

Could this ticket be related to #22830 ?

jonapgar-groupby commented 4 months ago

Your ticket could be related, I suppose. It's worth noting that I only experience this issue when using Distributed Task Execution with Nx Cloud.

In my case, among other things, I want to be able to conditionally re-run a docker target whenever the actual dist/ files have changed (but not necessarily when other dependencies have changed in a way that didn't actually update the files in dist/).

The "workaround" is to include production and ^production as inputs for the docker executor, instead of just e.g. "dependentTasksOutputFiles": "**/dist/**/*". But this is not ideal, as it means any file change in the project will trigger a docker build, even when it isn't actually necessary.

jonapgar-groupby commented 4 months ago

It seems very likely that the issue has to do with fetching outputs from the remote cache when distributing tasks across agents.

Without being able to access the source code 😞, I am assuming that the distributed task runner doesn't actually fetch the dependent tasks' outputs before computing the hash from the files matched by dependentTasksOutputFiles.

When you're not distributing tasks, this wouldn't really matter, as the dependent targets are run on the same machine anyway.

I will try to make a reproduction...

I'm guessing here, but perhaps this is a misguided optimization to avoid fetching dependent outputs for a target when it's determined that its inputs haven't changed (overlooking the dependentTasksOutputFiles feature).

jonapgar-groupby commented 4 months ago

@StalkAltan I've updated the issue with a reproduction repository and steps to reproduce.

Thanks for helping out with this!

jonapgar-groupby commented 4 months ago

Any updates on this? This prevents me from using distributed task execution effectively. Without a fix, Docker images need to be rebuilt and published even when the actual inputs haven't changed.

jonapgar-groupby commented 3 months ago

I feel like I wasted time with a reproduction repository only to get ghosted on this issue.

getlarge commented 1 month ago

Hi @jonapgar-groupby, at least you got some attention from the outside world :) I just tried your repo and noticed that the tasks are not cacheable, even though you listed them under tasksRunnerOptions[runnerName].options.cacheableOperations.

Try defining the cache property on your first and second targets and you will see different behavior.
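
Applied to the reproduction repo, that suggestion amounts to something like the following project.json sketch (the commands are copied from the logs above; the outputs path and input glob are assumptions about the repo, not verified against it):

```json
{
  "targets": {
    "first": {
      "executor": "nx:run-commands",
      "options": {
        "command": "echo \"${GITHUB_RUN_ID}-${GITHUB_RUN_ATTEMPT}\" > output.txt"
      },
      "outputs": ["{projectRoot}/output.txt"],
      "cache": true
    },
    "second": {
      "executor": "nx:run-commands",
      "options": { "command": "cat output.txt" },
      "dependsOn": ["first"],
      "inputs": [{ "dependentTasksOutputFiles": "**/output.txt" }],
      "cache": true
    }
  }
}
```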

jonapgar-groupby commented 1 month ago

Unfortunately, @getlarge, this doesn't seem to solve the issue.

It seems that when using nx-cloud you can no longer see the individual target's output in the GitHub Actions logs, but here is a screenshot from the Nx Cloud app for the second run of the action.

(screenshot of the Nx Cloud run for the second attempt)

The second target should have been a cache-miss as well, if it properly checked the dependentTasksOutputFiles and noticed the file had changed.

In the code linked below, it appears as if setting the cache per target or via cacheableOperations will have the same effect for now (though I can't confirm how these settings might interact with nx-cloud):

https://github.com/nrwl/nx/commit/2a18fd1f3f9e9f3cec2cf022faeffbd60fa8216b#diff-37371aa9744966490c7e5074cd6a43071eccf0e9c75143e5f6c08b87b4405ea7R355-R365

FWIW, I have been using cacheableOperations for successful caching with DTE elsewhere. It's only the combination of dependentTasksOutputFiles with DTE that causes this unexpected behaviour.
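
For contrast with the per-target cache flag, the legacy style referred to here lives in nx.json and marks whole target names as cacheable. Roughly (runner key and task names are illustrative):

```json
{
  "tasksRunnerOptions": {
    "default": {
      "runner": "nx-cloud",
      "options": {
        "cacheableOperations": ["first", "second"]
      }
    }
  }
}
```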

Thanks for the tip though, I didn't realize Nx now lets you set caching per target, which will be useful for me elsewhere :)