nrwl / nx

Smart Monorepos · Fast CI
https://nx.dev
MIT License

"Invalid Cache Directory" error when running tasks locally despite clean local cache #18642

Closed NickCiliak closed 1 year ago

NickCiliak commented 1 year ago

Current Behavior

After upgrading to Nx 16.6.0, we are seeing the new "invalid cache directory" error when running tasks locally. After running nx reset as the error suggests and ensuring that the local Nx cache in node_modules has been removed, we still get the error.

If we use --skip-nx-cache when running tasks then we don't get the error, which makes sense, but does that mean it's throwing an error for the artifact retrieved from the remote Nx cloud cache? Again, local cache is empty.

Am I misunderstanding how this should work or could it be a bug?

Expected Behavior

If the local Nx cache is removed, running a task locally should not result in the Invalid Cache Directory error.

GitHub Repo

No response

Steps to Reproduce

  1. Make a change to an app locally that would result in a cache miss for the build task, but do not run the task locally (or make sure connection to Nx Cloud is disabled temporarily)
  2. Push change to git remote and CI runs build task (Nx Cloud cache miss)
  3. Build task artifact is now stored in remote cache. Locally, run nx reset to clear local cache, then run build task.
  4. Get "Invalid Cache Directory for Task" error.

Nx Report

>  NX   Report complete - copy this into the issue template

   Node   : 16.14.2
   OS     : darwin-x64
   yarn   : 1.22.19

   nx (global)        : 16.4.0
   nx                 : 16.6.0
   @nx/js             : 16.6.0
   @nx/jest           : 16.6.0
   @nx/linter         : 16.6.0
   @nx/workspace      : 16.6.0
   @nx/cypress        : 16.6.0
   @nx/devkit         : 16.6.0
   @nx/eslint-plugin  : 16.6.0
   @nx/react          : 16.6.0
   @nrwl/tao          : 16.6.0
   @nx/web            : 16.6.0
   @nx/webpack        : 16.6.0
   nx-cloud           : 16.0.5
   typescript         : 5.1.6
   ---------------------------------------
   Community plugins:
   @jscutlery/semver : 2.30.1
   @nxkit/playwright : 2.2.0
   ---------------------------------------

Failure Logs

nx run-many -t build
 >  NX   Ran target build for 10 projects (183ms)

    ✔    0/0 succeeded [0 read from cache]

    ✖    0/0 targets failed, including the following:

   View structured, searchable error logs at https://nx.app/runs/<redacted>

 >  NX   Invalid Cache Directory for Task "api:build"

   The local cache artifact in "/<redacted>/node_modules/.cache/nx/<redacted>" was not been generated on this machine.
   As a result, the cache's content integrity cannot be confirmed, which may make cache restoration potentially unsafe.
   If your machine ID has changed since the artifact was cached, run "nx reset" to fix this issue.
   Read about the error and how to address it here: https://nx.dev/recipes/troubleshooting/unknown-local-cache

   Pass --verbose to see the stacktrace.

Operating System

Additional Information

No response

FrozenPandaz commented 1 year ago

Can you please run the following and compare the 2 values?

cat /<redacted>/node_modules/.cache/nx/<redacted>/source

node -e "require('node-machine-id').machineId().then(console.log)"

You should get the same value for both.
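The comparison above can also be written as a small check. This is only a sketch: `checkArtifactSource` and `SourceCheck` are hypothetical names, not part of Nx, and it assumes the `source` file contains nothing but the machine ID (possibly followed by a newline). The caller is expected to read the file and obtain the machine ID, for example with the two commands above.

```typescript
// Hypothetical helper, not an Nx API: classifies a cache artifact by comparing
// the contents of its "source" file with the current machine ID.
type SourceCheck = "match" | "mismatch" | "missing";

function checkArtifactSource(
  sourceFileContents: string | null, // null when the artifact has no "source" file
  machineId: string
): SourceCheck {
  if (sourceFileContents === null) {
    return "missing";
  }
  // Assumption: surrounding whitespace (e.g. a trailing newline) is not significant.
  return sourceFileContents.trim() === machineId.trim() ? "match" : "mismatch";
}
```

A "mismatch" result corresponds to the situation in this issue: the artifact in the local cache directory is marked as coming from a different machine.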

Is the cache located within your repo? It looks like it. Can you make sure that directory is removed by running nx reset?

NickCiliak commented 1 year ago

Thanks, I ran those commands and I have different values as a result!

Yep the cache is at /node_modules/.cache/nx/ in my repo. Even when removing the local cache with nx reset, we get the error.

B1Z1 commented 1 year ago

Same here. We have our own Nx remote cache implementation, and after upgrading to 16.6 we are getting the error. Our remote cache implementation follows what is described in the documentation:

https://nx.dev/recipes/troubleshooting/unknown-local-cache#unknown-local-cache-error

Probably something is wrong with the newest version.

B1Z1 commented 1 year ago

Just updated to 16.7 and it seems to work.

NickCiliak commented 1 year ago

I just upgraded to 16.7.0 as well. The issue persists but is only triggered for cache artifacts that were (seemingly) generated before that upgrade.

I haven't been able to recreate the issue with new artifacts since upgrading, so maybe I need to make noop changes to all of my projects to "reset" the Nx Cloud cache.

Tom910 commented 1 year ago

Similar issue: https://github.com/nrwl/nx/issues/18449 . I sometimes get the same error on Nx 16.5.

NickCiliak commented 1 year ago

Related to my previous comment, I was mistaken and my upgrade to 16.7.0 was not actually successful. After successfully upgrading to 16.7.0, we haven't seen the error occur, so this may be fixed by upgrading to latest. We're going to keep an eye on it for the rest of the week to see if we run into it again.

ddivecs commented 1 year ago

I'm running into this issue while using https://github.com/bojanbass/nx-aws. It seems to have been introduced by https://github.com/nrwl/nx/commit/1bc58c997d48c345778a8c63776de5a56e214416.

The nx-aws plugin has a pull request up to try to work around this: https://github.com/bojanbass/nx-aws/issues/368.

The problem seems to be that nx always includes the source file even if the task runner implements the RemoteCache interface, meaning that any RemoteCache task runner is responsible for deleting this file before it can do its own integrity check.

josokinas commented 1 year ago

Can confirm the issue still persists in the 16.7.0 release.

alexciesielski commented 1 year ago

On 16.8.0 and experiencing the issue in GitHub Actions CI.

TriPSs commented 1 year ago

Experiencing the same here since 16.8.0. In the same boat as @ddivecs, except I'm using @nx-extend/gcp-task-runner (the GCP version of nx-aws).

ddivecs commented 1 year ago

@FrozenPandaz are you able to confirm if the intention is for every remote-cache runner to manually handle this source file, or if the source file is there incorrectly for remote-cache runners?

If it's meant to be handled by each runner, the documentation for remote cache runners could use some better phrasing. https://nx.dev/recipes/troubleshooting/unknown-local-cache#implementing-remote-cache-interface

This document indicates what each runner needs to do to ensure the cache is safe, which is useful, but does NOT indicate which files are there that may need to be removed.

FrozenPandaz commented 1 year ago

No, other remote-cache runners do not have to handle the source file.

Only local cache hits should throw that error.

I have a fix for this issue.

alexciesielski commented 1 year ago

Confirmed fixed in the CI, but unfortunately now happening locally

nx --version
Nx Version:
- Local: v16.8.1
- Global: v16.8.1

Already tried nx reset and rm -rf node_modules/.cache. That fixes it for one run, but the second time I execute the nx command I get the same error again.

jhecking commented 1 year ago

Also still seeing this issue, despite having updated nx to v16.8.1. I can run a build once and it pulls the results from the remote cache without problems. But if I then run the same build again, it fails with the "Invalid Cache Directory" error again:

❯ nx build my-lib

 >  NX   USING REMOTE CACHE

> nx run my-lib:build  [remote cache]

Compiling TypeScript files for project "my-lib"...
Done compiling TypeScript files for project "my-lib".

 >  NX   Successfully ran target build for project my-lib (346ms)

   Nx read the output from the cache instead of running the command for 1 out of 1 tasks.

❯ nx build my-lib

 >  NX   USING REMOTE CACHE

 >  NX   Ran target build for project my-lib (31ms)

    ✖    0/0 failed
    ✔    0/0 succeeded [0 read from cache]

 >  NX   Invalid Cache Directory for Task "my-lib:build"

   The local cache artifact in "/path/to/my/project/node_modules/.cache/nx/5611598247607800777" was not been generated on this machine.
   As a result, the cache's content integrity cannot be confirmed, which may make cache restoration potentially unsafe.
   If your machine ID has changed since the artifact was cached, run "nx reset" to fix this issue.
   Read about the error and how to address it here: https://nx.dev/recipes/troubleshooting/unknown-local-cache

   Pass --verbose to see the stacktrace.

❯ nx reset

 >  NX   Resetting the Nx workspace cache and stopping the Nx Daemon.

   This might take a few minutes.

 >  NX   Daemon Server - Stopped

 >  NX   Successfully reset the Nx workspace.

❯ nx build my-lib

 >  NX   USING REMOTE CACHE

> nx run my-lib:build  [remote cache]

Compiling TypeScript files for project "my-lib"...
Done compiling TypeScript files for project "my-lib".

 >  NX   Successfully ran target build for project my-lib (7s)

   Nx read the output from the cache instead of running the command for 1 out of 1 tasks.

❯ nx --version
Nx Version:
- Local: v16.8.1
- Global: v16.8.1

It seems like the contents are fetched from the remote cache in the first build, and then a copy is placed in the local cache. Then, on the second run, nx tries to validate the contents of the local cache and that fails.
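To make that hypothesis concrete, here is a toy model of it. It is not Nx's actual code: `ToyCache`, `Artifact`, and the machine names are all made up for illustration. It only assumes what is described in this thread, namely that a remote artifact is copied into the local cache together with its original source marker, and that local hits are validated against the current machine ID.

```typescript
// Toy model only: none of these names exist in Nx.
interface Artifact {
  hash: string;
  source: string; // ID of the machine that produced the artifact
}

class ToyCache {
  private local = new Map<string, Artifact>();

  constructor(
    private machineId: string,
    private remote: Map<string, Artifact>
  ) {}

  run(hash: string): string {
    const localHit = this.local.get(hash);
    if (localHit) {
      // Local hits are validated against this machine's ID.
      return localHit.source === this.machineId
        ? "local cache"
        : "invalid cache directory";
    }
    const remoteHit = this.remote.get(hash);
    if (remoteHit) {
      // The remote artifact is copied locally, keeping its original source marker.
      this.local.set(hash, remoteHit);
      return "remote cache";
    }
    // Cache miss everywhere: the task runs here and is stored with this machine's ID.
    this.local.set(hash, { hash, source: this.machineId });
    return "executed";
  }

  // Stands in for clearing the local cache, as "nx reset" does.
  reset(): void {
    this.local.clear();
  }
}

const remote = new Map<string, Artifact>([
  ["h1", { hash: "h1", source: "ci-machine" }],
]);
const cache = new ToyCache("dev-laptop", remote);

const first = cache.run("h1"); // "remote cache"
const second = cache.run("h1"); // "invalid cache directory"
cache.reset();
const third = cache.run("h1"); // "remote cache"
```

In this model the three runs reproduce the sequence in the log above: a remote hit, then the error on the second run, then a remote hit again after the reset.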

github-actions[bot] commented 1 year ago

This issue has been closed for more than 30 days. If this issue is still occurring, please open a new issue with more recent context.