nrwl / nx

Smart Monorepos · Fast CI
https://nx.dev
MIT License

Running multiple instances of nx throws database locked error #28608

Open ugrave opened 1 day ago

ugrave commented 1 day ago

Current Behavior

If I run multiple instances of nx in parallel on the same workspace, some of them fail with the following error:

 NX   DB execute error: "INSERT OR REPLACE INTO task_details  (hash, project, target, configuration)

                     VALUES (?1, ?2, ?3, ?4)", SqliteFailure(Error { code: DatabaseBusy, extended_code: 5 }, Some("database is locked"))
 Error: DB execute error: "INSERT OR REPLACE INTO task_details  (hash, project, target, configuration)
                     VALUES (?1, ?2, ?3, ?4)", SqliteFailure(Error { code: DatabaseBusy, extended_code: 5 }, Some("database is locked"))
     at hashTasksThatDoNotDependOnOutputsOfOtherTasks (/build/node_modules/nx/src/hasher/hash-task.js:46:22)
     at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
     at async invokeTasksRunner (/build/node_modules/nx/src/tasks-runner/run-command.js:373:5)
     at async runCommandForTasks (/build/node_modules/nx/src/tasks-runner/run-command.js:117:25)
     at async /build/node_modules/nx/src/tasks-runner/run-command.js:105:29
     at async handleErrors (/build/node_modules/nx/src/utils/handle-errors.js:9:24)
     at async runCommand (/build/node_modules/nx/src/tasks-runner/run-command.js:104:20)
     at async Object.runMany (/build/node_modules/nx/src/command-line/run-many/run-many.js:43:24)
     at async /build/node_modules/nx/src/command-line/run-many/command-object.js:13:13
     at async handleErrors (/build/node_modules/nx/src/utils/handle-errors.js:9:24)

This has been happening since the update to Nx 20.x. I also tried disabling the new caching with "useLegacyCache": true in nx.json, but I get the same error.

We use this to run Cypress tests in parallel with cypress-split on the same CI node.
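For readers unfamiliar with the error: "database is locked" is SQLite's SQLITE_BUSY result (numeric code 5, matching extended_code: 5 above), returned when a connection needs a lock that another connection is holding. The minimal sketch below reproduces the same message with Python's standard sqlite3 module, not with Nx's native Rust code. The database file name and the task_details columns are illustrative assumptions that only mirror the INSERT in the error; they are not Nx's actual schema or locking strategy.

```python
import os
import sqlite3
import tempfile


def reproduce_busy_error() -> str:
    """Return what a second writer sees while another connection holds the write lock."""
    path = os.path.join(tempfile.mkdtemp(), "busy-demo.db")  # hypothetical file name

    # Illustrative table mirroring the INSERT in the error message.
    setup = sqlite3.connect(path)
    setup.execute(
        "CREATE TABLE task_details "
        "(hash TEXT PRIMARY KEY, project TEXT, target TEXT, configuration TEXT)"
    )
    setup.commit()
    setup.close()

    # isolation_level=None: we issue BEGIN/ROLLBACK ourselves.
    holder = sqlite3.connect(path, isolation_level=None)
    holder.execute("BEGIN IMMEDIATE")  # acquire the write lock and keep it

    # timeout=0 disables SQLite's busy waiting, so the conflict surfaces at once.
    writer = sqlite3.connect(path, timeout=0, isolation_level=None)
    try:
        writer.execute(
            "INSERT OR REPLACE INTO task_details (hash, project, target, configuration) "
            "VALUES (?, ?, ?, ?)",
            ("123", "app", "test", "ci"),
        )
        return "no error"
    except sqlite3.OperationalError as err:
        return str(err)  # SQLITE_BUSY surfaces as this message
    finally:
        holder.execute("ROLLBACK")
        holder.close()
        writer.close()


print(reproduce_busy_error())  # database is locked
```

Here the first connection keeps a write transaction open while the second, with its busy timeout disabled, tries to write. Two nx processes sharing one workspace database can plausibly end up in the same situation.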

Expected Behavior

It should be possible to run multiple instances of nx on the same workspace.

GitHub Repo

No response

Steps to Reproduce

  1. Create a new workspace with multiple projects.
  2. Run two nx instances in parallel: nx run-many -t lint and nx run-many -t test
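If you want to script step 2 rather than open two terminals, a small Python driver like the one below starts several commands at once and collects their exit codes. It is only a sketch: run_in_parallel and REPRO_COMMANDS are hypothetical names, and it assumes npx and nx are available on PATH and that it is run from the workspace root.

```python
import subprocess
from typing import List, Sequence

# The two commands from step 2 (assumes an Nx workspace in the current directory).
REPRO_COMMANDS = [
    ["npx", "nx", "run-many", "-t", "lint"],
    ["npx", "nx", "run-many", "-t", "test"],
]


def run_in_parallel(commands: Sequence[Sequence[str]]) -> List[int]:
    """Start every command immediately, then wait for all and return their exit codes."""
    # Popen returns without waiting, so all processes run concurrently.
    procs = [subprocess.Popen(list(cmd)) for cmd in commands]
    # wait() blocks until each process exits; the result order matches the input order.
    return [proc.wait() for proc in procs]


# Example (inside the workspace): a non-zero code marks a failed nx run.
# print(run_in_parallel(REPRO_COMMANDS))
```

Since the failure is reported as intermittent, you may need several runs before one process exits non-zero with the database locked error in its output.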

Nx Report

 NX   Report complete - copy this into the issue template

Node           : 18.20.3
OS             : linux-x64
Native Target  : x86_64-linux
npm            : 10.7.0

nx (global)        : 20.0.5
nx                 : 20.0.5
@nx/js             : 20.0.5
@nx/jest           : 20.0.5
@nx/eslint         : 20.0.5
@nx/workspace      : 20.0.5
@nx/angular        : 20.0.5
@nx/cypress        : 20.0.5
@nx/devkit         : 20.0.5
@nrwl/devkit       : 16.5.1
@nx/eslint-plugin  : 20.0.5
@nx/plugin         : 20.0.5
@nrwl/tao          : 16.5.1
@nx/web            : 20.0.5
@nx/webpack        : 20.0.5
typescript         : 5.1.6
---------------------------------------
Community plugins:
@jsverse/transloco   : 7.5.0
@ngrx/effects        : 16.3.0
@ngrx/eslint-plugin  : 16.3.0
@ngrx/router-store   : 16.3.0
@ngrx/schematics     : 16.3.0
@ngrx/store          : 16.3.0
@ngrx/store-devtools : 16.3.0
nx-stylelint         : 17.1.6
---------------------------------------
Local workspace plugins:
         @app/workspace-plugin

The following packages should match the installed version of nx

To fix this, run nx migrate nx@20.0.5

Failure Logs

No response

Package Manager Version

No response

Operating System

Additional Information

No response

jsdevtom commented 1 day ago

We can also reproduce.

k3nsei commented 1 day ago

This also happens to me in Docker (Ubuntu) while running tests. It happens randomly across different commands, which makes our CI/CD unstable.

#20 0.577 > npx nx run-many --targets=test --all --configuration=ci --skip-nx-cache --verbose
#20 0.577
#20 9.653
#20 9.654  NX   Unable to set journal_mode: SqliteFailure(Error { code: DatabaseBusy, extended_code: 5 }, Some("database is locked"))
#20 9.654
#20 9.654 Error: Unable to set journal_mode: SqliteFailure(Error { code: DatabaseBusy, extended_code: 5 }, Some("database is locked"))
#20 9.654     at /workspace/node_modules/nx/src/utils/db-connection.js:11:93
#20 9.654     at getEntryOrSet (/workspace/node_modules/nx/src/utils/db-connection.js:19:17)
#20 9.654     at getDbConnection (/workspace/node_modules/nx/src/utils/db-connection.js:11:24)
#20 9.654     at getTaskDetails (/workspace/node_modules/nx/src/hasher/hash-task.js:19:84)
#20 9.654     at invokeTasksRunner (/workspace/node_modules/nx/src/tasks-runner/run-command.js:367:56)
#20 9.654     at runCommandForTasks (/workspace/node_modules/nx/src/tasks-runner/run-command.js:117:31)
#20 9.654     at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
#20 9.654     at async /workspace/node_modules/nx/src/tasks-runner/run-command.js:105:29
#20 9.654     at async handleErrors (/workspace/node_modules/nx/src/utils/handle-errors.js:9:24)
#20 9.654     at async runCommand (/workspace/node_modules/nx/src/tasks-runner/run-command.js:104:20)

viceice commented 1 day ago

I'm seeing this when running nx run-many --target build --projects 'a,b,c' "--configuration" "production", since v20.0.4.

JamesHenry commented 1 day ago

(It almost certainly isn't relevant, but I would clean up your lockfile: @nrwl/devkit : 16.5.1 and @nrwl/tao : 16.5.1 are VERY old. Any time you see @nrwl at all from v20 onwards, and any time you see @nx/@nrwl versions that don't match each other, it's a sign that something unexpected is being dragged in.)

Cammisuli commented 1 day ago

Also, for folks running 20.0.5: can you run the Nx command with NX_NATIVE_LOGGING=nx::native::cache,nx::native::db NX_DAEMON=false so that we can get more information on what's happening behind the scenes?

k3nsei commented 1 day ago

@JamesHenry I don't have any @nrwl packages in my package.json and package-lock.json


k3nsei commented 1 day ago

@Cammisuli When I run it like that in Docker, the results are still the same.

ENV NX_SKIP_NX_CACHE=true
ENV NX_DAEMON=false
ENV NX_DB_CACHE=false
ENV NX_INTERACTIVE=false
ENV NX_NATIVE_LOGGING=nx::native::cache,nx::native::db

# Run FE Tests
RUN cd ./workspace/ && npm ci --verbose
RUN cd ./workspace/ && npm run test:ci || true # Equivalent to `npx nx run-many --targets=test --all --configuration=ci --skip-nx-cache --verbose`

#20 [test-fe  5/10] RUN cd ./workspace/ && npm run test:ci || true
#20 9.482
#20 9.482  NX   Unable to set journal_mode: SqliteFailure(Error { code: DatabaseBusy, extended_code: 5 }, Some("database is locked"))
#20 9.482
#20 9.482 Error: Unable to set journal_mode: SqliteFailure(Error { code: DatabaseBusy, extended_code: 5 }, Some("database is locked"))
#20 9.482     at /workspace/node_modules/nx/src/utils/db-connection.js:11:93
#20 9.482     at getEntryOrSet (/workspace/node_modules/nx/src/utils/db-connection.js:19:17)
#20 9.482     at getDbConnection (/workspace/node_modules/nx/src/utils/db-connection.js:11:24)
#20 9.482     at getTaskDetails (/workspace/node_modules/nx/src/hasher/hash-task.js:19:84)
#20 9.482     at invokeTasksRunner (/workspace/node_modules/nx/src/tasks-runner/run-command.js:367:56)
#20 9.482     at runCommandForTasks (/workspace/node_modules/nx/src/tasks-runner/run-command.js:117:31)
#20 9.482     at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
#20 9.482     at async /workspace/node_modules/nx/src/tasks-runner/run-command.js:105:29
#20 9.482     at async handleErrors (/workspace/node_modules/nx/src/utils/handle-errors.js:9:24)
#20 9.482     at async runCommand (/workspace/node_modules/nx/src/tasks-runner/run-command.js:104:20)
#20 9.482
#20 DONE 9.6s

Cammisuli commented 1 day ago

@k3nsei I don't think the env is being passed properly to the command. We should've seen logs like this:

TRACE nx::native::db: Creating connection to "/Users/jon/Dev/angular-eslint/.nx/workspace-data/44A15673-D8E6-5A58-AAFE-4CA9C1B24BCF.db"
TRACE nx::native::db::initialize: Getting lock on db lock file
TRACE nx::native::db::initialize: Got lock on db lock file
DEBUG nx::native::db::initialize: Creating table for metadata if it does not exist
TRACE nx::native::db::initialize: Checking if current existing database is compatible with Nx 20.0.5
TRACE nx::native::db::initialize: Database is compatible with Nx 20.0.5

TRACE nx::native::cache::cache: GET 16131055842124547685
TRACE nx::native::cache::cache: TIME reading terminal outputs 284.083µs
TRACE nx::native::cache::cache: GET 16131055842124547685 423.417µs
TRACE nx::native::cache::cache: GET 17221907416487066875
TRACE nx::native::cache::cache: TIME reading terminal outputs 211.708µs
TRACE nx::native::cache::cache: GET 17221907416487066875 262.125µs
TRACE nx::native::cache::cache: GET 18441477409080856066
TRACE nx::native::cache::cache: TIME reading terminal outputs 179.542µs
TRACE nx::native::cache::cache: GET 18441477409080856066 233.542µs

JamesHenry commented 1 day ago

@k3nsei Please try running npm ls @nrwl/devkit in the repo in question and see if it yields any results. If not, it might come from an outdated global install.

k3nsei commented 1 day ago

@JamesHenry It's not there. Maybe you're thinking I'm related to @ugrave, who created this issue, but I'm not. I'm having similar problems after the latest Nx upgrade.

> npm ls @nrwl/devkit
my-app@0.0.0 C:\dev\my-app
└── (empty)

k3nsei commented 1 day ago

@Cammisuli Here are the results. Anyway, it's sad that npm isn't passing environment variables from the system.

#20 [test-fe  5/10] RUN cd ./workspace && npm run test:ci || true
#20 0.587
#20 0.587 > my-app@0.0.0 test:ci
#20 0.587 > NX_NATIVE_LOGGING=nx::native::cache,nx::native::db npx nx run-many --targets=test --all --configuration=ci --skip-nx-cache --verbose
#20 0.587
#20 4.548 TRACE nx::native::db: Creating connection to "/workspace/.nx/workspace-data/.db"
#20 4.549 TRACE nx::native::db::initialize: Getting lock on db lock file
#20 4.549 TRACE nx::native::db::initialize: Got lock on db lock file
#20 4.549 TRACE nx::native::db::initialize: Opening connection with unix-dotfile
#20 9.567
#20 9.567  NX   Unable to set journal_mode: SqliteFailure(Error { code: DatabaseBusy, extended_code: 5 }, Some("database is locked"))
#20 9.567
#20 9.567 Error: Unable to set journal_mode: SqliteFailure(Error { code: DatabaseBusy, extended_code: 5 }, Some("database is locked"))
#20 9.567     at /workspace/node_modules/nx/src/utils/db-connection.js:11:93
#20 9.567     at getEntryOrSet (/workspace/node_modules/nx/src/utils/db-connection.js:19:17)
#20 9.567     at getDbConnection (/workspace/node_modules/nx/src/utils/db-connection.js:11:24)
#20 9.567     at getTaskDetails (/workspace/node_modules/nx/src/hasher/hash-task.js:19:84)
#20 9.567     at invokeTasksRunner (/workspace/node_modules/nx/src/tasks-runner/run-command.js:367:56)
#20 9.567     at runCommandForTasks (/workspace/node_modules/nx/src/tasks-runner/run-command.js:117:31)
#20 9.567     at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
#20 9.567     at async /workspace/node_modules/nx/src/tasks-runner/run-command.js:105:29
#20 9.567     at async handleErrors (/workspace/node_modules/nx/src/utils/handle-errors.js:9:24)
#20 9.567     at async runCommand (/workspace/node_modules/nx/src/tasks-runner/run-command.js:104:20)
#20 9.567
#20 DONE 9.7s

ugrave commented 1 day ago

> (It almost certainly isn't relevant, but I would clean up your lockfile: @nrwl/devkit : 16.5.1 and @nrwl/tao : 16.5.1 are VERY old. Any time you see @nrwl at all from v20 onwards, and any time you see @nx/@nrwl versions that don't match each other, it's a sign that something unexpected is being dragged in.)

They come from some third-party dependencies. I need to find out which ones.

ugrave commented 1 day ago

I was able to get rid of the old @nrwl packages. They were pulled in by old dependencies that were no longer needed anyway.

 NX   Report complete - copy this into the issue template

Node           : 18.20.3
OS             : linux-x64
Native Target  : x86_64-linux
npm            : 10.7.0

nx (global)        : 20.0.5
nx                 : 20.0.5
@nx/js             : 20.0.5
@nx/jest           : 20.0.5
@nx/eslint         : 20.0.5
@nx/workspace      : 20.0.5
@nx/angular        : 20.0.5
@nx/cypress        : 20.0.5
@nx/devkit         : 20.0.5
@nx/eslint-plugin  : 20.0.5
@nx/plugin         : 20.0.5
@nx/web            : 20.0.5
@nx/webpack        : 20.0.5
typescript         : 5.1.6
---------------------------------------
Community plugins:
@jsverse/transloco   : 7.5.0
@ngrx/effects        : 16.3.0
@ngrx/eslint-plugin  : 16.3.0
@ngrx/router-store   : 16.3.0
@ngrx/schematics     : 16.3.0
@ngrx/store          : 16.3.0
@ngrx/store-devtools : 16.3.0
nx-stylelint         : 17.1.6
---------------------------------------
Local workspace plugins:
         @app/workspace-plugin

Here are the logs with the environment variables set:

$ NX_NATIVE_LOGGING=nx::native::cache,nx::native::db NX_DAEMON=false  nx run-many --target=test --configuration=ci --parallel=12

 TRACE nx::native::db: Creating connection to "/build/.nx/workspace-data/a53f7908db25468b9ec0e01f8178a12f.db"
 TRACE nx::native::db::initialize: Getting lock on db lock file
 TRACE nx::native::db::initialize: Got lock on db lock file
 TRACE nx::native::db::initialize: Opening connection with unix-dotfile
 DEBUG nx::native::db::initialize: Creating table for metadata if it does not exist
 TRACE nx::native::db::initialize: Checking if current existing database is compatible with Nx 20.0.5
 TRACE nx::native::db::initialize: Database is compatible with Nx 20.0.5
 TRACE nx::native::db::connection: Database busy. Retrying
 TRACE nx::native::db::connection: Database busy. Retrying.
 TRACE nx::native::db::connection: Database busy. Retrying..
 TRACE nx::native::db::connection: Database busy. Retrying...
 TRACE nx::native::db::connection: Database busy. Retrying....

  NX   DB execute error: "INSERT OR REPLACE INTO task_details  (hash, project, target, configuration)

                     VALUES (?1, ?2, ?3, ?4)", SqliteFailure(Error { code: DatabaseBusy, extended_code: 5 }, Some("database is locked"))
 Error: DB execute error: "INSERT OR REPLACE INTO task_details  (hash, project, target, configuration)
                     VALUES (?1, ?2, ?3, ?4)", SqliteFailure(Error { code: DatabaseBusy, extended_code: 5 }, Some("database is locked"))
     at hashTasksThatDoNotDependOnOutputsOfOtherTasks (/build/node_modules/nx/src/hasher/hash-task.js:46:22)
     at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
     at async invokeTasksRunner (/build/node_modules/nx/src/tasks-runner/run-command.js:373:5)
     at async runCommandForTasks (/build/node_modules/nx/src/tasks-runner/run-command.js:117:25)
     at async /build/node_modules/nx/src/tasks-runner/run-command.js:105:29
     at async handleErrors (/build/node_modules/nx/src/utils/handle-errors.js:9:24)
     at async runCommand (/build/node_modules/nx/src/tasks-runner/run-command.js:104:20)
     at async Object.runMany (/build/node_modules/nx/src/command-line/run-many/run-many.js:43:24)
     at async /build/node_modules/nx/src/command-line/run-many/command-object.js:13:13
     at async handleErrors (/build/node_modules/nx/src/utils/handle-errors.js:9:24)

These are the logs from the other task, running in parallel on the same node, which did not get this error:

$ nx run-many --targets=lint,stylelint,type-check --configuration=ci --parallel=3

TRACE nx::native::db: Creating connection to "/build/.nx/workspace-data/a53f7908db25468b9ec0e01f8178a12f.db"
TRACE nx::native::db::initialize: Getting lock on db lock file
TRACE nx::native::db::initialize: Got lock on db lock file
TRACE nx::native::db::initialize: Opening connection with unix-dotfile
DEBUG nx::native::db::initialize: Creating table for metadata if it does not exist
TRACE nx::native::db::initialize: Checking if current existing database is compatible with Nx 20.0.5
TRACE nx::native::db::initialize: Database is compatible with Nx 20.0.5

NX   Running targets lint, stylelint, type-check for 151 projects:

I also run Cypress with multiple nx instances in parallel on the same node and get the same error there. I also have another build target at the beginning, which runs in parallel on 3 projects (with 3 nx instances). That one did not have this problem.

Cammisuli commented 1 day ago

This is interesting. I can see that we retry the operation multiple times, but the database still seems to be locked after 5 tries (125 ms in total).

I need to find some way to get the database to stay locked for more than 125 ms on my system. Is there a way to make this reproducible somewhere so that I can poke around the environment?

Otherwise, I can probably just increase the number of retries to some arbitrary value. Retry 10 times? 20? I'm not sure.
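To make the retry trade-off concrete, here is a hypothetical sketch using Python's sqlite3, not Nx's Rust implementation. It assumes a fixed 25 ms wait per retry, which is simply one way to arrive at the 5 tries / 125 ms mentioned above, and compares it with exponential backoff, where each extra retry doubles the wait. execute_with_retry, retry_budget_ms and writer_survives_lock are illustrative names, and the 0.4 s lock-holding time is arbitrary.

```python
import os
import sqlite3
import tempfile
import threading
import time


def retry_budget_ms(waits: int, delay_ms: float, exponential: bool) -> float:
    """Total sleep time across `waits` retries for a fixed or doubling delay."""
    if exponential:
        return sum(delay_ms * 2 ** i for i in range(waits))
    return delay_ms * waits


def execute_with_retry(conn, sql, params=(), max_waits=5, delay_s=0.025, exponential=True):
    """Run one statement, sleeping and retrying while SQLite reports 'database is locked'."""
    waits = 0
    while True:
        try:
            return conn.execute(sql, params)
        except sqlite3.OperationalError as err:
            # Re-raise anything that is not a busy error, or when the budget is spent.
            if "database is locked" not in str(err) or waits >= max_waits:
                raise
            time.sleep(delay_s * (2 ** waits if exponential else 1))
            waits += 1


def writer_survives_lock(hold_s: float, exponential: bool) -> bool:
    """Hold the write lock for hold_s seconds; report whether a retrying writer gets through."""
    path = os.path.join(tempfile.mkdtemp(), "retry-demo.db")  # hypothetical file
    setup = sqlite3.connect(path)
    setup.execute("CREATE TABLE task_details (hash TEXT PRIMARY KEY, project TEXT)")
    setup.commit()
    setup.close()

    # The holder commits from a timer thread, so allow cross-thread use.
    holder = sqlite3.connect(path, isolation_level=None, check_same_thread=False)
    holder.execute("BEGIN IMMEDIATE")  # take the write lock
    release = threading.Timer(hold_s, lambda: holder.execute("COMMIT"))
    release.start()

    # timeout=0: no built-in busy waiting, so only our retry loop waits.
    writer = sqlite3.connect(path, timeout=0, isolation_level=None)
    try:
        execute_with_retry(
            writer,
            "INSERT OR REPLACE INTO task_details (hash, project) VALUES (?, ?)",
            ("123", "app"),
            exponential=exponential,
        )
        return True
    except sqlite3.OperationalError:
        return False
    finally:
        release.join()  # make sure the lock is released before closing
        holder.close()
        writer.close()


print(retry_budget_ms(5, 25, exponential=False))  # 125
print(retry_budget_ms(5, 25, exponential=True))   # 775
print(writer_survives_lock(0.4, exponential=False))
print(writer_survives_lock(0.4, exponential=True))
```

With a lock held for about 400 ms, the fixed 125 ms budget should run out, while the exponential schedule (25 + 50 + 100 + 200 + 400 ms) outlasts the lock with the same number of retries. Raising a fixed retry count to 10 or 20 only stretches the budget linearly, to 250 or 500 ms under the same 25 ms assumption. SQLite's built-in busy timeout (the timeout argument of sqlite3.connect here) is another way to wait, but it does not change how long another process may hold the lock.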

k3nsei commented 1 day ago

You could try running a multi-stage Docker build whose stages run in parallel in a GitHub workflow.

rhahne commented 16 hours ago

I have the same issue when running any nx command in our DevOps pipeline (it works locally without any issue):

nx version: 20.0.5

> NX_NATIVE_LOGGING=nx::native::cache,nx::native::db NX_DAEMON=false nx run-many -t lint --skip-nx-cache --verbose

TRACE nx::native::db: Creating connection to "/azp/_work/1/s/.nx/workspace-data/.db"
TRACE nx::native::db::initialize: Getting lock on db lock file
TRACE nx::native::db::initialize: Got lock on db lock file
TRACE nx::native::db::initialize: Opening connection with unix-dotfile

 NX   Unable to set journal_mode: SqliteFailure(Error { code: DatabaseBusy, extended_code: 5 }, Some("database is locked"))

Error: Unable to set journal_mode: SqliteFailure(Error { code: DatabaseBusy, extended_code: 5 }, Some("database is locked"))
    at /azp/_work/1/s/node_modules/nx/src/utils/db-connection.js:11:93
    at getEntryOrSet (/azp/_work/1/s/node_modules/nx/src/utils/db-connection.js:19:17)
    at getDbConnection (/azp/_work/1/s/node_modules/nx/src/utils/db-connection.js:11:24)
    at getTaskDetails (/azp/_work/1/s/node_modules/nx/src/hasher/hash-task.js:19:84)
    at invokeTasksRunner (/azp/_work/1/s/node_modules/nx/src/tasks-runner/run-command.js:367:56)
    at runCommandForTasks (/azp/_work/1/s/node_modules/nx/src/tasks-runner/run-command.js:117:31)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async /azp/_work/1/s/node_modules/nx/src/tasks-runner/run-command.js:105:29
    at async handleErrors (/azp/_work/1/s/node_modules/nx/src/utils/handle-errors.js:9:24)
    at async runCommand (/azp/_work/1/s/node_modules/nx/src/tasks-runner/run-command.js:104:20)

On nx version 20.0.3, I got the following message:

nx run-many -t lint

 NX   Running target lint for 9 projects:

....

 NX   Successfully ran target lint for 9 projects

 NX   (0 , native_1.connectToNxDb) is not a function

aram-yesildeniz commented 12 hours ago

We have the same issue in our pipeline:

nx run-many --all --target=lint --parallel=10
...
NX   Unable to set journal_mode: SqliteFailure(Error { code: DatabaseBusy, extended_code: 5 }, Some("database is locked"))