nrwl / nx

Smart Monorepos · Fast CI
https://nx.dev
MIT License
23.56k stars 2.35k forks source link

Handling CTRL+C (SIGINT) in an executor does not work anymore after upgrading NX #23585

Open pct-cclausen opened 5 months ago

pct-cclausen commented 5 months ago

Current Behavior

I have an executor in our nx monorepo which is responsible for managing "docker compose up" for a project, then running cypress tests against the docker containers and then shutdown everything.

To handle a developer pressing CTRL+C during a running test I use the "death" npm package, which installs hooks that allow the program to handle SIGINT/SIGTERM/etc and postpone program exit until these are completed.

This exists in 2 monorepos in fact:

I am upgrading packages in the customers repository. Specifically I have migrated to NX 18.2.4, as higher versions are not supported by some 3rd party packages we use.

Since this upgrade the shutdown handling does not work correctly anymore. It seems my executor code is shutdown before it can finish or even start docker compose down, which causes docker containers to stay online, which is bad for developer experience.

Expected Behavior

It should not stop the executor process prematurely. The example repo should print "WORK DONE" after ~3 seconds.

GitHub Repo

https://github.com/pct-digital/typescript-homework/tree/on_death_executor_issue

Steps to Reproduce

See the README in the repo, branch on_death_executor_issue.

Nx Report

NX   Report complete - copy this into the issue template

Node   : 20.13.1
OS     : linux-x64
npm    : 9.8.1

nx                 : 19.0.2
@nx/js             : 19.0.2
@nx/jest           : 19.0.2
@nx/linter         : 19.0.2
@nx/eslint         : 19.0.2
@nx/workspace      : 19.0.2
@nx/devkit         : 19.0.2
@nx/eslint-plugin  : 19.0.2
@nx/node           : 19.0.4
@nx/plugin         : 19.0.2
@nrwl/tao          : 19.0.2
@nx/web            : 19.0.2
@nx/webpack        : 19.0.2
typescript         : 5.4.5
---------------------------------------
Registered Plugins:
@nx/webpack/plugin
@nx/eslint/plugin
@nx/jest/plugin
---------------------------------------
Local workspace plugins:
         @nx-ts/pct-pipeline
---------------------------------------
The following packages should match the installed version of nx
  - @nx/node@19.0.4
  - @nrwl/node@19.0.4

To fix this, run `nx migrate nx@19.0.4`

Failure Logs

No response

Package Manager Version

No response

Operating System

Additional Information

I was able to "fix" the issue by manually rolling back the version of the NX package to 17.2.8 and running npm install. I am however not very happy with letting it stay like that, I think the versions should all match?

But this might be helpful for debugging the issue:

Maybe it has something to do with changes to this file: https://github.com/nrwl/nx/blame/17.3.0/packages/nx/src/tasks-runner/forked-process-task-runner.ts between 17.2.8 and 17.3.0

The only working examples of my code exist in our own private nx monorepo which I cannot share, somehow I have not been able to recreate a working minimal example, only a non-working one (see above).

pct-cclausen commented 5 months ago

I've now created a workaround that no longer depends on the complex interplay of signals and the death package. Instead I create a new process "watch for the process with pid X, if it is no longer there run command Y".

X is the pid of the running executor Y is "kill -- -Z" where Z is the pid of some external process like Cypress Y also is "docker down"

Works great for my case.

Just FYI if somebody with a similar pain finds this.

This is js file for that process:

// A helper program meant to clean up resources by running arbitrary commands
// as soon as a specific other program is not longer active

// This script is used in the acceptance-test-executor

// call like this: nde reaper.js P X1 X2 X3 X4 ...
// where P is a pid that is watched
// as soon as P can no longer be found the comamnds X1, ... XN are executed
// then reaper.js shuts down

// additionally a timeout of some hours is used, just in case.

const { execSync } = require('child_process');

const TIMEOUT = 6 * 3600 * 1000;

const targetPid = Number(process.argv[2]);
const cleanupCommands = process.argv.slice(3);

if (Number.isNaN(targetPid)) {
  throw new Error('target pid is not a number');
}

console.log('Watching process ' + targetPid + ', waiting to cleanup resources once it is gone');

function cleanup() {
  for (const cmd of cleanupCommands) {
    try {
      console.log('Cleanup... ' + cmd);
      execSync(cmd);
      console.log('...done');
    } catch (e) {
      console.log("Failed to run cleanup command '" + cmd + "'", e);
    }
  }
}

function isPidActive(pid) {
  try {
    /**
     * "As a special case, a signal of 0 can be used to test for the existence of a process."
     * So "kill" does not kill here
     */
    process.kill(targetPid, 0);
    return true;
  } catch (e) {
    return false;
  }
}

setTimeout(cleanup, TIMEOUT);

(async () => {
  for (;;) {
    await new Promise((resolve) => setTimeout(resolve, 500));

    const isActive = isPidActive(targetPid);

    if (!isActive) {
      console.log('Process ' + targetPid + ' is not active anymore. Cleaning up resources!');
      cleanup();
      process.exit(0);
    }
  }
})();