Open pct-cclausen opened 5 months ago
I've now created a workaround that no longer depends on the complex interplay of signals and the death package. Instead I create a new process "watch for the process with pid X, if it is no longer there run command Y".
X is the pid of the running executor Y is "kill -- -Z" where Z is the pid of some external process like Cypress Y also is "docker down"
Works great for my case.
Just FYI if somebody with a similar pain finds this.
This is js file for that process:
// A helper program meant to clean up resources by running arbitrary commands
// as soon as a specific other program is not longer active
// This script is used in the acceptance-test-executor
// call like this: nde reaper.js P X1 X2 X3 X4 ...
// where P is a pid that is watched
// as soon as P can no longer be found the comamnds X1, ... XN are executed
// then reaper.js shuts down
// additionally a timeout of some hours is used, just in case.
const { execSync } = require('child_process');
const TIMEOUT = 6 * 3600 * 1000;
const targetPid = Number(process.argv[2]);
const cleanupCommands = process.argv.slice(3);
if (Number.isNaN(targetPid)) {
throw new Error('target pid is not a number');
}
console.log('Watching process ' + targetPid + ', waiting to cleanup resources once it is gone');
function cleanup() {
for (const cmd of cleanupCommands) {
try {
console.log('Cleanup... ' + cmd);
execSync(cmd);
console.log('...done');
} catch (e) {
console.log("Failed to run cleanup command '" + cmd + "'", e);
}
}
}
function isPidActive(pid) {
try {
/**
* "As a special case, a signal of 0 can be used to test for the existence of a process."
* So "kill" does not kill here
*/
process.kill(targetPid, 0);
return true;
} catch (e) {
return false;
}
}
setTimeout(cleanup, TIMEOUT);
(async () => {
for (;;) {
await new Promise((resolve) => setTimeout(resolve, 500));
const isActive = isPidActive(targetPid);
if (!isActive) {
console.log('Process ' + targetPid + ' is not active anymore. Cleaning up resources!');
cleanup();
process.exit(0);
}
}
})();
Current Behavior
I have an executor in our nx monorepo which is responsible for managing "docker compose up" for a project, then running cypress tests against the docker containers and then shutdown everything.
To handle a developer pressing CTRL+C during a running test I use the "death" npm package, which installs hooks that allow the program to handle SIGINT/SIGTERM/etc and postpone program exit until these are completed.
This exists in 2 monorepos in fact:
I am upgrading packages in the customers repository. Specifically I have migrated to NX 18.2.4, as higher versions are not supported by some 3rd party packages we use.
Since this upgrade the shutdown handling does not work correctly anymore. It seems my executor code is shutdown before it can finish or even start
docker compose down
, which causes docker containers to stay online, which is bad for developer experience.Expected Behavior
It should not stop the executor process prematurely. The example repo should print "WORK DONE" after ~3 seconds.
GitHub Repo
https://github.com/pct-digital/typescript-homework/tree/on_death_executor_issue
Steps to Reproduce
See the README in the repo, branch on_death_executor_issue.
Nx Report
Failure Logs
No response
Package Manager Version
No response
Operating System
Additional Information
I was able to "fix" the issue by manually rolling back the version of the NX package to 17.2.8 and running npm install. I am however not very happy with letting it stay like that, I think the versions should all match?
But this might be helpful for debugging the issue:
Maybe it has something to do with changes to this file: https://github.com/nrwl/nx/blame/17.3.0/packages/nx/src/tasks-runner/forked-process-task-runner.ts between 17.2.8 and 17.3.0
The only working examples of my code exist in our own private nx monorepo which I cannot share, somehow I have not been able to recreate a working minimal example, only a non-working one (see above).