Open lpsinger opened 1 year ago
I tried to remove all Remix code for killing the app server process, but my system was still killing the app server when remix dev
was CTRL-C'd. I tried removing any process.on
calls for SIGINT
and SIGTERM
, clearing out the body of any exitHook
s, and setting cleaup: false
for execa
.
The only way the app server process stayed alive is when I set detached: true
for the subprocess, but detached subprocesses are another can of worms and their behavior differs across OSs. Might revisit this later, but if anyone has more insight into what can be done here, would be appreciated.
@pcattori, I think that part of the problem is that arc sandbox itself doesn't handle signals. I opened the PR https://github.com/architect/sandbox/pull/729 to try to address that. Would you please see if your changes on the Remix side work when combined with that PR?
I did some testing on this issue as I could see it as well. I was able to change my docker to use an entrypoint. In my entrypoint I created a signal trap. I was able to trap all signals. This told me that sending signals from outside of docker into docker was working.
I then ensured that I was using exec in the entrypoint so that the second process after docker-init would be npm run dev
. At this point it stopped responding to SIGINT and SIGTERM signals. SIGHUP and SIGKILL continued to function. This told me that the issue was possibly with the npm run process itself. I used strace to trace the npm run dev
process and I never registered a sigint or sigterm, it was always just instant death.
I went digging and I have found this issue and this PR that I believe may be the cause (or a partial cause) of this issue.
I have updated my npm version to be the most recent version (10.3.0) and I was able to see it close the process in under a second when I use control + c, so I believe this should fix it.
I tried to remove all Remix code for killing the app server process, but my system was still killing the app server when
remix dev
was CTRL-C'd. I tried removing anyprocess.on
calls forSIGINT
andSIGTERM
, clearing out the body of anyexitHook
s, and settingcleaup: false
forexeca
.The only way the app server process stayed alive is when I set
detached: true
for the subprocess, but detached subprocesses are another can of worms and their behavior differs across OSs. Might revisit this later, but if anyone has more insight into what can be done here, would be appreciated.
@pcattori could you please ensure your npm version is 10.3.0 and try your experiment again? I am hopeful this is your issue.
I spoke to @lpsinger and this did not impact the issue he was having. I want to be more specific about what I was seeing and what was fixed with the npm version change.
In the buggy version, you send a sigint/term to docker (depending on what you have configured and if you use compose). NPM was catching these but doing nothing with them. So as you tried to close the npm run process, it would hang for 10 seconds till it hit the Docker timeout and docker sends a sigkill. So I was unable to get to the remix layer at all because the signal was being trapped before that.
All the npm version change does is make it so that npm isn't just trapping and keeping the signal. So now the signals get passed along.
It does not however show any additional logging or proof of graceful shutdown in remix.
I can reliably "gracefully" terminate the child process now, but I have done terrible things to get there. I am hoping though that this at least helps point in the right direction and maybe one of you will advise further.
in in the devServer_unstable/index file, if I replace exitHook__default["default"](cleanup);
with:
state.appServer.on("exit", ()=>{
process.exit(0)
})
process.on("SIGINT", ()=>{
cleanup()
})
and then in to the sandbox/src/cli/index file in architect's sandbox I ALSO add this:
process.on("exit", (signal)=>{
sandbox.end()
})
It will work as expected, but not how I would have thought it would. I added an exit event for the child process so that basically when it exits I then can exit the main process. I did this because handling asynchronous shutdown actions was a beast and this way I'm listening on the main process for the child process to reach exit status which then triggers the exit of the main process. Like I said, super duper ugly, but maybe illuminating?
a maybe odd thing: If I run the process and then I take the PID for the arc process and kill -2 it, everything terminates as expected within arc and the remix process carries on as expected. I put logs in where sdtin/stdout are piped and so I could see what was passed through. When I kill -2 the process, something was passed through STDOUT 3 times and arc received the exit event twice. I would have expected to only get an exit signal once and I would not have expected anything to come through stdout much less 3 times. I'm not sure what it was as it didn't print so I am assuming it's signal. It could be normal but it seemed a bit odd to me. When I control + c it's just the signal and nothing goes through STDOUT (which is expected).
@Courey Thanks for diving deep on this! π
We've been heads down on getting Vite stuff stabilized. We're going to need to clean up some of the CLI code soon anyway, so will try to include some/all of this there. In the meantime, would love any arc experts to help me figure out how best to interop with Vite dev server since that's what we're moving to anyway. I think we can use sandbox
to spin up a Vite dev server in middleware mode, and then delegate asset requets and route requests to it within the arc request handler. Conceptually, that all checks out to me, but haven't tried getting it running for real. One piece to figure out is that the Vite middleware is Express-style middleware. So there's a chance we may need to adapt the web fetch standard Request
/Response
that both arc and Remix prefer to Node-style (i.e. Express-style) IncomingRequest
, ServerResponse
.
I created this issue with Arc because I was able to isolate the process and send it signals manually and did not get the expected response from arc sandbox, so I thought they might be able to tell me more about that and that might point me in the right direction. That was a red herring. They believe the issue lies with execa and how it handles children of children. They seem extremely helpful and friendly, and I know in your comment you said:
In the meantime, would love any arc experts to help me figure out how best to interop with Vite dev server since that's what we're moving to anyway.
I was hoping to introduce y'all to each other because it sounds like it could be a useful pairing.
What version of Remix are you using?
1.18.1
Are all your remix dependencies & dev-dependencies using the same version?
Steps to Reproduce
When you are using dev_v2 mode in an Architect app, when you interrupt
npm run dev
with ^C, it kills all subprocesses with extreme prejudice and does not permit Architect plugins to clean up.To reproduce:
start
method (which would normally start a sidecar process --- such as an OpenSearch server) and astop
method (which would normally stop the sidecar process and delete any temporary directories). For example:plugin-sandbox.js
app.arc
npm run dev
. Wait until it's up and running.Expected Behavior
The dev server should come down cleanly and you should see both of the log messages from the Architect sandbox plugin, both
starting example sandbox plugin
andstopping example sandbox plugin
. Here's the good output withdev_v2: false
set. For a reproducer, see https://github.com/lpsinger/remix-bug-report-arc-sandbox-killed-ungracefully/tree/dev_v1.Actual Behavior
Here's the bad output. You see
starting example sandbox plugin
, but you never seestopping example sandbox plugin
. For a reproducer, see https://github.com/lpsinger/remix-bug-report-arc-sandbox-killed-ungracefully/tree/dev_v2.