Open jwoudenberg opened 6 years ago
Turns out the server I was running wasn't shutting down because of a bug in network: the web server, upon receiving a SIGINT signal, got stuck closing its websocket. Upgrading network fixed my issue.
I'll close this now. Sorry for the noise!
Great, thanks for the update!
I'm running into the same problem. I have a Servant server which is forked from the main thread, and it isn't killed on file change. I also tested without forking, i.e. running directly from the main thread, with the same result: ports and file handles are not released.
Running with
ghcid -W -c 'stack ghci server:exe:server-bin' -T 'Main.main'
Instead of Yesod's approach, which requires changing/adding some code, a simpler way might be to add an arg to ghcid. Could something like (bikeshedding)
--run-after-kill-and-before-reload=my-killer.sh
do the trick?
Does this seem feasible and easy?
We could put any command in my-killer.sh which would kill the process(es), wait longer if required, or even do some prep for starting, cleanup, etc.
Something like (I just glanced at the code, so maybe it belongs in another place)
sessionReload :: Session -> IO ([Load], [FilePath])
sessionReload session@Session{..} = do
    -- kill anything async, set stuck if you didn't succeed
    old <- modifyVar running $ \b -> return (False, b)
    -- if killerPath is specified (Just), run it, regardless of `stuck`
    -- (getKillerPathSomehow is a placeholder returning the optional script path)
    getKillerPathSomehow session >>= mapM_ (createProcess . shell)
    stuck <- if not old then return False else do
        Just ghci <- readIORef ghci
        fmap isNothing $ timeout 5 $ interrupt ghci
    if stuck then sessionRestart session else do
        ...
I was using stack and forgot this is ghci, there's no bin filename to kill from outside, so code change needed in app. :-(
I did a little bit of testing of this idea and it seems to work if the code is in sessionReload. Inside the kill function it doesn't get fired for some reason.
sessionReload :: Session -> IO ([Load], [FilePath])
sessionReload session@Session{..} = do
    let killerScript = Just "ghcid-killer-script" -- hardcoded, need access to `Options`
    maybe (pure ())
        (\ks -> do
            (_,_,_,p) <- createProcess $ shell ks
            void $ waitForProcess p -- wait for shell process to finish
        )
        killerScript
    ...
Luckily, I already have the shutdown endpoint in my app, so the only thing needed was the ghcid-killer-script:
# ghcid-killer-script
curl localhost:3003/admin/stop
Any remarks/suggestions on the approach?
@ndmitchell Any thoughts?
@vlatkoB sorry for the delay in coming back to you.
If you really need such a killer script happy to take a command line flag for something to run and cleanup at some point. However...
:!run-killer is a perfectly legitimate command. If you do it at the start that seems simpler, and also speeds up the reload cycle and time to errors.
ghcid deliberately sends Ctrl-C to the test program. If you are responding to and handling Ctrl-C properly it should do the right shutdown. Fixing that might be useful regardless.
No problem, not an urgent issue, this one. :-)
I do not really need such a script, but tried to solve an issue, and came up with some kind of solution.
I'm actually trying to use ghcid as yesod devel, so it is not a one-time test, but a Servant server (i.e. a never-ending process).
It responds to Ctrl-C from the shell (both stack run server-bin and ghcid ...), but not from inside ghcid. Port and file handles are still left open, and after a rebuild/reload, the server "starts", but with a port already in use message.
As observed in system-monitor, after Ctrl-C from the shell the server needs ~2-3 secs to disappear from the pid list, so that might be the culprit. Although, I tested a local ghcid that has timeout 15 and yet got the same result.
I'm not using ghci much, so can you give me a full command line example to test? Maybe I'm not starting ghcid correctly. What I tried still results in a port already in use and/or resource busy (file is locked) message. Like the new server is started before the old one has the time to shut down.
This is what I'm using.
Ghcid (with no script): ghcid -W -c 'stack ghci server' -r or -T 'Main.main'
Script (run-killer):
echo "Killing localhost"
curl -s localhost:3003/admin/stop
sleep 4
I imagine you writing:
ghcid -W -c 'stack ghci server' -r or -T ':!script-killer' -T 'Main.main'
Which will cause the "test" operation to first call script killer, then main to start it up again.
It doesn't work. It seems those -T commands are performed differently than from the shell. If I curl localhost:3003/admin/stop from the shell, the server stops, ghcid displays a message, I change the code, it recompiles and all is good. But when the kill script is invoked with -T, the server doesn't stop. I can curl localhost:3003/hello and it works, but for stop I'm getting 408.
The same happens without a kill script: when ghcid kills the server by itself, hello works, stop returns 408.
I'm using emacs, and since killing from outside works, I solved it with before-save-hook. If anybody needs it, here it is:
;; Run 'stop-server' located in project's root on every manual save of Haskell file
(defvar stop-server-script-name "stop-server"
  "Name of the script in project root directory." )
(defun stop-dev-server ()
  "Kill server so ghcid can reload with server already stopped from outside."
  (when (eq major-mode 'haskell-mode) ;; if haskell mode
    (when (memq this-command '(save-buffer)) ;; if manual save
      (let* ((project-root (projectile-project-root)) ;; curr project root
             (script-path (concat project-root stop-server-script-name)) ;; full path
             (script-exist (file-exists-p script-path)) ;; does it exist
             (shell-cmd (concat script-path " %s"))) ;; make it accept param
        (if script-exist
            (progn (message "Stopping server with script: %s" script-path)
                   (shell-command-to-string (format shell-cmd buffer-file-name)))
          (message "Missing script %s. No action." script-path))))))
(add-hook 'before-save-hook 'stop-dev-server)
I'll try to dig into this issue when I get more free time.
No idea why -T doesn't work... That is odd. If you can figure that out I'd be keen to fix it.
I remember that, while I was searching/testing for the most appropriate place for the new code, it didn't work inside the kill function (regardless of the number of secs given to timeout), nor in some other places, only inside sessionReload (and there the new code must be the first to execute, before modifyVar running).
Is there some parallelism going on in/before kill? Although the new code waits for the script to finish, other things do not wait for it, so some code might get executed partially.
@vlatkoB very odd... I don't understand at all.
I was in a bit of a hurry to make it done, so I might have missed something. Will test again with a very small/basic Servant app, maybe it's something in the app I used it with.
I noticed something that might be related to this issue.
When I start a ghcid session from a script
#!/bin/bash
## script for starting named ghcid session
CMD="ghcid -W -c \"stack ghci server\" -T \":main -c config/config-300$1.yaml\""
bash -c "exec -a "GHCID-RUN" $CMD"
get processes info
$ ps -o pid,ppid,sess,cmd -U vlatko | grep -v "color" | grep 22825
PID Parent Group CMD
22825 9824 22825 bash
25481 22825 22825 bash
25482 25481 22825 GHCID-RUN -W -c stack ghci server -T :main -c config/config-3005.yaml
25487 25482 22825 .stack/programs/x86_64-linux/ghc-tinfo6-8.6.4/lib/ghc-8.6.4/bin/ghc ....
and try to kill them with pkill -9 -g 25487 (or kill -9 -- -22825, or kill -TERM 22825, or any other weapon I could find :-) ), ghc survives (PID is the same, it is not restarted).
$ ps -o pid,ppid,sess,cmd -U vlatko | grep -v "color" | grep 22825
PID Parent Group CMD
25487 25482 22825 .stack/programs/x86_64-linux/ghc-tinfo6-8.6.4/lib/ghc-8.6.4/bin/ghc ....
The only way to kill it is by being specific about its PID, kill -9 25487.
EDIT: removed setsid and adapted output
I'm noticing the same thing with Flora these past days. This is the first time I'm encountering this and I'm quite lost as to why it didn't happen before.
I don't think I'm particularly stressing ghcid on this one, my command-line is ghcid --target=flora-server --restart="src" --test 'FloraWeb.Server.runFlora'.
I'm running Fedora 34.
When using nix shebangs the following works for me to overcome the issue (assuming the executable source file is ./Main.hs):
#! /usr/bin/env nix-shell
#! nix-shell -p ghcid
#! nix-shell -p "haskellPackages.ghcWithPackages (p: with p; [net-mqtt])"
#! nix-shell -i "ghcid -c 'ghci -Wall' -T':!pkill --full ghc\\ .\\*./Main.hs' -T main"
I'm a Windows user, so don't know what is going on here. If anyone can shed some light, then we can investigate. Reopening for now though, as it does seem there is some issue underlying it.
Update: turns out an infinite loop from a websocket endpoint was somehow triggering a sequence of events that was leading to this.
The -T:!script-killer trick does not feel convenient, because different apps need different killer scripts (or a general killer of ghcid children processes, which is awkward too).
It seems ghcid can know the process id of the spawned process, but not sure. If so, I think ghcid can be supplied with an option to automatically kill the process and its children on reload.
I'm not sure that pkill will help at all here, since it's all happening as part of the same process.
ghcid detects that a change to a module has happened, loads the new modules and calls the new test.
My current theory is that https://github.com/ndmitchell/ghcid/blob/master/src/Session.hs#L207 which is supposed to SIGINT doesn't succeed but exits cleanly (otherwise timeout should have triggered).
I made a reproducer here: https://github.com/domenkozar/ghcid-reload-repro
Basically I see this bug happens under two conditions:
1) the code doesn't respond to ctrl-c, usually because it's too busy looping or you got the handlers wrong (catching exceptions). If it's looping you can try passing -fno-omit-yields as a ghc-options.
2) ghcid reloader doesn't reload any forked threads! my reproducer demonstrates the issue! Basically https://github.com/ndmitchell/ghcid/issues/195
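Both conditions can be worked around from the application side. What follows is only a minimal sketch using base, not ghcid's code or the reproducer's: `logSyncErrors`, `handleOne`, `serverLoop` and `withForkedServer` are hypothetical names, and `handleOne` stands in for real request handling. It assumes the main thread receives Ctrl-C as the asynchronous `UserInterrupt` exception, so a catch-all handler has to re-throw asynchronous exceptions (condition 1), and the main thread has to kill what it forked on the way out (condition 2).

```haskell
import Control.Concurrent (forkIOWithUnmask, killThread, threadDelay)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception
  ( SomeAsyncException (..), SomeException, bracket, catch
  , finally, fromException, throwIO )
import Control.Monad (forever)

-- Condition 1: a catch-all handler must re-throw asynchronous exceptions,
-- otherwise UserInterrupt (Ctrl-C) and ThreadKilled are swallowed.
logSyncErrors :: IO () -> IO ()
logSyncErrors act = act `catch` \e ->
  case fromException e of
    Just (SomeAsyncException _) -> throwIO e
    Nothing -> putStrLn ("request failed: " ++ show (e :: SomeException))

-- Hypothetical stand-in for handling one request.
handleOne :: IO ()
handleOne = threadDelay 1000000

-- Hypothetical stand-in for a never-ending server loop.
serverLoop :: IO ()
serverLoop = forever (logSyncErrors handleOne)

-- Condition 2: run `body` while the server runs in a forked thread.
-- However `body` exits (normally or via Ctrl-C), the forked thread is
-- killed and we wait until its cleanup has finished.
withForkedServer :: IO a -> IO a
withForkedServer body = do
  stopped <- newEmptyMVar
  bracket
    -- bracket masks async exceptions while acquiring, and forked threads
    -- inherit that state, so unmask explicitly inside the new thread
    (forkIOWithUnmask $ \unmask ->
        unmask serverLoop `finally` putMVar stopped ())
    (\tid -> killThread tid >> takeMVar stopped)
    (const body)

main :: IO ()
main = withForkedServer (forever (threadDelay 1000000))
```

With a real server, the `finally` clause is where sockets and file handles would be released, so that the port is free again before the reloaded test starts.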
@ndmitchell if reloading isn't possible by killing the threads, can we force ghcid to always restart?
@ndmitchell could we run GHC.Conc.listThreads >>= mapM_ GHC.Conc.killThread upon reloading ghci?
Noting that I also ran into this with a GUI application yesterday.
I'm trying to use --test Devel.main to automatically restart a webserver. On first runs this works, but on subsequent runs starting the webserver often fails because of a port conflict. It turns out the previously started webserver process is still around, and it sticks around even after killing ghcid.
Searching a bit for old issues I find some related to this problem. I'm using ghcid version 0.7 so I believe I should have all existing patches.
Our current build script is based on yesod-devel. It includes the following 'hack':
From yesod-bin:
I'm not too familiar with the internals of ghcid so this may be way off base, but could the problem yesod-devel circumvents here be the same one plaguing ghcid?