Process started with `--test` not killed

jwoudenberg commented 6 years ago

I'm trying to use --test Devel.main to automatically restart a webserver. On first runs this works, but on subsequent runs starting the webserver often fails because of a port conflict. It turns out the previously started webserver process is still around, and it sticks around even after killing ghcid.

Searching a bit for old issues I find some related to this problem. I'm using ghcid version 0.7 so I believe I should be have all existing patches.

Our current build script is based on yesod-devel. It includes the following 'hack':

From yesod-bin:

Unfortunately, directly killing the ghc interpreter has never worked reliably, so we have an extra hack: when killing the process, yesod devel also writes to a file yesod-devel/devel-terminate. Your devel script should respect this file and shutdown whenever it exists.

I'm not too familiar with the internals of ghcid so this may be way off base, but could the problem yesod-devel circumvents here be the same plaguing ghcid?

jwoudenberg commented 6 years ago

Turns out the server I was running wasn't shutting down because of a bug in network: the server web server upon receiving a sigint signal got stuck closing its websocket. Upgrading network fixed my issue.

I'll close this now. Sorry for the noise!

ndmitchell commented 6 years ago

Great, thanks for the update!

vlatkoB commented 5 years ago

I'm running into the same problem. Have a servant server which is forked from the main thread, and it isn't killed on file change. Tested without forking, i.e. running directly from the main thread. The same. Ports and file handles are not released.

Running with ghcid -W -c 'stack ghci server:exe:server-bin' -T 'Main.main'

Instead of Yesod's approach which requires changing/adding some code, a simpler way might be to have an arg to ghcid, like (bikeshedding) --run-after-kill-and-before-reload=my-killer.sh could do the trick? Does this seem feasible and easy?

We could put any command in my-killer.sh which would kill the process(es), wait longer if required, or even do some prep for starting, cleanup, etc.

vlatkoB commented 5 years ago

Something like (I just glanced the code, so maybe another place)

sessionReload :: Session -> IO ([Load], [FilePath])
sessionReload session@Session{..} = do
    -- kill anything async, set stuck if you didn't succeed
    old <- modifyVar running $ \b -> return (False, b)

    -- if killerPath is specified (Just), run it, regardless of `stuck`
    void $ createProcess . shell <$> getKillerPathSomehow session

    stuck <- if not old then return False else do
        Just ghci <- readIORef ghci
        fmap isNothing $ timeout 5 $ interrupt ghci

    if stuck then sessionRestart session else do
      ...

vlatkoB commented 5 years ago

I was using stack and forgot this is ghci, there's no bin filename to kill from outside, so code change needed in app. :-( I did a little bit of testing this idea and it seems to work if the code is in sessionReload. kill doesn't get fired for some reason.

sessionReload :: Session -> IO ([Load], [FilePath])
sessionReload session@Session{..} = do
    let killerScript = Just "ghcid-killer-script" -- hardcoded, need access to `Options`
    maybe (pure ())
          (\ks -> do 
             (_,_,_,p) <- createProcess $ shell ks
             void $ waitForProcess p              -- wait for shell process to finish
          )
          killerScript
    ...

Luckily, I already have the shutdown endpoint in my app, so the only thing needed was the ghcid-killer-script:

# ghcid-killer-script
curl localhost:3003/admin/stop

Any remarks/suggestions on the approach?

vlatkoB commented 5 years ago

If anybody wants to test, here it is.

vlatkoB commented 5 years ago

@ndmitchell Any thoughts?

ndmitchell commented 5 years ago

@vlatkoB sorry for the delay in coming back to you.

If you really need such a killer script happy to take a command line flag for something to run and cleanup at some point. However...

Why not run this cleanup script at the start of the test? The test is a GHCi command, so :!run-killer is a perfectly legitimate command. If you do it at the start that seems simpler, and also speeds up the reload cycle and time to errors.
ghcid deliberately send Ctrl-C to the test program. If you are responding and handling Ctrl-C properly it should do the right shutdown. Fixing that might be useful regardless.

vlatkoB commented 5 years ago

No problem, not an urgent issue, this one. :-) I do not really need such a script, but tried to solve an issue, and came up with some kind of solution. I'm actually trying to use ghcid as yesod devel, so it is not a one-time test, but a Servant server (i.e. a never-ending process).

It responds to Ctrl-C from shell (both stack run server-bin and ghcid ...), but not from inside the ghcid. Port and file handles are still left open, and after a rebuild/reload, server "starts", but with port already in use message. As observed in system-monitor, after Ctrl-C from shell server need ~2-3 secs to disappear from pid list, so that might be the culprit. Although, I tested local ghcid that has timeout 15 and yet the same.
I'm not using ghci much, so can you give me a full command line example to test? Maybe I'm not starting ghcid correctly. What I tried still results in port already in use and/or resource busy (file is locked)message. Like the new server is started before the old one has the time to shutdown.

This is what I'm using. Ghcid (with no script): ghcid -W -c 'stack ghci server' -r or -T 'Main.main' Script (run-killer):

echo "Killing localhost"
curl -s localhost:3003/admin/stop
sleep 4

ndmitchell commented 5 years ago

I imagine you writing:

ghcid -W -c 'stack ghci server' -r or -T ':!script-killer' -T 'Main.main'

Which will cause the "test" operation to first call script killer, then main to start it up again.

vlatkoB commented 5 years ago

It doesn't work. Seems those -T are performed differently than from the shell. If I curl localhost:3003/admin/stop from the shell, the server stops, ghcid displays a message, change the code, recompiles and all good. But when the kill script is invoked with -T, it doesn't stop. I can curl localhost:3003/hello and it works, but for stop getting 408. Same happens without a kill script, when ghcid kills server by itself, hello works, stop returns 408.

vlatkoB commented 5 years ago

I'm using emacs, and since killing from outside works, solved it with before-save-hook. If anybody needs it, here it is:

;; Run 'stop-server' located in project's root on every manual save of Haskell file
(defvar stop-server-script-name "stop-server"
  "Name of the script in project root directory." )
(defun stop-dev-server ()
  "Kill server so ghcid can reload with server already stopped from outside."
  (when (eq major-mode 'haskell-mode)                                     ;; if haskell mode
    (when (memq this-command '(save-buffer))                              ;; if manual save
      (let* ((project-root (projectile-project-root))                     ;; curr project root
             (script-path  (concat project-root stop-server-script-name)) ;; full path
             (script-exist (file-exists-p script-path))                   ;; does it exist
             (shell-cmd    (concat script-path " %s")))                   ;; make it accept param
        (if script-exist
            (progn (message "Stopping server with script: %s" script-path)
                   (shell-command-to-string (format shell-cmd buffer-file-name)))
          (message "Missing script %s. No action." script-path))))))
(add-hook 'before-save-hook 'stop-dev-server)

I'll try to dig into this issue when I get more free time.

ndmitchell commented 5 years ago

No idea why -T doesn't work... That is odd. If you can figure that out I'd be keen to fix it.

vlatkoB commented 5 years ago

I remember that, while I was searching/testing for the most appropriate place for the new code, the new code didn't work inside the kill function (regardless of secs amount for timeout), and some other places, only inside the sessionReload (and here the new code must be the first to execute, before modifyVar running). Is there some parallelism going on in/before kill? Although the new code waits for the script to finish, other things do not wait for it, so some code might get executed partially.

ndmitchell commented 5 years ago

@vlatkoB very odd... I don't understand at all.

vlatkoB commented 5 years ago

I was in a bit of a hurry to make it done, so I might have missed something. Will test again with a very small/basic Servant app, maybe it's something in the app I used it with.

vlatkoB commented 5 years ago

I noticed something that might be related to this issue. When I start ghcid session from script

#!/bin/bash

## script for starting named ghcid session
CMD="ghcid -W -c \"stack ghci server\" -T \":main -c config/config-300$1.yaml\""
bash -c "exec -a "GHCID-RUN" $CMD"

get processes info

$ ps -o pid,ppid,sess,cmd -U vlatko | grep -v "color" | grep 22825
PID   Parent Group CMD
22825  9824 22825 bash
25481 22825 22825 bash
25482 25481 22825 GHCID-RUN -W -c stack ghci server -T :main -c config/config-3005.yaml
25487 25482 22825  .stack/programs/x86_64-linux/ghc-tinfo6-8.6.4/lib/ghc-8.6.4/bin/ghc ....

and try to kill them with pkill -9 -g 25487 (or kill -9 -- -22825, or kill -TERM 22825, or any other weapon I could find :-) ), ghc survives (PID is the same, it is not restarted).

$ ps -o pid,ppid,sess,cmd -U vlatko | grep -v "color" | grep 22825
PID   Parent Group CMD
25487 25482 22825  .stack/programs/x86_64-linux/ghc-tinfo6-8.6.4/lib/ghc-8.6.4/bin/ghc ....

The only way to kill it is by being specific about its PID, kill -9 25487.

EDIT: removed setsid and adapted output

tchoutri commented 2 years ago

I'm noticing the same thing with Flora these past days. This is the first time I'm encountering this and I'm quite lost as to why it didn't happen before.

I'm don't think I'm particularly stressing ghcid on this one, my command-line is ghcid --target=flora-server --restart="src" --test 'FloraWeb.Server.runFlora'.

I'm running Fedora 34.

AleXoundOS commented 2 years ago

When using nix shebangs the following works for me to overcome the issue (assuming the executable source file is ./Main.hs):

#! /usr/bin/env nix-shell
#! nix-shell -p ghcid
#! nix-shell -p "haskellPackages.ghcWithPackages (p: with p; [net-mqtt])"
#! nix-shell -i "ghcid -c 'ghci -Wall' -T':!pkill --full ghc\\ .\\*./Main.hs' -T main"

ndmitchell commented 2 years ago

I'm a Windows user, so don't know what is going on here. If anyone can shed some light, then we can investigate. Reopening for now though, as it does seem there is some issue underlying it.

tchoutri commented 2 years ago

Update: turns out an infinite loops from a websocket endpoint was somehow triggering a sequence of events that was leading to this.

AleXoundOS commented 2 years ago

The -T:!script-killer trick does not feel convenient. Because for different apps different killer scripts are needed (or a general ghcid children processes killer, that's awkward too). It seems ghcid can know the process id of the spawned process, but not sure. If so, I think ghcid can be supplied with an option to automatically kill the process and its children on reload.

domenkozar commented 8 months ago

I'm not sure that pkill will help at all here, since it's all happening part of the same process.

ghcid detects that a change to a module has happened, loads the new modules and calls the new test.

domenkozar commented 8 months ago

My current theory is that https://github.com/ndmitchell/ghcid/blob/master/src/Session.hs#L207 which is supposed to SIGINT doesn't succeed but exits cleanly (otherwise timeout should have triggered).

domenkozar commented 8 months ago

I made a reproducer here: https://github.com/domenkozar/ghcid-reload-repro

Basically I see this bug happens under two conditions:

1) the code doesn't respond to ctrl-c, usually because it's too busy looping or you got the handlers wrong (catching exceptions). If it's looking you can try passing -fno-emit-yields as a ghc-options. 2) ghcid reloader doesn't reload any forked threads! my reproducer demonstrates the issue! Basically https://github.com/ndmitchell/ghcid/issues/195

@ndmitchell if reloading isn't possible by killing the threads, can we force ghcid to always restart?

domenkozar commented 8 months ago

@ndmitchell could we run GHC.Conc.listThreads >>= mapM_ GHC.Conc.killThread upon reloading ghci?

alt-romes commented 4 months ago

Noting that I also ran into this with a GUI application yesterday.

ndmitchell / ghcid

Process started with `--test` not killed #191