yesodweb / wai

Haskell Web Application Interface
MIT License
834 stars 263 forks source link

auto-update: mkDebounce leaking the thread #980

Open Yuras opened 7 months ago

Yuras commented 7 months ago

mkDebounce internally forks a thread, but never kills it. To reproduce, run the following with +RTS -s -RTS and observe memory usage.

import Control.Debounce
import Control.Monad

main :: IO ()
main = forM_ [(0::Int) .. 100000] $ \_ -> do
  r <- mkDebounce defaultDebounceSettings
  r

Output:

     250,129,600 bytes allocated in the heap
     417,814,184 bytes copied during GC
      47,018,288 bytes maximum residency (8 sample(s))
       3,325,648 bytes maximum slop
             132 MiB total memory in use (0 MB lost due to fragmentation)

132MiB memory is too much for the above program.

As a possible solution, consider attaching a finalizer to the debounced action that will kill the thread. I'd be happy to prepare a patch if you agree with such a solution. Other possibility is to use a closable channel instead of MVar. I didn't try it though.

A bit of context: fast-logger uses mkDebounce to flush logs, so each logger comes with a thread which never exits.

Thanks!

Vlix commented 5 months ago

I wonder how this could be fixed.

I guess mkDebounce could also provide a finalizer?

Or create a function that captures its own MVar and doesn't need to fork a thread?

mkDebounce :: IO () -> IO (IO ())
mkDebounce action = do
  baton <- newMVar ()
  pure $ do
    mBaton <- tryTakeMVar baton
    case mBaton of
      Nothing -> pure ()
      Just () -> do
        -- handle Leading/Trailing
        -- and `putMVar ()` when done with action + delay

@kazu-yamamoto would this idea be an ok replacement for Control.Debounce.mkDebounce? Or do you see any issues with it? (a finally putMVar might also be needed, come to think of it)

kazu-yamamoto commented 5 months ago

This module was created by @snoyberg. I don't have strong opinions but yes, cleaning up threads is a MUST.

Vlix commented 2 weeks ago

Will try to reproduce with my PR #996 at the earliest convenience.

kazu-yamamoto commented 2 weeks ago

Note that the recent GHC has GHC.Conc.Sync.listThreads. Using it with lableThreads, we can check if the clean up works well. Now "label" can be passwd to auto-update stuffs via fooThreadName.

Vlix commented 2 weeks ago

I'll add listThreads to my reproduction. I don't know what you mean with "passing 'label' to 'auto-update stuffs'", though :thinking:

kazu-yamamoto commented 2 weeks ago

See updateThreadName.

Yuras commented 2 weeks ago

There is a high change that the issue is actually caused by labelThread, see https://gitlab.haskell.org/ghc/ghc/-/issues/23949

The reproducer in the issue description is probably wrong, it's not letting the threads chance to receive BlockedIndefinitelyOnMVar. There should be threadDelay and performMajorGC somewhere in the loop.

Vlix commented 2 weeks ago

There is a high change that the issue is actually caused by labelThread, see https://gitlab.haskell.org/ghc/ghc/-/issues/23949

The reproducer in the issue description is probably wrong, it's not letting the threads chance to receive BlockedIndefinitelyOnMVar. There should be threadDelay and performMajorGC somewhere in the loop.

labelThread was not added in the debouncing function(s) until auto-update-0.2.2, which was released about a month ago, so unless the labelThread issue was always a problem for literally every fork, even if the labelThread wasn't called. This would still be an issue.

Vlix commented 1 week ago

I've ran OP's example on master with GHC 9.6.6 and GHC 9.2.8 and found no significant difference. (They were both always above 100MB total memory in use, and would also sometimes result in the 130~ish MB the OP showed.)

And I've ran it with my PR #996 on GHC 9.6.6, which used a lot less memory, and also found that when I added a print statement in the debounce action, the printing would stagger and lag with the current code, but with my PR it was a lot smoother and just flooded my screen as fast as possible, instead of having hiccups and freezes.

Average results from trying the OP's example ``` ------------------------------- `master` BRANCH GHC 9.6.6 ------------------------------- 132,844,528 bytes allocated in the heap 354,407,584 bytes copied during GC 37,886,632 bytes maximum residency (6 sample(s)) 4,250,408 bytes maximum slop 116 MiB total memory in use (0 MiB lost due to fragmentation) Tot time (elapsed) Avg pause Max pause Gen 0 29 colls, 0 par 0.061s 0.061s 0.0021s 0.0033s Gen 1 6 colls, 0 par 0.071s 0.071s 0.0118s 0.0166s INIT time 0.001s ( 0.000s elapsed) MUT time 9.728s ( 10.614s elapsed) GC time 0.132s ( 0.132s elapsed) EXIT time 0.007s ( 0.014s elapsed) Total time 9.868s ( 10.760s elapsed) %GC time 0.0% (0.0% elapsed) Alloc rate 13,655,472 bytes per MUT second Productivity 98.6% of total user, 98.6% of total elapsed ------------------------------- `master` BRANCH GHC 9.2.8 ------------------------------- 253,499,096 bytes allocated in the heap 378,401,504 bytes copied during GC 37,405,288 bytes maximum residency (8 sample(s)) 4,423,064 bytes maximum slop 113 MiB total memory in use (0 MB lost due to fragmentation) Tot time (elapsed) Avg pause Max pause Gen 0 52 colls, 0 par 0.064s 0.064s 0.0012s 0.0028s Gen 1 8 colls, 0 par 0.078s 0.078s 0.0098s 0.0149s INIT time 0.000s ( 0.000s elapsed) MUT time 9.738s ( 9.738s elapsed) GC time 0.143s ( 0.143s elapsed) EXIT time 0.266s ( 0.266s elapsed) Total time 10.148s ( 10.148s elapsed) %GC time 0.0% (0.0% elapsed) Alloc rate 26,031,989 bytes per MUT second Productivity 96.0% of total user, 96.0% of total elapsed ------------------------------- NEW (#996) DEBOUNCE GHC 9.6.6 ------------------------------- 641,195,544 bytes allocated in the heap 433,286,048 bytes copied during GC 14,834,760 bytes maximum residency (15 sample(s)) 2,876,344 bytes maximum slop 48 MiB total memory in use (0 MiB lost due to fragmentation) Tot time (elapsed) Avg pause Max pause Gen 0 146 colls, 0 par 0.063s 0.063s 0.0004s 0.0015s Gen 1 15 colls, 0 par 0.064s 0.064s 0.0043s 0.0062s INIT time 0.001s ( 0.001s elapsed) MUT time 9.412s ( 9.389s elapsed) GC time 0.127s ( 0.127s elapsed) EXIT time 0.278s ( 0.284s elapsed) Total time 9.817s ( 9.801s elapsed) %GC time 0.0% (0.0% elapsed) Alloc rate 68,126,982 bytes per MUT second Productivity 95.9% of total user, 95.8% of total elapsed ```
Vlix commented 1 week ago

Now some tests with listThreads.

The master branch kept about 57000 threads active if the debounceAction was only pure (). It kept 47000 active if the debounceAction had a threadDelay 1_000_000 and this goes down the more the threadDelay increases... not entirely sure why. (See Addendum 1)

The #996 way of doing debouncing still has 11000~ish threads open after all that, and this goes down by a bit with a 1s threadDelay as the debounceAction, but somehow the amount of threads open at the end increases again if the debounceAction is set to 2s threadDelay? :thinking: (See Addendum 2)

This might have to do with the GC taking longer to recognize the threads can be cleaned up? Or something like that? I honestly don't have too much knowledge about the GHC runtime to know how it cleans up green threads, and how accurate listThreads is. But all in all, #996 does seem like an improvement :shrug:

Source code that was run: ```hs {-# LANGUAGE NumericUnderscores #-} import Control.Concurrent (threadDelay) import Control.Monad (forM_) import GHC.Conc (listThreads) import System.Environment (getArgs) import System.Mem (performMajorGC) import Control.Debounce ( debounceAction, defaultDebounceSettings, mkDebounce, ) main :: IO () main = do args <- getArgs let action = case args of [] -> pure () "0" : _ -> pure () i : _ -> threadDelay $ read i * 1_000_000 forM_ [(0::Int) .. 100_000] $ \_ -> do r <- mkDebounce defaultDebounceSettings{debounceAction = action} r nowThreads <- length <$> listThreads threadDelay 5_000_000 laterThreads <- length <$> listThreads threadDelay 1_000_000 performMajorGC gcThreads <- length <$> listThreads threadDelay 1_000_000 putStr "\nThreads amount right after: " >> print nowThreads putStr "\nThreads amount 5 sec later: " >> print laterThreads putStr "\nThreads amount after GC: " >> print gcThreads putStrLn "" ```
Addendum 1 ``` --------------------------- `master` BRANCH GHC 9.6.6 (with `listThreads`) --------------------------- Threads amount right after: 57671 Threads amount 5 sec later: 57671 Threads amount after GC: 57671 149,547,936 bytes allocated in the heap 421,271,680 bytes copied during GC 64,671,328 bytes maximum residency (7 sample(s)) 4,768,160 bytes maximum slop 144 MiB total memory in use (0 MiB lost due to fragmentation) Tot time (elapsed) Avg pause Max pause Gen 0 32 colls, 0 par 0.066s 0.066s 0.0021s 0.0039s Gen 1 7 colls, 0 par 0.100s 0.100s 0.0143s 0.0258s INIT time 0.000s ( 0.000s elapsed) MUT time 9.709s ( 16.607s elapsed) GC time 0.166s ( 0.167s elapsed) EXIT time 0.005s ( 0.006s elapsed) Total time 9.881s ( 16.780s elapsed) %GC time 0.0% (0.0% elapsed) Alloc rate 15,402,472 bytes per MUT second Productivity 98.3% of total user, 99.0% of total elapsed --------------------------- `master` BRANCH GHC 9.6.6 (with `listThreads`) ** 1 second `threadDelay` ** --------------------------- Threads amount right after: 47431 Threads amount 5 sec later: 47431 Threads amount after GC: 47431 493,958,480 bytes allocated in the heap 452,738,768 bytes copied during GC 53,122,272 bytes maximum residency (10 sample(s)) 3,930,912 bytes maximum slop 121 MiB total memory in use (0 MiB lost due to fragmentation) Tot time (elapsed) Avg pause Max pause Gen 0 113 colls, 0 par 0.078s 0.078s 0.0007s 0.0041s Gen 1 10 colls, 0 par 0.103s 0.103s 0.0103s 0.0227s INIT time 0.000s ( 0.000s elapsed) MUT time 20.493s ( 26.422s elapsed) GC time 0.181s ( 0.181s elapsed) EXIT time 0.003s ( 0.007s elapsed) Total time 20.677s ( 26.610s elapsed) %GC time 0.0% (0.0% elapsed) Alloc rate 24,103,976 bytes per MUT second Productivity 99.1% of total user, 99.3% of total elapsed --------------------------- `master` BRANCH GHC 9.6.6 (with `listThreads`) ** 2 second `threadDelay` ** --------------------------- Threads amount right after: 38926 Threads amount 5 sec later: 38926 Threads amount after GC: 38926 491,857,688 bytes allocated in the heap 455,557,856 bytes copied during GC 42,842,008 bytes maximum residency (10 sample(s)) 4,241,224 bytes maximum slop 125 MiB total memory in use (0 MiB lost due to fragmentation) Tot time (elapsed) Avg pause Max pause Gen 0 112 colls, 0 par 0.085s 0.085s 0.0008s 0.0029s Gen 1 10 colls, 0 par 0.116s 0.116s 0.0116s 0.0220s INIT time 0.000s ( 0.000s elapsed) MUT time 26.134s ( 31.982s elapsed) GC time 0.201s ( 0.201s elapsed) EXIT time 0.002s ( 0.007s elapsed) Total time 26.337s ( 32.190s elapsed) %GC time 0.0% (0.0% elapsed) Alloc rate 18,820,478 bytes per MUT second Productivity 99.2% of total user, 99.4% of total elapsed ```
Addendum 2 ``` --------------------------- NEW (#996) BRANCH GHC 9.6.6 (with `listThreads`) --------------------------- Threads amount right after: 11371 Threads amount 5 sec later: 11371 Threads amount after GC: 11371 363,695,808 bytes allocated in the heap 362,168,344 bytes copied during GC 14,189,632 bytes maximum residency (15 sample(s)) 2,079,680 bytes maximum slop 46 MiB total memory in use (0 MiB lost due to fragmentation) Tot time (elapsed) Avg pause Max pause Gen 0 80 colls, 0 par 0.050s 0.050s 0.0006s 0.0023s Gen 1 15 colls, 0 par 0.061s 0.061s 0.0040s 0.0062s INIT time 0.000s ( 0.000s elapsed) MUT time 9.509s ( 16.404s elapsed) GC time 0.111s ( 0.111s elapsed) EXIT time 0.000s ( 0.005s elapsed) Total time 9.620s ( 16.520s elapsed) %GC time 0.0% (0.0% elapsed) Alloc rate 38,248,628 bytes per MUT second Productivity 98.8% of total user, 99.3% of total elapsed --------------------------- NEW (#996) BRANCH GHC 9.6.6 (with `listThreads`) ** 1 second `threadDelay` ** --------------------------- Threads amount right after: 11068 Threads amount 5 sec later: 11068 Threads amount after GC: 11068 710,811,992 bytes allocated in the heap 385,014,272 bytes copied during GC 13,558,344 bytes maximum residency (17 sample(s)) 1,990,072 bytes maximum slop 45 MiB total memory in use (0 MiB lost due to fragmentation) Tot time (elapsed) Avg pause Max pause Gen 0 161 colls, 0 par 0.055s 0.055s 0.0003s 0.0016s Gen 1 17 colls, 0 par 0.065s 0.065s 0.0038s 0.0064s INIT time 0.001s ( 0.001s elapsed) MUT time 19.795s ( 26.229s elapsed) GC time 0.120s ( 0.120s elapsed) EXIT time 0.000s ( 0.001s elapsed) Total time 19.916s ( 26.350s elapsed) %GC time 0.0% (0.0% elapsed) Alloc rate 35,907,809 bytes per MUT second Productivity 99.4% of total user, 99.5% of total elapsed --------------------------- NEW (#996) BRANCH GHC 9.6.6 (with `listThreads`) ** 2 second `threadDelay` ** --------------------------- Threads amount right after: 18413 Threads amount 5 sec later: 18413 Threads amount after GC: 18413 712,927,520 bytes allocated in the heap 396,490,936 bytes copied during GC 20,182,664 bytes maximum residency (16 sample(s)) 2,793,136 bytes maximum slop 59 MiB total memory in use (0 MiB lost due to fragmentation) Tot time (elapsed) Avg pause Max pause Gen 0 163 colls, 0 par 0.065s 0.066s 0.0004s 0.0034s Gen 1 16 colls, 0 par 0.072s 0.072s 0.0045s 0.0085s INIT time 0.000s ( 0.000s elapsed) MUT time 25.060s ( 31.797s elapsed) GC time 0.138s ( 0.138s elapsed) EXIT time 0.001s ( 0.005s elapsed) Total time 25.198s ( 31.940s elapsed) %GC time 0.0% (0.0% elapsed) Alloc rate 28,449,329 bytes per MUT second Productivity 99.4% of total user, 99.6% of total elapsed ```
Vlix commented 1 week ago

Ah, after adding some BangPatterns in front of the results from listThread (like !nowThreads <-), this is the result with #996 (debounceAction is pure ()):

#996 Executable ``` Threads amount right after: 12638 Threads amount 5 sec later: 12638 Threads amount after GC: 1 362,847,320 bytes allocated in the heap 350,826,880 bytes copied during GC 14,189,632 bytes maximum residency (15 sample(s)) 2,079,680 bytes maximum slop 45 MiB total memory in use (0 MiB lost due to fragmentation) Tot time (elapsed) Avg pause Max pause Gen 0 80 colls, 0 par 0.049s 0.049s 0.0006s 0.0022s Gen 1 15 colls, 0 par 0.057s 0.057s 0.0038s 0.0060s INIT time 0.001s ( 0.000s elapsed) MUT time 9.428s ( 16.301s elapsed) GC time 0.106s ( 0.106s elapsed) EXIT time 0.000s ( 0.003s elapsed) Total time 9.534s ( 16.411s elapsed) %GC time 0.0% (0.0% elapsed) Alloc rate 38,487,966 bytes per MUT second Productivity 98.9% of total user, 99.3% of total elapsed ```

And this is the result on the master thread:

master Executable ``` Threads amount right after: 57749 Threads amount 5 sec later: 57749 Threads amount after GC: 34841 148,134,400 bytes allocated in the heap 425,123,096 bytes copied during GC 64,377,464 bytes maximum residency (8 sample(s)) 6,040,968 bytes maximum slop 145 MiB total memory in use (0 MiB lost due to fragmentation) Tot time (elapsed) Avg pause Max pause Gen 0 31 colls, 0 par 0.066s 0.066s 0.0021s 0.0035s Gen 1 8 colls, 0 par 0.115s 0.116s 0.0145s 0.0242s INIT time 0.001s ( 0.001s elapsed) MUT time 9.660s ( 16.566s elapsed) GC time 0.181s ( 0.181s elapsed) EXIT time 0.001s ( 0.002s elapsed) Total time 9.843s ( 16.750s elapsed) %GC time 0.0% (0.0% elapsed) Alloc rate 15,335,172 bytes per MUT second Productivity 98.1% of total user, 98.9% of total elapsed ```

So it does seem the new way of debouncing actually cleans up the threads, whereas the current one leaks at least somewhat?


Some extra testing

1 sec leaves less:

`master` branch 1s `threadDelay` action ``` Threads amount right after: 46215 Threads amount 5 sec later: 46215 Threads amount after GC: 23412 ```

2 sec leaves even less:

`master` branch 2s `threadDelay` action ``` Threads amount right after: 38577 Threads amount 5 sec later: 38577 Threads amount after GC: 11457 ```

But 4 sec leaves more?:

`master` branch 4s `threadDelay` action ``` Threads amount right after: 42459 Threads amount 5 sec later: 42459 Threads amount after GC: 19088 ```