haskell / ThreadScope

A graphical tool for profiling parallel Haskell programs
http://www.haskell.org/haskellwiki/ThreadScope
Other
153 stars 36 forks source link

Exit with `scanEvents: GlobalSyncGC ModeGHC` on load #46

Closed DaveCTurner closed 9 years ago

DaveCTurner commented 9 years ago

Apologies for the long test case: I haven't found an easy way to make this smaller without the problem going away. It's part of a Project Euler answer.

module Main (main) where

import Data.Numbers.Primes
import Control.Parallel.Strategies

main :: IO ()
main = print $ answerFor 100000

answerFor :: Integer -> Integer
answerFor n = parSum 10000 [ countPrimitiveTrianglesWithPerimeter p | p <- [1..n] ]

countPrimitiveTrianglesWithPerimeter :: Integer -> Integer
countPrimitiveTrianglesWithPerimeter p = sum
  [ trianglesWithPerimeter (div p $ product qs) * (if odd (length qs) then -1 else 1)
  | qs <- subsets $ distinctPrimeFactors p
  ]

trianglesWithPerimeter :: Integer -> Integer
trianglesWithPerimeter p = let p3 = div p 3 in div (p3 * (p3 + 1)) 2

distinctPrimeFactors :: Integer -> [Integer]
distinctPrimeFactors = go primes
  where
  go _ 1 = []
  go [] n = [n]
  go (p:ps) n = if mod n p == 0 then p : go ps (divideOut p $ div n p) else if n < p * p then [n] else go ps n

  divideOut p n = if mod n p == 0 then divideOut p (div n p) else n

subsets :: [a] -> [[a]]
subsets [] = [[]]
subsets (x:xs) = let xss = subsets xs in xss ++ map (x:) xss

parSum :: Num a => Int -> [a] -> a
parSum chunkSize ns = sum (ns `using` parListChunk chunkSize rseq)

And here is what happens when run (on a 4-core VirtualBox VM, GHC 7.8.3, other system info as per #45)

$ ghc --make -rtsopts -threaded -eventlog Main.hs  -Wall -O2
[1 of 1] Compiling Main             ( Main.hs, Main.o )
Linking Main ...
$ ./Main +RTS -ls -N
15406050286750
$ threadscope Main.eventlog
threadscope: scanEvents: GlobalSyncGC ModeGHC

The threadscope GUI starts up and starts to load the eventlog, then exits with the message shown. ghc-events doesn't seem to have any problems. Any ideas?

Mikolaj commented 9 years ago

Yes, the eventlog seems not to adhere to the finite machine model I assumed, similarly as in #35. I will look into that.

DaveCTurner commented 9 years ago

The event log in question is at https://www.dropbox.com/s/quik2d4xpb0v7mx/Main.eventlog.xz?dl=0 (at least temporarily)

Mikolaj commented 9 years ago

I can reproduce the problem with your eventlog, but not one I generate. That matches #35, so you may be able to avoid the bug if you run on a straight Linux box. Regardless, I will try to add a workaround analogous to #35.

Mikolaj commented 9 years ago

Pushed a fix to master. This should prevent any such crashes in the future at the cost of a loss of accuracy of GC stats from TS (can be prevented by obtaining the eventlogs on Linux (not on a VM), which probably gives better accuracy regardless of TS). Sorry for the inconvenience.

DaveCTurner commented 9 years ago

Thanks, that works for me now.

Out of interest, I identified the TimeStamp of the offending event as 126593 but couldn't match it up with one in the output of ghc-events which claims that the first event in the log occurs at time 130986. I assume the numbers mean different things. What's the translation between them?

Cheers,

Mikolaj commented 9 years ago

It's the same value, but in ghc-events the value is expressed in picoseconds (I think) and in TS in seconds, omitting 3 last digits (the time source is supposed not to be that accurate, anyway). So the events you speak about are probaly these 2:

126592357: cap 0: all caps stopped for GC
126593778: cap 0: GC stats for heap capset 0: generation 0, 189352 bytes copied, 6140272 bytes slop, 17989632 bytes fragmentation, 4 par threads, 174056 bytes max par copied, 184320 bytes total par copied

With original source the first is of the type that would cause a crash and when this is patched, then the second. But actually, I can't see any wrong ordering around these events so either I've forgotten what goes where or these are not the culprits. Actually, it would make sense to print the timestamp as part of the error message (though now we don't issue this particular message even in assertions-enabled mode --- unless the other sanity checks in assertions would kick in here). This really waits for a proper fix, e.g., by making sure the events are ordered correctly. Perhaps the coming rewrite to ghc-events would help or at least make it easier?

Thank you for the report.