haskell / ghc-events

Library and tool for parsing .eventlog files from GHC
http://www.haskell.org/haskellwiki/ThreadScope
Other
33 stars 33 forks source link

getEvent can't deserialize event serialized by putEvent #41

Open osa1 opened 6 years ago

osa1 commented 6 years ago

ghc-events currently fails to deserialize some events that it serializes. To reproduce, first generate a minimal eventlog file from the program main = return () (tried GHC 8.4), then run

import Data.Binary.Get
import Data.Binary.Put
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as LBS
import GHC.RTS.Events
import GHC.RTS.Events.Binary
import GHC.RTS.Events.Incremental

main = do
    bs <- LBS.readFile "empty.eventlog"
    let header = runGet getHeader bs
    let parsers = mkEventParsers header

    let event = Event
          { evTime = 415731793
          , evSpec = GCStatsGHC
              { heapCapset = 0
              , gen = 1
              , copied = 305104
              , slop = 250088
              , frag = 3088384
              , parNThreads = 12
              , parMaxCopied = 4246
              , parTotCopied = 305104
              }
          , evCap = Nothing
          }

    print (runGet (getEvent parsers) (runPut (putEvent event)))

Output:

bug: Data.Binary.Get.runGet at position 60: not enough bytes
CallStack (from HasCallStack):
  error, called at libraries/binary/src/Data/Binary/Get.hs:351:5 in binary-0.8.5.1:Data.Binary.Get
maoe commented 6 years ago

What does mkEventParsers do?

osa1 commented 6 years ago

Given a header it returns event parsers (mkEventParsers :: Header -> EventParsers). See the definition here: https://github.com/osa1/ghc-events/blob/5090f824fa709415336248342674fdd4862695da/src/GHC/RTS/Events/Incremental.hs#L161 basically just the event parser builder from mkEventDecoder factored out for reuse.

osa1 commented 6 years ago

I added a bunch of debug prints and it seems like the parser for EVENT_GC_STATS_GHC event can actually parse this and the parsed evSpec is identical to the original one, but somehow getEvent needs one more word before returning. If I add one more word to the bytestring

let bs = runPut (putEvent event)
print (runGet (getEvent parsers) (bs <> "00000000"))

it works...

osa1 commented 6 years ago

OK so this is because of padParser which for some reason makes this parser expect one more word after the actual contents. Investigating more ...

osa1 commented 6 years ago

So according to the event type parser event type of EVENT_GC_STATS_GHC is

EventType {num = 53, desc = "GC statistics", size = Just 58}

this says the size is 58 but actually it's declared as 50 in GHC.RTS.Events.Binary and putEvent also puts 50 bytes for the event spec (60 when you add the timestamp and event type).

So assuming event type parser is not buggy

Not sure why there's a mismatch. Perhaps there's one more field to EVENT_GC_STATS that we currently don't parse?

osa1 commented 6 years ago

Right, so GHC generates one more word for EVENT_GC_STATS_GHC which is for "par_balanced_copied": https://github.com/ghc/ghc/blob/65eec9cfd4410c0e30b0ed06116c15f8ce3de49d/rts/eventlog/EventLog.c#L413-L424

Not sure how to deal with this ...

osa1 commented 6 years ago

Quoting from the diff that added this new field: https://phabricator.haskell.org/D3658

Add the new value at the end, and we'll gradually deprecate par_max_copied. Old clients that expect the old format will understand everything except the new field, and will mostly work, except that they might miscalculate work balance.

So they relied on the fact that ghc-events (or other GHC event parsers) can deal with events larger than the parser for the event expects.