Handling EventBlocks and roundtripping #99

TeofilC commented 1 year ago

The eventlog is structured as a list of blocks of events.

A block has a capability number that specifies the capability of upcoming events, and some information about when the block was written.

Currently we erase block events when reading the eventlog. This leads to two issues:

When writing the eventlog back out, we have to recreate the blocks, but we can't do so properly because the original block information has been lost. This leads to roundtripping failures. I believe keeping that information would be a big step towards addressing #14.
We cannot currently figure out from the eventlog when the application is busy writing to the eventlog, even though this is exactly what the two timestamps on the block event tell us.
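
To make the second point concrete, here is a minimal sketch of what consumers could do if block events survived parsing. The `Ev` type, its constructors and `blockSpans` are hypothetical and heavily simplified for illustration; they are not the ghc-events API.

```haskell
import Data.Word (Word64)

type Timestamp = Word64

-- Hypothetical, simplified event type (an assumption, not the ghc-events API).
data Ev
  = BlockMarker Int Timestamp Timestamp  -- capability, block start, block end
  | Other Timestamp String               -- any ordinary event: time and label
  deriving (Eq, Show)

-- | Collect one (capability, start, end) triple per block marker,
-- in the order the markers appear in the stream.
blockSpans :: [Ev] -> [(Int, Timestamp, Timestamp)]
blockSpans evs = [ (c, s, e) | BlockMarker c s e <- evs ]
```

A tool could then derive whatever it needs from these spans. How exactly they map onto flush pauses is left open here.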

My proposal is to keep the block events during parsing and to require their presence when writing out eventlogs. This introduces some new illegal states, i.e., an eventlog without block events could no longer be written to a file. How does this sound?
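
As a rough illustration of the "require their presence" part, a writer could run a check like the one below before serialising and reject the input on failure. This is only a sketch under simplifying assumptions: `Ev` and `checkBlocks` are hypothetical names, and the model (a marker covers every event up to the next marker) is simpler than the real format.

```haskell
-- Hypothetical, simplified event type (an assumption, not the ghc-events API).
data Ev
  = BlockMarker Int   -- capability of the events that follow
  | Other Int String  -- capability and a label for an ordinary event
  deriving (Eq, Show)

-- | Check that every ordinary event is covered by a preceding block marker
-- with the same capability. On failure, return the index of the first
-- offending event.
checkBlocks :: [Ev] -> Either Int ()
checkBlocks = go Nothing . zip [0 ..]
  where
    go _ [] = Right ()
    go _ ((_, BlockMarker c) : rest) = go (Just c) rest
    go cur ((i, Other c _) : rest)
      | cur == Just c = go cur rest
      | otherwise     = Left i
```

An eventlog with no block events at all fails at index 0, which is the new illegal state mentioned above.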

An alternative is to change the types to make these states unrepresentable, but I don't think the breaking change would be worth it.

Mikolaj commented 1 year ago

That sounds good to me as a past ThreadScope contributor, but we'd probably need feedback from current heavy eventlog users, e.g., @mpickering. Who would be the main consumer of the new feature?

TeofilC commented 1 year ago

I think the main consumer would be tools that want to figure out mutator time. Currently we expose information about GC pauses but not about eventlog flush pauses. This information could also be added to ThreadScope, for instance.

There's also quite an old GHC ticket asking for this: https://gitlab.haskell.org/ghc/ghc/-/issues/11950. Surprisingly, this feature was already implemented in the eventlog when that ticket was opened (!), but it was just not exposed by ghc-events.

The other appeal is that it makes it a bit easier to process event logs in a streaming way. For instance, currently the API doesn't expose a way to write an eventlog without sorting all the events (in order to create dummy eventblocks). I recently ran into this when trying to filter out a small time range from a very large eventlog. It would also make it a bit easier to process events in order without sorting the eventlog, though it could be done without this too.
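
For the time-range use case, a single lazy pass could look roughly like this once block markers are part of the event stream. Again, `Ev`, `isMarker` and `filterRange` are hypothetical, and the sketch assumes a marker is followed directly by its events. It leaves the kept markers unchanged, whereas a real writer would presumably need to update the block size.

```haskell
import Data.Word (Word64)

type Timestamp = Word64

-- Hypothetical, simplified event type (an assumption, not the ghc-events API).
data Ev
  = BlockMarker Int Timestamp Timestamp  -- capability, block start, block end
  | Other Timestamp String               -- any ordinary event: time and label
  deriving (Eq, Show)

isMarker :: Ev -> Bool
isMarker (BlockMarker {}) = True
isMarker _                = False

-- | Keep only events with lo <= time <= hi, one block at a time and without
-- sorting. Blocks that end up empty are dropped, as are events that appear
-- before the first marker.
filterRange :: Timestamp -> Timestamp -> [Ev] -> [Ev]
filterRange lo hi = go
  where
    go [] = []
    go (m@(BlockMarker {}) : rest) =
      let (body, rest') = break isMarker rest
          kept = [ e | e@(Other t _) <- body, lo <= t, t <= hi ]
      in if null kept then go rest' else m : kept ++ go rest'
    go (_ : rest) = go rest
```

Because each block is handled on its own, the output can be produced incrementally and only one block needs to be held in memory at a time.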
