haskell / ghc-events

Library and tool for parsing .eventlog files from GHC
http://www.haskell.org/haskellwiki/ThreadScope
Other
33 stars 34 forks source link

WIP: Constant space sort #61

Open bgamari opened 4 years ago

bgamari commented 4 years ago

This introduces an interface providing constant-space sorting of eventlogs via on-disk merge-sort. I have confirmed that the implementation indeed maintains constant-space behavior, requiring about 79s seconds to sort a 200MB eventlog while not exceeding 40 MBytes residency (using a chunk size of 1e5 events).

This is in contrast to the old in-memory codepath which requires 35 seconds and nearly 5GBytes of residency to sort this same eventlog.

If I increase the chunk size by an order of magnitude (to 1e6 events) then the constant-space sort has essentially the same runtime as the in-memory codepath but runs in merely 300MBytes.

Note: In testing this I realized that several of the encoding paths treated strings incorrectly. Consequently, this depends upon (and contains commits from) #62.

To-do

facundominguez commented 4 years ago

Is it necessary to sort all the events as if they were completely unordered? I was assuming, at the time I wrote my patch, that the events of each capability would come in order within each capability, so all that was necessary was merging the capability streams back again.

Perhaps there is some inconvenience in splitting the capability streams that I'm missing?

maoe commented 4 years ago

I think facundominguez's point is valid. Also it seems that #14 gets in the way if we're going to replace the existing sortEvents with this constant space sorting.