haskell / ghc-events

Library and tool for parsing .eventlog files from GHC
http://www.haskell.org/haskellwiki/ThreadScope
Other
33 stars 33 forks source link

Incremental parser, cleaned up branch #13

Closed kvelicka closed 8 years ago

kvelicka commented 8 years ago

The long-overdue cleanup of my incremental parser for ghc-events. All the history is now linear, and some unnecessary commits were omitted.

@Mikolaj, should be ready for a review! However, the commit timeline still reflects the progress of the work as it was originally implemented, so there will be some bits that get implemented, and then re-implemented/removed. I hope that this will not be too inconvenient.
Note that in the final version the test cases are failing since the log re-writing has been broken for God knows how long (pre-0.4.4.0 at least) and the pretty printer output was (I suspect) altered by the PRs to master that were made after my original branch split off. If this is all looking good, I'm willing to look into these further, as well as other issues/PRs to ghc-events and Threadscope.

NCrashed commented 8 years ago

Hi, I am interested in the PR as incremental parser is needed for robust implementation of my Haskell Summer of Code project.

The comment exists to show that your work is really appreciated and useful. Also I can help you to review the code, although I am not a maintainer of the package.

Mikolaj commented 8 years ago

@kvelicka: splendid! I will have a look within the next few days.

Mikolaj commented 8 years ago

@kvelicka: how much can I relax the "network == 2.6.0.2" constraint you added?

kvelicka commented 8 years ago

@Mikolaj, I've just added one more commit (16bab59) to the PR that removes the network dependency entirely. I've talked to @NCrashed a couple of days ago and we came to the conclusion that keeping ghc-events a library whose sole purpose is dealing with .eventlog files is more flexible.

Mikolaj commented 8 years ago

thanks a lot; compiles fine

kvelicka commented 8 years ago

I think you mentioned way-back-when that you wanted to test that Threadscope compiles with this version as well, is that still in the plans?

Mikolaj commented 8 years ago

Yes. :) I will share here the results of the review, when I'm finished.

kvelicka commented 8 years ago

@Mikolaj: Apologies for not including this in the initial PR. I noticed there was a bit of logic that got lost from #7 during the big chery-pick/merge. I've just added it now, which also makes regression tests pass. I'm closing #7 as well since it's been merged into master already and I'm not even sure why is it still open (do correct me and reopen if I'm wrong though!)

Mikolaj commented 8 years ago

Great, thank you, now test-versions passes and write-merge still fails the same, which is probably what you mentioned earlier ("Ix{Int}.index: Index (39) out of range ((0,19))").

Threadscope compilation fails quite miserably

https://gist.github.com/Mikolaj/6825352618f99fa6f7474a19f1560197

but that's to be expected, since it delves into the ghc-events' API quite deeply. Will try to fix it tomorrow and see if the really important bits used in Threadscope still work, namely event profiles &Co.

kvelicka commented 8 years ago

Yes, that's it. If I remember correctly, I ran the same test suite on 0.4.4.0 earlier on and it was failing with the same error. Therefore, merging/re-writing eventlogs is (just as) broken and will need fixing at some point down the line.

As far as the compilation errors in your gist go, the following may make the work easier:

There also seems to be some String/Text ambiguities there as well, potentially due to newer compiler/library versions but these appear to be be separate from my stuff.

If you find that you need any further assistance trying to get TS to a running state, let me know!

Mikolaj commented 8 years ago

I've made ThreadScope compile with ghc-events at https://github.com/haskell/ThreadScope/tree/ghc-events-0.5

All eventlogs from ghc-events/test/ load fine, except for .mandelbrot-mmc-2011-06-14.eventlog Even the partial log loads correctly (I mean testlog-part.eventlog), which is amazing. Even the logs with spark histograms enabled (e.g., queens-ghc-7.0.2.eventlog) display fine, so the event profiling machinery must have at least partially survived. :) I've also tried, with success, some of the wild logs from https://github.com/Mikolaj/eventlogs, but didn't wait for the larger ones to finish loading. I've also run some of the threadscope -t foo tests.

Fit-to-windows stopped working or I forgot how to use it. Perhaps it was getting the time span of events from the EventBlock or something.

I need to test it some more, but it looks really good for the first attempt.

Generally, the changes in ghc-events really improve the ThreadScope codebase (possibly, the codebase could be simplified even more, but I just made it compile).

kvelicka commented 8 years ago

That is great news!

I wonder why the mandelbrot log is failing - ghc-events show seems to be displaying it fine. Do any potentially helpful error messages get emitted?

Also, how big were the wild logs that would take a long time to load? This version of the library is unfortunately slower than the previous ones but I hope it's still fast enough to be practical for larger logs. Or if not, I'll have to find a way to speed it up later on.

As far as the codebase goes, I'm interested in maintaining and further developing ghc-events, and since Threadscope currently is perhaps the main user of it, it would only make sense for me to get involved in its development too - the list of issues could use a trim :)

Mikolaj commented 8 years ago

The error is findRunThreadTime for Event {evTime = 1288380, evSpec = RunThread {thread = 2}, evCap = Just 2}, which means TS couldn't find any StopThread event corresponding to aRunThread event it encountered earlier. BTW, the following three commands also detect that the eventlog is invalid, though at different points:

ghc-events validate threads test/mandelbrot-mmc-2011-06-14.eventlog|less

Invalid eventlog:
State:

Thread States:
Indexed output:
Key: 1, Value: ThreadRunning

Cap States:
Indexed output:
Key: 0, Value: 1

Input:
Event {evTime = 12149284, evSpec = RunThread {thread = 3}, evCap = Just 3}
ghc-events validate threadpool test/mandelbrot-mmc-2011-06-14.eventlog|less

Event {evTime = 19050344109, evSpec = StopThread {thread = 4, status = ThreadFinished}, evCap = Just 1}
ghc-events profile threads test/mandelbrot-mmc-2011-06-14.eventlog|less

((fromList [(1,Profile {profileState = ThreadRunning, profileTime = 4199512})],fromList [(0,1)]),Event {evTime = 12149284, evSpec = RunThread {thread = 3}, evCap = Just 3})
Mikolaj commented 8 years ago

The logs were from the perf/ subdirectory. They are several MB each. I guess they would load fine eventually, with old ghc-event and TS they also took up to several minutes, I think. Also, now we can improve TS to load a portion of the eventlog and start displaying while the full eventlog is still loading, if we want to.

Mikolaj commented 8 years ago

I'd be very happy and I'd gladly help if you could take over any and all of the stuff. :) Right now I'm just helping with ghc-events and TS in my spare private time, but they could really use a more active, formal or informal, maintainer.

kvelicka commented 8 years ago

Hmm.. Could it be that the log is actually invalid in that case?

It's definitely possible that my changes have broken something with the way this log is parsed, but if that was the case, wouldn't the regression tests (test-versions) be failing as well?

And yes, incremental loading would be a great addition. I suspect that may get some love if @NCrashed has enough time on his project to work on tools that would use the live logging system.

NCrashed commented 8 years ago

And yes, incremental loading would be a great addition. I suspect that may get some love if @NCrashed has enough time on his project to work on tools that would use the live logging system.

I already use the PR in the project, it seems to work smoothly despite some occasional Ix{Int}.index: Index (39) out of range ((0,19)). I am preparing a blog post about my progress in RTS patching.

kvelicka commented 8 years ago

The comment above was about Threadscope specifically :) If you encounter any bugs or just weirdness with ghc-events, aside from the issue you just mentioned, I would love to hear about them!
And in fact, if you have a log that consistently fails with the indexing issue, would be great to have it up somewhere, along with a GitHub issue! Although if memory serves well, that may not be a trivial bug to fix.

NCrashed commented 8 years ago

I will open an issue as soon as I catch log that fails with the problem.

Mikolaj commented 8 years ago

I get the index error consistently with 'cabal test' in ghc-events.

kvelicka commented 8 years ago

I'm not (yet) inclined to trust that one as a legitimate error case since it is thrown when parsing a log that has been merged/re-written back to binary from a Haskell-level representation, and I suspect that incorrect re-writing is the root cause there.

Mikolaj commented 8 years ago

Fair enough. BTW, I'm getting the same problems with of validation and profiling of mandelbrot-mmc-2011-06-14.eventlog with ghc-events-0.4.4.0 as with the PR. Moreover, I get the same error message with TS compiled with old ghc-events --- it just finds the problem much faster. So, the faulty eventlog problem is unrelated to this PR. I guess it may even be know and knowingly ignored, but I don't have time to rummage in closed issues, etc.

@kvelicka: could you do a few informal benchmarks and compare the speed of the old and new ghc-events? I wonder if the huge speed gap with the failed TS loading of mandelbrot is is a fluke or if there is some additional performance problem in the PR. You could try the eventlogs/perf logs as well.

kvelicka commented 8 years ago

Hmm, interesting. I'll be having a look into these issues at some point after this PR gets wrapped up.

The performance problem stems from the fact that old ghc-events used to take advantage of already-sorted event blocks when performing the overall sort, which would allow it to use only the last ("merge") step of the mergesort algorithm, effectively resulting in a O(n) sort. The new version doesn't have this anymore, and has to sort in O(n*logn). If you'd rather see acutal numbers, I may have some time for benchmarks later this week

Mikolaj commented 8 years ago

Yes, I think it would make sense to get some numbers. O(log n) is rather small, so we should not be seeing a huge difference, unless there is some other slowdown or the constants are much larger in some situations. With some luck, the huge eventlogs would let us see past any fluke small scale slowdowns, such as the one with loading the mandelbrot log.

As for my plans, I'd like to finish reading the actual PR and finish testing TS by Friday. Then, unless @simonmar or anybody else objects, I'd like to merge the PR and merge the TS changes to master. I guess we may wait with the actual new Hackage releases until we polish things a bit more.

kvelicka commented 8 years ago

Fair enough, I hope to have some produced later in the week.

I agree with not rushing into a Hackage release as well - this PR isn't quite production-ready yet.

NCrashed commented 8 years ago

@kvelicka @Mikolaj as promised, the post. The incremental parser is really amazing, looking forward for further polishing and improvements.

kvelicka commented 8 years ago

@NCrashed: Sweet, glad to hear you're making progress! Looking forward to future posts.

Thought I'd point out something that you may or may not be aware of already, and it may or may not be relevant to the work you're doing: once ghc-events reads a header, it dynamically assembles a list of parsers needed to parse the log associated with it (as opposed to using the full set of parsers) so if you were to replace the log source mid-way, the parser may crash or give unexpected results. Again, not sure if relevant to your work but thought I'd point it out since this couldn't ever happen in previous versions of the library and therefore isn't documented.

Mikolaj commented 8 years ago

OK, I've tested TS with the new ghc-events on many eventlogs and read the PR and it all looks good to me. As soon as we have any experimental data on the slowdown, I'm ready to merge Friday or Saturday (later on I'm on vacation), unless anybody objects. @kvelicka: splendid job indeed.

Mikolaj commented 8 years ago

I'm leaving earlier than expected, so I'm merging now.