Add alternative view

ndmitchell commented 7 years ago

When viewing profiles using something like Visual Studio, there are really two views I want to see.

A top-down view, showing which areas time has gone in, exactly like profiteur provides.
A bottom-up view, showing which leaf functions are hot spots. For example, I might have a program that does XML processing throughout, and it's useful to identify that N% of the time went into XML processing, even though that was spread across many top-down parts of the program.

Is there any interest in adding a second profiteur view that shows that? You could imagine being able to group by function, by module or by full expression in the profile (e.g. foo.\.\.).

Profiteur has been exceptionally cool so far, very impressed!

jaspervdj commented 7 years ago

@ndmitchell Yes, I've found myself wanting something like that as well. I imagine this could be done as a second "tab" in profiteur.

I think that ideally, it would be done using some "flatten" phase, which takes all nodes in the profile and "sums" them per fully qualified name. This means that the first "level" of the tree would have N nodes, where N is the number of unique qualified names (plus some CAFs).

To illustrate that with a type: flatten :: ProfTree -> [(QualifiedName, Stats)].

Then I'd want to allow for expanding these nodes. When expanding Foo.bar, we would take all subtrees of Foo.bar (anywhere in the profile), flatten these (summing them), and put that below Foo.bar.
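A minimal sketch of the flatten and expand steps described above. The `ProfTree` and `Stats` types here are hypothetical stand-ins (profiteur's real data structures may differ), and the sketch sums individual (self) costs only, so the totals cannot exceed 100%. Not recursing into a matched node in `expand` is an assumption made to avoid counting nested occurrences twice.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical types for illustration only.
type QualifiedName = String

data Stats = Stats { ticks :: Int, bytes :: Int }
  deriving (Eq, Show)

-- A node carries its name, its own (non-inherited) cost, and its children.
data ProfTree = Node QualifiedName Stats [ProfTree]

addStats :: Stats -> Stats -> Stats
addStats (Stats t1 b1) (Stats t2 b2) = Stats (t1 + t2) (b1 + b2)

-- Sum the individual cost of every node per qualified name.
flatten :: ProfTree -> [(QualifiedName, Stats)]
flatten = Map.toList . go
  where
    go (Node name stats children) =
      Map.unionsWith addStats (Map.singleton name stats : map go children)

-- Expand a name: find every outermost subtree rooted at that name,
-- flatten the children of those roots, and sum the results.
expand :: QualifiedName -> ProfTree -> [(QualifiedName, Stats)]
expand target =
  Map.toList . Map.fromListWith addStats
    . concatMap flatten . concatMap childrenOf . matches
  where
    childrenOf (Node _ _ cs) = cs
    matches n@(Node name _ cs)
      | name == target = [n]                  -- stop here: nested matches are already covered
      | otherwise      = concatMap matches cs
```

With this shape, the first level of the alternative view is `flatten tree`, and expanding `Foo.bar` shows `expand "Foo.bar" tree`.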

Does that visualization make sense? Or do you have other ideas on how expanding would work?

It could get a bit tricky when Foo.bar appears as a subtree of Foo.bar though. In that case, you'd get sums which exceed 100%, so we'd need to make sure to take that into account.
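One possible way to handle that, sketched under assumptions: when summing inherited cost per name, count a node only if none of its ancestors has the same name. The `ProfTree` type and the function names below are hypothetical, and `inherited` is recomputed per node for brevity rather than cached.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

type QualifiedName = String

-- Hypothetical node: name, self ticks, children.
data ProfTree = Node QualifiedName Int [ProfTree]

-- Inherited cost: own ticks plus everything below.
inherited :: ProfTree -> Int
inherited (Node _ self cs) = self + sum (map inherited cs)

-- Inherited ticks per name, skipping nodes nested under a node of the
-- same name so that recursion is not counted twice.
inheritedByName :: ProfTree -> Map.Map QualifiedName Int
inheritedByName = go Set.empty
  where
    go seen n@(Node name _ cs) =
      let own
            | name `Set.member` seen = Map.empty
            | otherwise              = Map.singleton name (inherited n)
          seen' = Set.insert name seen
      in Map.unionsWith (+) (own : map (go seen') cs)
```

With this guard, no name can be assigned more than the inherited total of the root.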

ndmitchell commented 7 years ago

I think the first level of your tree would be very useful. Once you get past that it would be helpful, but not nearly as helpful, so I wouldn't worry so much - I don't think there's an obvious answer.

Another thing useful would be summing per module name rather than qualified name.

jaspervdj commented 7 years ago

> Another thing useful would be summing per module name rather than qualified name.

In that case, it might make sense to just have two fixed "levels": one with all modules, and, below that level, the qualified names in that module.
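A rough sketch of those two fixed levels, assuming the profile has already been reduced to flat `(module, function, ticks)` entries. That input shape and the name `groupByModule` are hypothetical.

```haskell
import qualified Data.Map.Strict as Map

type ModuleName   = String
type FunctionName = String

-- Level 1: one entry per module with its total ticks.
-- Level 2: the functions in that module with their summed ticks.
groupByModule
  :: [(ModuleName, FunctionName, Int)]
  -> Map.Map ModuleName (Int, Map.Map FunctionName Int)
groupByModule entries =
  Map.map withTotal
    (Map.fromListWith (Map.unionWith (+))
      [ (m, Map.singleton f t) | (m, f, t) <- entries ])
  where
    withTotal fs = (sum (Map.elems fs), fs)
```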

ndmitchell commented 7 years ago

Yep, that could be useful.

ndmitchell commented 7 years ago

Note that sometimes I want by function, sometimes by module, and if I want by module, expanding to show the contained functions is useful. But I separately still want by-function at the top level, without any module grouping.

ndmitchell commented 7 years ago

Thinking further, what I secretly might want is a mode where any cost centre with more than 2 distinct immediate parents gets dragged out to the top level and has its cost deleted from where it was originally. In that way if I have two functions, both of which spend 90% of their time parsing CSV, it becomes immediately obvious. It's also less of a departure from what you have now.

I think it might supersede everything else in this thread apart from module grouping - and that could be done by introducing a new top level module grouping and reparenting.
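A minimal sketch of that reparenting idea, under several assumptions: the `CostCentre` type and function names are hypothetical, the parent-count threshold is left as a parameter, and lifted subtrees are reported as summed inherited ticks per name rather than merged into a new tree.

```haskell
import Data.List (partition)
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

type Name = String

-- Hypothetical node: name, self ticks, children.
data CostCentre = CostCentre Name Int [CostCentre]
  deriving (Eq, Show)

inherited :: CostCentre -> Int
inherited (CostCentre _ self cs) = self + sum (map inherited cs)

-- Distinct immediate parents of every cost centre name.
parents :: CostCentre -> Map.Map Name (Set.Set Name)
parents (CostCentre p _ cs) =
  Map.unionsWith Set.union
    ( Map.fromListWith Set.union [ (c, Set.singleton p) | CostCentre c _ _ <- cs ]
    : map parents cs )

-- Remove every subtree whose root name has at least minParents distinct
-- parents. Returns the pruned tree and the inherited ticks lifted per name.
reparent :: Int -> CostCentre -> (CostCentre, Map.Map Name Int)
reparent minParents root = go root
  where
    lifted = Map.keysSet (Map.filter ((>= minParents) . Set.size) (parents root))
    isLifted (CostCentre n _ _) = n `Set.member` lifted
    go (CostCentre name self cs) =
      let (hot, kept) = partition isLifted cs
          results     = map go kept
          out         = Map.fromListWith (+)
                          [ (n, inherited c) | c@(CostCentre n _ _) <- hot ]
      in ( CostCentre name self (map fst results)
         , Map.unionsWith (+) (out : map snd results) )
```

For two functions that each spend 5 ticks themselves and 45 ticks in a shared parser, `reparent 2` leaves each function with 5 ticks and reports 90 ticks for the parser at the top level.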

maoe commented 7 years ago

Sorry for intercepting the discussion but I'm wondering how module level breakdown is useful because I'm writing a text based profiling viewer.

@ndmitchell Could you elaborate on your profiling workflow a bit?

ndmitchell commented 7 years ago

For context, I imagine 3 profiling tasks:

I have a program which does several things (e.g. a and b) - where does the time go - what's the relative ratio between a and b. Once I see something suspicious I drill down inside a to investigate. Very well served by profiteur.
I have a program which does several things (e.g. a and b), but both involve parsing XML documents, so while there is a 50:50 split between the two tasks, 90% of both sides is XML parsing. Here I'd really like to "merge" those two XML parsing calls and drill down into that as a whole. My suggestion of lifting any cost centre with 2 parents to the top level would then show a being 5%, b being 5% and XML parse being 90%. With that rearrangement I can effectively investigate the real problem. Particularly important if there are N tasks, and XML processing is a small part of each, but a big part of the whole - investigating one at a time might hide the biggest cost in my program. This task is hard in profiteur, and is addressed by the top summary in GHC profiles - but quite badly. A table was my first suggestion, but my revised suggestion is lifting things with multiple parents.
I have a program which does XML extraction, using functions to get the child nodes, get the attributes, extract the tags etc. None of these functions individually may be huge, but together "XML Processing" is big. I was thinking module names might be a possible proxy for the grouping. I think if I had the second bullet this desire might disappear - or might remain.

As I think more, the reparenting would work really nicely with the existing profiteur approach, and could be added as an alternative in the "View by time" and "View by alloc" drop-down.

maoe commented 7 years ago

@ndmitchell Thank you for the explanation.

Sorry for the shameless plug but I've added experimental support for module level breakdown in my viewprof, which is a text-based interactive .prof viewer. It is still WIP but you might find it interesting to try. You can clone the repo and stack install.

It has three view modes:

Aggregate cost centers view: The default view. It's like the top summary in GHC profiles. The view groups cost centers by cost center name and module name.
Call sites view: If you press enter on the aggregate cost center view, it displays call sites of the cost center you selected. I believe this solves the second task you described above.
Modules view: If you press M, viewprof displays the module level breakdown. This is for the third task you described. I'm still not sure if this is actually useful in practice though.

The key bindings are below:

q to quit the current view
j/k or up/down arrow to navigate focus
C to display aggregate cost center view
M to switch to module breakdown
gg to move to the top, G to move to the bottom
Enter to switch to call site view
t to sort by time
a to sort by allocation
e to sort by # of entries

Please let me know what you think.

Note that all the parsing and analysis are done by ghc-prof so profiteur can use it too.

ndmitchell commented 7 years ago

@maoe - I would strongly suggest releasing viewprof to Hackage - it makes it drastically easier to install, which increases the chance of people playing with it.

You may wish to raise a separate issue on this issue tracker about them switching to ghc-prof. Not sure if they'd want to or not, but certainly having a parser for profile files is valuable, and sharing it as widely as possible might be beneficial.

maoe commented 7 years ago

@ndmitchell Thanks for the feedback. Will do.

ndmitchell commented 7 years ago

For an example of a .prof file that doesn't display well in Profiteur, but hopefully will under the proposed view, see https://gist.github.com/ndmitchell/607aa8c0d80c86817cb1b86b4164236c.

Here Uniplate had a performance regression, so everywhere Uniplate now takes 10x as long as before. However, there are lots of places that call uniplate, so it gets smeared across everything.

jaspervdj / profiteur

Add alternative view #12