lttng / lttng-scope

A trace viewer and analyzer for LTTng kernel and user space traces
https://lttng.org/beta/#lttng-scope
Eclipse Public License 1.0

Event filter limit on number of items #70

Open compudj opened 6 years ago

compudj commented 6 years ago

With a 20s kernel trace (~200M) containing all kernel events, if I zoom out and apply an event filter icon for "sched_switch", the following happens:

(screenshot: scope1)

ghost commented 6 years ago

I will detail what is happening here, so that we can define and prioritize the next steps.

1) processing takes a while,

Yes it does! That's probably the biggest problem. This is because the timegraphs and "drawn event" series are all defined generically. For every event that matches the filter, the model asks every single entry in the timegraph "is this event related to you?". If it is, the icon is placed on that line. The more entries there are in the timegraph, the longer it takes to place a single event. That is why the Threads view is usually much slower than the CPU view at showing those events.

A better approach would be to define things like Threads (and also CPU, etc.) at the timegraph model level. That way assigning an event to a row becomes an O(1) operation, instead of having to check all timegraph entries one by one. This would require some rework of how the timegraph models are defined.
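To make the difference concrete, here is a minimal Java sketch of the two lookup strategies for a Threads-style timegraph. It is only an illustration: `ThreadRowIndex`, `findRowLinear` and `findRow` are hypothetical names, not part of the lttng-scope code base, and rows are assumed to be keyed by thread ID.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; none of these names exist in lttng-scope. */
final class ThreadRowIndex {

    /** Thread ID -> row number, built once when the rows are created. */
    private final Map<Integer, Integer> rowByTid = new HashMap<>();

    ThreadRowIndex(List<Integer> tidsInRowOrder) {
        for (int row = 0; row < tidsInRowOrder.size(); row++) {
            rowByTid.put(tidsInRowOrder.get(row), row);
        }
    }

    /** Generic approach: ask every row in turn, O(number of rows) per event. */
    static int findRowLinear(List<Integer> tidsInRowOrder, int eventTid) {
        for (int row = 0; row < tidsInRowOrder.size(); row++) {
            if (tidsInRowOrder.get(row) == eventTid) {
                return row;
            }
        }
        return -1; // no row is related to this event
    }

    /** Model-aware approach: a single hash lookup, O(1) on average per event. */
    int findRow(int eventTid) {
        return rowByTid.getOrDefault(eventTid, -1);
    }

    public static void main(String[] args) {
        List<Integer> tids = List.of(100, 200, 300);
        ThreadRowIndex index = new ThreadRowIndex(tids);
        System.out.println(findRowLinear(tids, 300) + " " + index.findRow(300));
    }
}
```

Both methods return the same row; the difference is that the cost of the second one does not grow with the number of entries in the timegraph.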

2) there is no way to cancel

Right now there are still many asynchronous jobs started and handled at the library level (this dates back to the pre-Scope days...). This is not very good practice: the library should offer functionality, and it should be the application that decides whether to call those functions in separate threads. The library should be reworked to move those threads to the application level and tie them into the ScopeTaskManager. Doing this will display the running threads in the status bar and in the Task window, and offer the user a way to cancel them (although right now cancelling is a bit broken, see #52).
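As a rough illustration of that split, the sketch below uses only plain `java.util.concurrent` classes. It does not use the real ScopeTaskManager API; `CancellableFilterDemo`, `countMatches` and `runAndCancel` are hypothetical names. The "library" function is synchronous and merely cooperates with interruption, while the "application" decides to run it on a thread and keeps the handle needed to cancel it.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch; not the actual lttng-scope task handling. */
final class CancellableFilterDemo {

    /** "Library" side: a plain synchronous function that stops when interrupted. */
    static int countMatches(long totalEvents) {
        int matches = 0;
        for (long i = 0; i < totalEvents; i++) {
            if (Thread.interrupted()) {
                throw new CancellationException("filter search cancelled");
            }
            if (i % 1000 == 0) {
                matches++; // stand-in for "this event matches the filter"
            }
        }
        return matches;
    }

    /** "Application" side: owns the thread and the cancel handle. */
    static boolean runAndCancel() {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Callable<Integer> job = () -> countMatches(Long.MAX_VALUE);
            Future<Integer> task = executor.submit(job);
            try {
                Thread.sleep(50); // pretend the user clicks "cancel" after a moment
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            task.cancel(true); // interrupts the worker thread
            return task.isCancelled();
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("cancelled: " + runAndCancel());
    }
}
```

With this shape, a task manager in the application could list each `Future` in the status bar and wire a cancel button to `cancel(true)`, without the library knowing anything about threads.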

3) I would expect a modal window to appear after a while telling me "It looks like the information you requested is taking a while to compute, do you wish to cancel?"

We see this in web browsers because that is the only way users can cancel ongoing operations. If we display a progress bar (eventually with a completion % and a time estimate) and let the user cancel operations through it, do we really need a more invasive modal window?

Also, if we fix 1) perhaps these operations will become so fast that the users will never need/want to cancel them.

4) There appears to be an arbitrary limit to the number of icons

There is, the limit is 2000 events. The reason is twofold:

If we fix/improve 1), then the first point becomes less of a problem. However, even with that, the second point will remain a problem: if we display too many nodes in the scenegraph, the whole application becomes sluggish and less responsive.

Right now one "event match" corresponds directly to one "icon in the view". This could be reworked to use "buckets" instead, so that if more than one event match is found for the same pixel (or the same region of a few pixels?) we display only one icon in the scenegraph. The good news is that this can all be done at the model layer, so doing so will benefit all the views automatically.
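A minimal sketch of the bucketing idea, done purely on timestamps so it could live at the model layer. All names (`IconBuckets`, `bucketize`, the parameters) are hypothetical, and it assumes the visible time range and the view width in pixels are known when the query is made.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch; not the actual lttng-scope model code. */
final class IconBuckets {

    /**
     * Groups event matches by horizontal pixel bucket.
     * Returns bucket index -> number of matches; the view would then draw
     * one icon per map entry instead of one icon per match.
     * Assumes (ts - rangeStart) * widthPx fits in a long, which holds for
     * nanosecond timestamps over ranges of up to several days.
     */
    static Map<Integer, Integer> bucketize(long[] timestamps, long rangeStart,
            long rangeEnd, int widthPx, int bucketPx) {
        Map<Integer, Integer> countByBucket = new TreeMap<>();
        long duration = rangeEnd - rangeStart;
        if (duration <= 0 || widthPx <= 0 || bucketPx <= 0) {
            return countByBucket;
        }
        for (long ts : timestamps) {
            if (ts < rangeStart || ts >= rangeEnd) {
                continue; // outside the visible range
            }
            int pixel = (int) ((ts - rangeStart) * widthPx / duration);
            countByBucket.merge(pixel / bucketPx, 1, Integer::sum);
        }
        return countByBucket;
    }

    public static void main(String[] args) {
        long[] matches = {0, 1, 2, 500, 999};
        System.out.println(bucketize(matches, 0, 1000, 100, 1));
    }
}
```

The number of icons is then bounded by `widthPx / bucketPx` regardless of how many events match, which would also make a fixed cap on matches less necessary. The per-bucket count could be shown in a tooltip or used to pick a different icon for dense regions.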