mysticfall / pivot4j

Pivot4J provides a common API for OLAP servers which can be used to build an analytical service frontend with pivot style GUI.
Other
128 stars 99 forks source link

Implement visual total (based on render strategy API) #44

Closed mysticfall closed 11 years ago

mysticfall commented 11 years ago

Implement visual total feature(based on render strategy API).

Should be able to configure sub/grand total/avg/min/max/etc per each axis and levels.

mysticfall commented 11 years ago

Testing MDX :

SELECT Union(CrossJoin({[Gender].[F]}, {[Measures].[Store Sales], [Measures].[Store Cost], [Measures].[Unit Sales]}), CrossJoin({[Gender].[M]}, {[Measures].[Store Sales], [Measures].[Store Cost], [Measures].[Unit Sales]})) ON COLUMNS, CrossJoin({[Marital Status].[M], [Marital Status].[S]}, {[Product].[Drink], [Product].[Drink].[Alcoholic Beverages], [Product].[Drink].[Beverages], [Product].[Drink].[Dairy], [Product].[Food], [Product].[Non-Consumable]}) ON ROWS FROM [Sales]

mysticfall commented 11 years ago

http://aboutmydata.blogspot.co.at/2012/11/saiku-totals.html

francescogianferraripini commented 11 years ago

This is a much awaited feature. The point here is that a total must be provided on what is actually on display in the table, not the total of the aggregate level or the all member. For example, in the mdx above, total of drink shoud be ONLY the sum of alcoholic beverages, beverages, and dairy. Other children of drink shoud not add up to the total.

mysticfall commented 11 years ago

Thanks for the hint. As the name impliies, I can only use what is visually displayed on the table at that time. It's the current status of the visual total feature, and total for the Drink member is calculated by its visible child members :

http://imagebin.org/247721

Calculating part is almost done. I'm afraid it made already crazy TableHeaderNode implementation rather crazier :) so it might need to be refactored to sanity soon.

For now, I'm wondering what would be the best way to let users to config totals for each axes. I suppose we need some kind of factory to create instances of appropriate Aggregator interface ondemand.

francescogianferraripini commented 11 years ago

There are 3 approaches to the issue. The first one is yours, delegating the total calculation to the front-end.

The second one is like what is done by Pentaho Analyzer, using custom sets : For each total a set of the selected members is created and the total is displayed

The third one is similar, and should be the right way, delegating the calculation to the VisualTotals function in Mondrian: with member Product.Aggregated as "Aggregate({[Product].[Food].[Baked Goods].[Bread],
[Product].[Food].[Baked Goods].[Bread].[Bagels],
[Product].[Food].[Baked Goods].[Bread].[Muffins]}) select {[Measures].[Profit]} on columns, {VisualTotals( {[Product].[Food].[Baked Goods].[Bread],
[Product].[Food].[Baked Goods].[Bread].[Bagels],
[Product].[Food].[Baked Goods].[Bread].[Muffins]},
), Product.Aggregated } on rows from [Sales]"

This approach cannot be followed due to this bug: http://jira.pentaho.com/browse/MONDRIAN-1294

But if you watch here it seems working... but maybe there are no calculated memebers http://cl.ly/image/2k3i1Q1d2O1b

mysticfall commented 11 years ago

I think you have summarized available options to implement the feature with perfect clarity. I had considered all of the options, although briefly and rather in confused manner before I decided with the first approach.

The reason behind it was that for the second option, I thought it would be rather hard to implement it in reliable manner, since Pivot4J, like JPivot supports two-way synchronization between pivot model and MDX parse tree, so while it has advantage over the other selection based approach (like Olap4J/Saiku) in allowing going back and forth between user MDX and Pivot UI, the synchronization code will be hard to maintain when the complexity of the MDX structure increases.

Maybe I had abandoned this approach prematurely. But that was my initial thought anyway.

On the other hand, I think the third option is not viable for Pivot4J as it relies wholly on Olap4J API, not Mondrian API directly so it also has to take an approach which would also support XMLA based backend.

In a similar situation, I decided to drop Parameter and ParamRef() pseudo functions in favor of expression language support, as the former would be only useable on a Mondrian server.

To sum it up, I still think currently our best option would be the second approach especially when we already have a tree/callback based API in RenderStrategy/PivotRenderer interfaces which would make task like aggregation easier to implement.

The main problem currently I have with that approach is I think, chaotic and maybe rather inefficient implementation of TableHeaderNode class, which I believe coulbe be refactored later.

mysticfall commented 11 years ago

Having a little problem on implementing this feature : https://groups.google.com/forum/#!topic/pivot4j-list/dI3VChY_sxo/discussion

Any ideas would be welcomed!

francescogianferraripini commented 11 years ago

The point here is tricky. In my opinion: The total at each level should be a total of the leaves in the table below that level.

What you see should be what you get.

So IMHO: There should not be any total, but drinks should show the total of the two members below OR Drinks can display the total of ALL drinks. There should be a total of the selected members.

Totals at higher level should add: Members which do not have unselected children members plus Subtotals at lower levels

mysticfall commented 11 years ago

So, I assume you think it better leave totals per hierarchy instead of per level approach, but we should aggregate only the leaves, not parents of each member paths. Do I understand you correctly?

Thanks again for your opinion. It's invaluable really!

francescogianferraripini commented 11 years ago

So, I assume you think it better leave totals per hierarchy instead of per level approach, nope, you should leave at each level. It is ok to see the real total of all drinks (107 unit sales for bulk mail) correctly where it is now. What is important is to see also a total of what has been selected as a leaf at each level (i would like to see a total row below dairy with the total of Alcoholic and Dairy (53 unit sales for bulk mail) but we should aggregate only the leaves, not parents of each member paths. Do I understand you correctly? Right!

But how to implement it, beside recalculating basic aggregation on the frontend? IMHO the best option would be to create a SET (using "WITH SET" clause if i remember it correctly) for each level used in rows and columns (not in where) for which an explicit member selection has been made. Then you should issue queries to the server to get the values in the current context passing in the where the SET.

It should work.

Ciao, F

mysticfall commented 11 years ago

Then, you mean we also need a separate totals per level in addition to hierarchy totals in such cases. I think in that matter we're approaching a consensus as Daniele said basically the same thing on the message board :

https://groups.google.com/d/topic/pivot4j-list/dI3VChY_sxo/discussion

Though as to whether we should aggregate leaves or parents, there we have yet to reach a conclusion. Maybe we should continue this discussion on the message board with Daniele? Or I'll invite him to #pivot4j IRC channel so we could all talk about it.

As to how to implement it, I believe we're still in a good position to do it on the client side even with the level totals as it's not really different from hierarchy totals.