timelyportfolio / buildingwidgets

repo for a htmlwidget a week documented at https://buildingwidgets.com
https://timelyportfolio.github.io/buildingwidgets/
MIT License
63 stars 12 forks source link

PivotTable.js #18

Closed vnijs closed 9 years ago

vnijs commented 9 years ago

https://github.com/nicolaskruchten/pivottable

There is a binding here: https://github.com/smartinsightsfromdata/rpivotTable

I am actually most interested in the drag and drop functionality for the variables. The plotting, tables, etc, could be handled in R (e.g., dplyr).

vnijs commented 9 years ago

Oh. And keep up the amazing work! Thanks!

timelyportfolio commented 9 years ago

Thanks, I'm also watching @jcizel who has forked @smartinsightsfromdata rPivotTable. I'll get with both to see how we can get to a fairly complete implementation of it.

vnijs commented 9 years ago

That would be fantastic. An (small) version that only does variable dragging and dropping would be great as well :)

smartinsightsfromdata commented 9 years ago

@mostly-harmless the version up and running here is already the "variable dragging and dropping" one you are asking for: all the features can be driven by R or set by the user.

smartinsightsfromdata commented 9 years ago

@timelyportfolio I will look into add the treemap: I played with it but I didn't want to complicate things.

If you look into all the current features of the standard renderer they are a-plenty already.

vnijs commented 9 years ago

@smartinsightsfromdata could I, for example, use dplyr to do the summarizing and grouping? Use ggvis for plotting? DataTables for the tables?

smartinsightsfromdata commented 9 years ago

@mostly-harmless Not sure what you are after: you can use any R packages to massage data appropriately ultimately in a data.frame or a data.table. You "drive" the features of rpivottable using, for example, a function call as such: rpivotTable(data = dt, rows = "Party", cols = "Province", vals = "votes", aggregatorName = "Sum") If you provide a data.table (here called dt), all the column names will appear in the "draggable" area of the pivottable. You can specify if you want the table to render with already some column name pre-dragged into rows or cols (as a sort of mini-report if you like). As in the above example, if you have a numeric variable called "votes", you can see the votes with Party (e.g. Liberals; NPD etc.) in rows and Province (e.g. Quebec etc.) in columns (of course you still will have all the remaining columns in the data.table available to be dragged either in rows or cols). The pivottable library does not do any plotting. Nicholas (the author) has developed some "renderers" to show line graphs etc., but he has used GoogleChart for this (and GoogleChart has the notorious t&c limitations that limit a general adoption). I am not terribly in favour to move this library in these directions, as you have all of the R visualisations libraries (especially the ones developed as part of the htmlwidgets family) already at your disposal: you can have one and the others side-by-side.

timelyportfolio commented 9 years ago

Yes, I agree with @smartinsightsfromdata, if you (@mostly-harmless) are just looking for pretty tables as output, I would suggest the DT htmlwidget from RStudio https://github.com/rstudio/dt with shiny. See http://rstudio.github.io/DT#a-shiny-example-experimental for an example.

vnijs commented 9 years ago

My question is more about efficiency. How efficient is the js library at calculating frequencies, averages, etc. for large datasets compared to dplyr for example? @smartinsightsfromdata from your reply it seems R passes the data and variables to pivottable and lets pivottable do the calculations. Perhaps it is really efficient and if so that is great. Do you have any insights on that?

smartinsightsfromdata commented 9 years ago

@mostly-harmless I admit I haven't tested rpivotTable with thousands of rows / columns etc. (but it makes sense to do it and I will). On the other hand I suspect it is a bit outside the ranges of legitimate usage. Bear in mind that if you have say thousands of rows, just dragging a dimension on either rows or cols will displays all those rows on your browser. If I have to do summarisation of thousands of rows and columns nothing would beat data.table or dplyr (people like to take their pick). rpivottable is for the "last mile" in visualisation: when you have defined an aggregate of data you want to analyse more deeply, probably less than 1,000 rows and possibly less than 10 columns (which may include some frequencies etc.). My main concerns would be in defining a subset that is meaningful for the user: too many columns would probably not be that effective as a communication / analysis tool. As an aside, you could probably say that a pivot table is an old fashioned tool, as it really works mainly on an aggregate with appropriate business-focussed dimensions. Now we are used to see packages that can help us to detect some "inner meaning" in the data displaying schema-less data as with a scatterplot...

timelyportfolio commented 9 years ago

Passing data back and forth will be a big constraint also.

vnijs commented 9 years ago

@smartinsightsfromdata I wasn't referring to the number of rows and columns in the table but the number of rows and columns in the data. As @timelyportfolio mentions, getting the data to the client-side can be an issue. Once it is there you still have to do the summarization. How efficient is the js code to do that? The final table could be 2x2 but could be based on millions of rows of data.

smartinsightsfromdata commented 9 years ago

@mostly-harmless the pivottable is very efficient. From Nicolas Q&A

Question: How big can the input data be?

Current answer: the best answer so far is "try it and see", as the performance of the system will depend on the machine/browser being used, the exact nature of the aggregators etc. It's proven difficult to give rules of thumb for estimating what will work well beyond 100,000 records, but certainly up to that point results seem satisfactory for most applications

So we have a (reasonable) upper limit. You could probably compare with the performance of the classic table used in shiny "DataTable": I am not sure if I would use DataTables for rendering >100,000 records, even if it handles pagination etc..

timelyportfolio commented 9 years ago

I agree. However, millions of rows not going to happen here.

timelyportfolio commented 9 years ago

I will do some tests and report back.

vnijs commented 9 years ago

@smartinsightsfromdata, @timelyportfolio, @jcizel

In the DT package all processing of data can be done on the server-side, e.g., with dplyr, so there is no real reason it couldn't handle millions of rows of 'raw' data. A cross-table with more than a few rows-columns is not all that useful anyway so having the 'work' done on the server-side might be the way to go.

For me the most interesting part of pivottable.js is the drag and drop interface for variable selection and the button pop-ups to select and filter levels, attributes, etc. Is it feasible to push the unique levels for a variable into the pivottable buttons? I guess my question is if the variable drag-and-drop and pop-up filter functionality in pivottable.js could be used as an interface for DT, ggvis, etc.?

smartinsightsfromdata commented 9 years ago

@mostly-harmless these "widgets" or UI elements are intrinsically part of pivottable (moreover originally developed in coffee script etc.). You may ask rstudio if they have similar drag&drop widgets in the shiny roadmap...

timelyportfolio commented 9 years ago

I agree this functionality would be awesome and really cool. However a dplyr GUI will unfortunately be way beyond the scope of this humble little widget. To get that, we'll need to start from the ground up.

Unrelated, but for my own education, what field are you working in where you'll be analyzing this massive data?

Hope you'll still try out the improving-as-we-speak rpivotTable widget and help us test it and iterate it to something useful.

vnijs commented 9 years ago

I actually think I have a chunk of the code needed for that in bits-and-pieces already. The thing I definitely don’t have is the js components or the js skills to make/bind them. I could get started with a set of selectize inputs (i.e., column, row, and level/range selection) and tie-in DT. Then perhaps a nicer drag-and-drop UI could be added afterwards. Sound interesting?

timelyportfolio commented 9 years ago

As an example with "bigger-ish" data, you can try this

data(diamonds,package="ggplot2")
rpivotTable(diamonds)

Yes, very interested in what you are talking about.

vnijs commented 9 years ago

That actually works very smoothly. I will try-out rpivottable in a shiny app.

I will try to put something dplyr based together next week.

— Sent from Mailbox

On Fri, Feb 6, 2015 at 11:46 AM, timelyportfolio notifications@github.com wrote:

As an example with "bigger-ish" data, you can try this

data(diamonds,package="ggplot2")
rpivotTable(diamonds)

Yes, very interested in what you are talking about.

Reply to this email directly or view it on GitHub: https://github.com/timelyportfolio/buildingwidgets/issues/18#issuecomment-73300253

vnijs commented 9 years ago

@timelyportfolio Took me a bit longer than expected but I finally put together a dplyr based pivot table using DT. See link below (pivot tab). Code here. Comments and suggestions welcome.

http://vnijs.rady.ucsd.edu:3838/marketing/?SSUID=a848c322dd

timelyportfolio commented 9 years ago

now that is slick! Thanks for sharing.