ropensci-archive / rivis

:no_entry: ARCHIVED :no_entry:

Some principles for future vis efforts. #2

MilesMcBain opened this issue 7 years ago

MilesMcBain commented 7 years ago

Okay, I'll lead off by summarising some of the things that have been said on Twitter, in our email thread, and that have been bobbing around in my own brain:

Get the basics right

From the conversation we've had so far, it sounds like there is consensus about not straying too far from the ggplot2 grammar of graphics interface for traditional 2D statistical plotting. As @hrbrmstr said, a lot of folks already struggle making useful visualisations for their analysis and sometimes interactivity is an unwelcome distraction. From this base we would like to build out well known interactivity mechanisms like zoom, brush, hover, and selection.
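For context, here is a minimal sketch of how existing tools already layer these mechanisms onto the ggplot2 grammar (assuming the plotly package); the point is that the plot specification itself does not change:

```r
library(ggplot2)
library(plotly)

# An ordinary ggplot2 specification; nothing interactive about it yet.
p <- ggplot(mtcars, aes(wt, mpg, colour = factor(cyl))) +
  geom_point()

# ggplotly() translates the same spec into an htmlwidget with zoom,
# hover tooltips, and box/lasso selection built in.
ggplotly(p)
```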

Be Reactive?

Looking at this from the development UX perspective, I am really excited by the ideas Mike Bostock put forward in his D3.express reveal, and how we might incorporate reactive bindings to enable users to more rapidly learn the grammar of graphics. This is one way we could make building the appropriate vis easier.
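As a rough analogue of what reactive bindings feel like today (a minimal sketch, assuming shiny and ggplot2), the plot below re-renders whenever the bound input changes, which is the tight edit-and-see loop the D3.express demo emphasised:

```r
library(shiny)
library(ggplot2)

ui <- fluidPage(
  sliderInput("bins", "Number of bins", min = 5, max = 50, value = 20),
  plotOutput("hist")
)

server <- function(input, output, session) {
  # renderPlot() establishes a reactive binding: moving the slider
  # invalidates and redraws the histogram automatically.
  output$hist <- renderPlot({
    ggplot(mtcars, aes(mpg)) +
      geom_histogram(bins = input$bins)
  })
}

shinyApp(ui, server)
```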

Be Modular, Adaptable, Extensible

It was mentioned that previous efforts have fallen down by being closely coupled to a graphics back-end that was not portable. Web technologies have been implicit in the conversation as the solution to this. There are multiple viable back-ends, and it will be worth investigating whether a flexible, decoupled architecture can give us the best of a few worlds (first suggested by @thomasp85). WebGL has been mentioned as worthy of interest. I firmly agree with @thomasp85 that an extension mechanism needs to be baked in from the start.
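To make the decoupling idea concrete, here is a purely hypothetical sketch (none of these functions exist) of a scene specification that knows nothing about rendering, with back-ends plugged in via ordinary S3 dispatch:

```r
# Hypothetical: an abstract scene object, independent of any renderer.
new_scene <- function(data, mapping, layers) {
  structure(
    list(data = data, mapping = mapping, layers = layers),
    class = "vis_scene"
  )
}

# Hypothetical generic: dispatch on the back-end, not on the scene.
render <- function(scene, backend, ...) UseMethod("render", backend)

render.webgl_backend <- function(scene, backend, ...) {
  # translate the abstract scene into WebGL buffers/shaders here
  invisible(scene)
}

render.grid_backend <- function(scene, backend, ...) {
  # translate the same scene into grid grobs here
  invisible(scene)
}
```

An extension mechanism then amounts to letting other packages register render methods for their own back-end classes.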

Handle Streams Natively

My personal view, which relates to the source of the Tweet-storm, is that whatever comes next needs to handle visualisations of streaming data. This is useful for raw data vis and extremely important for diagnostics and predictions of sequential models. This requirement means the rendering framework needs to be FAST (hello, WebGL) and have some inherent concept of plots that are update-able in a more fundamental way than layering.
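Purely as a sketch of what "update-able" might mean (every function here is hypothetical; `aes()` is borrowed from ggplot2 for familiarity), the idea is a plot handle whose data can be appended to without rebuilding the whole scene:

```r
# Hypothetical API: keep a rolling window of the most recent points.
p <- stream_plot(aes(time, value), window = 500)

repeat {
  new_rows <- read_next_batch()   # hypothetical streaming data source
  p <- push_data(p, new_rows)     # re-render only the changed geometry
  Sys.sleep(0.1)
}
```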

Lead the Way to What's Next

Can we create an interface that allows a reasonably seamless transition between 2D, 3D, AR, or VR data visualisation? There's a lot to bite off here, but extensions into this space would be a huge draw card for R.

mdsumner commented 7 years ago

Thanks Miles, I'm a bit out of my depth in the visualization tech, but I believe that there are two main and potentially opposing goals here. One is meaningful interactive graphics, and the other is a very generic, flexible, and efficient language of data representation. I think we must find a way to marry these two topics, and I do think a solution lies in this integration.

Every geometric feature in ggplot2 and in sp/sf is representable by path-based structures: planar, turtle-head-down, continuous traces between coordinates. They are generally treated as discrete closed rings or open connected lines, and any space-filling is provided by the planar "polypath" convention for winding or the even-odd rule. Ggplot2 does have a convention for a sequence of line segments, treating the attribute on the first coordinate as "the value on the segment", but it doesn't otherwise have "true primitives" - in the WebGL or simplicial complex sense.
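A tiny illustration of the path-based convention (ggplot2 only): the ring below is just an ordered trace of coordinates, and the group aesthetic is the single level of grouping that ties the geometry to a feature-level property:

```r
library(ggplot2)

# One closed ring, stored as an ordered sequence of coordinates.
ring <- data.frame(
  x  = c(0, 1, 1, 0),
  y  = c(0, 0, 1, 1),
  id = "feature_1"   # a discrete property attached to the whole ring
)

# geom_polygon() closes the path and fills it; there are no explicit
# primitives, only the ordered trace and its grouping.
ggplot(ring, aes(x, y, group = id, fill = id)) +
  geom_polygon()
```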

True primitives (triangles, line segments, tetrahedrons) allow models to incorporate both continuously varying and discrete properties in one structure, but this power tends to be watered down or removed in traditional implementations. For example, GIS gains great efficiency by being bound to planar path forms, and it bakes in a single-level grouping to separate the geometry from the properties of the features. When GIS breaks out of this constraint it does so on the fly, or in ways that are specific to implementations (there's no standard for breaking out and returning).

Rgl, in comparison, is (mostly) a simplicial complex form, with a (transposed) matrix of homogeneous coordinates indexed by a (transposed) matrix of the primitives. It's confusing though, because the two matrices are only seen explicitly in the triangulation and quadrilateral forms (quads are not "proper" primitives, but they are used by GL as a convenience - IIUC). The line segment "meshes" in rgl are much simpler, and tend to be defined implicitly without any formal structures. These forms in rgl can have aesthetics applied; generally the discrete properties are those that are constant among a primitive's vertices, while continuous ones can vary - the continuous mode includes the geometric representation of the primitive of course, as well as colours, shading properties, width and so on.
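A minimal sketch of that indexed form, assuming the rgl package: a matrix of homogeneous vertex coordinates (one column per vertex) plus a matrix of indices (one column per triangle):

```r
library(rgl)

# Four vertices of a tetrahedron, as homogeneous (x, y, z, w) columns.
vertices <- matrix(c(
  0, 0, 0, 1,
  1, 0, 0, 1,
  0, 1, 0, 1,
  0, 0, 1, 1
), nrow = 4)

# Four triangular faces, each column indexing three vertices.
faces <- matrix(c(
  1, 2, 3,
  1, 2, 4,
  1, 3, 4,
  2, 3, 4
), nrow = 3)

mesh <- tmesh3d(vertices, faces)
shade3d(mesh, col = "steelblue")
```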

The indexed matrices in rgl are well represented in data frame form by relations between tables, and in this form there is a lot more scope for higher-level groupings and structure. When it's done on data frames there's no limit on what properties can be stored; we leave behind the simple convenience of a single data frame, but gain the generality and back-endability of database and memory-management tools. I think there's a lot to be gained on the basic data-structure handling side, and what I've learnt from dplyr and tidygraph in particular, as well as sf, suggests there's a lot of room for improvement at the R level, at least for learning what is possible.
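For instance, the same indexed mesh can be sketched as two related data frames, a vertex table and a primitive table (column names here are just illustrative), at which point dplyr-style verbs and database back-ends become available:

```r
library(dplyr)

vertex <- tibble::tibble(
  vertex_ = 1:4,
  x = c(0, 1, 0, 0),
  y = c(0, 0, 1, 0),
  z = c(0, 0, 0, 1),
  temp = c(10.2, 11.5, 9.8, 12.0)   # a continuously varying property
)

triangle <- tibble::tibble(
  triangle_ = rep(1:4, each = 3),
  vertex_   = c(1, 2, 3,  1, 2, 4,  1, 3, 4,  2, 3, 4),
  feature   = "tetra_1"             # a discrete, per-feature property
)

# Expanding a primitive back out to its coordinates is an ordinary join.
triangle %>% inner_join(vertex, by = "vertex_")
```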

Having more general and low-level hooks into graphics engines like WebGL would go a long way, I think. There are "scenes" that I can construct that are really valuable but are hard for me to share, given the multi-step preparation or merging of different data and the device requirements for rgl. I've found the WebGL write from rgl can only handle smallish content (I don't know why that is), and the default navigation mode is "pivot on centre of data", which is no good for immersive exploration. Rgl is also very good at texture mapping, and the indexing is pretty straightforward to do, but it has to go through a PNG file and so is less flexible than R usually is for that kind of data.

R is already immensely capable of traversing and analysing these rich structures, and topological manipulation, analysis, and physical modelling are a natural fit as well as a rich source of visualization.

MilesMcBain commented 7 years ago

Thanks for the contribution, Michael. I think I finally understand what you've been getting excited about. So if I am reading it correctly, you are imagining a new world where a row in a data frame could be one of a number of primitives. In ggplot2 a row is mostly understood to contain information about a point on a 2D plane, but in this new scenario it could be a triangle, a line segment, a tetrahedron, etc.?

If you pursue this idea to its extreme conclusion, with many kinds of basic primitives, I think you might end up at something like D3? So the key to avoiding that is to do what you did and stick to primitives that are of very direct use to spatial data modelling. I like the sound of this. Triangles for fun with meshes in particular!

As you say, we already have a lot of useful tools for representing and manipulating these types of data frames. :+1:

mdsumner commented 7 years ago

Yes, I'm pretty convinced that a broader family of network-ready and relational data structures is a key complement to future visualization and interactivity. There's no one "right model": there are very general ones that can do everything and are "heavy" (ultimately a database schema is a very heavy, very general model), and there are also very specific ones that are optimal in a certain context, or simple enough to be one data frame. There's a lot of value in the ecosystem in between, and I've been heartened by how useful a multi-data-frame model can be (1. examples with 1D or 2D primitive meshes, and facilities to convert any(?) model to/from that, and 2. with NetCDF metadata and extraction). I run up against technical problems that I'm slowly learning to get around - interactivity and visualization are usually the big-ticket obstacles.

I believe what I'm saying is a little bit tangential to the immediate project, but I think it's important enough to flesh out given where this could go - I think R is the future of this, there's no other project with as much generality and potential. I'll contribute more along these lines with longer illustrations in time, and for now take direction from others for the specific tasks that I can help with.

mdsumner commented 7 years ago

@MilesMcBain I didn't address the D3 question. D3 is primarily arc-node topology - it's an intermediate step from path-based features to 1D primitives - but it's inherently planar-only. The topological magic provided by the arc-node model is that the shared boundaries between features can be dynamically segmentized with reference to the curvature of the space they traverse. So it's not quite primitives, but it leverages the power of them in a specific context. (There's more to D3 than this, but that's the crux of the "Flawed Example". "Arcs" are the shared boundaries between polygons, running between nodes where three or more arcs meet. The "nodes" are those multi-branch points; all other coordinates are "normal vertices".)
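A schematic sketch of that structure as tables (coordinates and names are made up): each shared boundary is stored once as an "arc", and each polygon is just a list of arc references, so neighbouring features never duplicate their common edge:

```r
# One arc per shared or outer boundary; each arc is an ordered run of
# coordinates between two nodes.
arc <- data.frame(
  arc_ = c(1, 1, 2, 2, 2, 3, 3, 3),
  x    = c(0, 1,  1, 2, 0,  1, -1, 0),
  y    = c(0, 1,  1, 0, 0,  1,  1, 0)
)

# Polygons reference arcs rather than repeating coordinates; arc 1 is
# the boundary shared by features A and B.
feature_arc <- data.frame(
  feature_ = c("A", "A", "B", "B"),
  arc_     = c(1, 2, 1, 3)
)
```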

dicook commented 7 years ago

Another principle that I think is important is "Data is central". Interaction and linking can be thought of as aspects of the data. This way we can always tie the operations back to data analysis and statistical thinking.
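A small sketch of that principle in plain ggplot2: the interaction state (here, a brushed selection) lives in the data itself as a column, so it can always be traced back to rows and fed into downstream analysis:

```r
library(ggplot2)

d <- mtcars
# In an interactive tool this flag would be set by a brush or click;
# here it is just a column of the data.
d$selected <- d$wt > 3.5

ggplot(d, aes(wt, mpg, colour = selected)) +
  geom_point()
```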

dicook commented 7 years ago

This is the reading list that I would recommend:

dicook commented 7 years ago

Most interactive graphics designs really fail to cater to the tour algorithm.
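For anyone unfamiliar with it, the tour is available in the tourr package; a minimal sketch (assuming tourr is installed) of the grand tour, which animates smooth rotations through projections of multivariate data and is exactly the kind of thing most click-driven frameworks have no hook for:

```r
library(tourr)

# Grand tour over the six numeric measurements in the flea data
# (flea ships with tourr); each frame is a new 2D projection.
animate_xy(flea[, 1:6])
```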

MilesMcBain commented 7 years ago

Thanks for those, Di. I've added them to READING.md, in addition to @cpsievert's thesis, suggested by @njtierney.

Anyone following this conversation is welcome to make additions.

DataStrategist commented 7 years ago

Hi all. I have a lot of experience as an agile "Product Owner", and could bring the functionality view to this discussion. Would it be beneficial if I drafted a high-level set of features we should be designing for, which we could then break out into epics or Projects or whatever it might be? If so, should I keep comments here, or should I start a new issue?

MilesMcBain commented 7 years ago

Thanks @mexindian! We will benefit from those skills! I think there's a bit of shaking out to do, and a feature document could be useful as a strawman. Before you do too much work on that though, we should probably do some work to summarise the reading list and collect examples of existing packages and frameworks that have features we like. New issue threads that turn into documents eventually make sense to me.

DataStrategist commented 7 years ago

Sounds good @MilesMcBain, I'll chill for a bit. But I would like to add a log to the fire... you did mention VR, AR, 3D, etc., but I think there's something more basic that was missed: mobile. How do hover, zoom, brush, select, and drill-down actions map to a small screen? Are all of them appropriate?

cjyetman commented 7 years ago

I may not be grasping the concept here, but if the primary goal is facilitating interactivity (like zoom, brush, hover, and selection) using a ggplot-style syntax, doesn't ggvis already mostly achieve that? Maybe pointing out where it falls short of expectations would be instructive?
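For reference, this is roughly what ggvis's ggplot-flavoured interactivity looks like (a minimal sketch, assuming the ggvis package), with a slider bound directly to a mark property:

```r
library(ggvis)

# The := operator sets an unscaled property; binding it to input_slider()
# makes the plot update reactively as the slider moves.
mtcars %>%
  ggvis(~wt, ~mpg) %>%
  layer_points(size := input_slider(10, 100, label = "Point size"))
```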

MilesMcBain commented 6 years ago

@mexindian You've made a good point there. Relying on interactivity that doesn't work well on mobile is most likely a bad idea if you are creating a visualisation for a general audience.

This raises an important distinction for me: interactive visualisations for scientific applications, like statistical inference, have a different set of conditions to those used for public 'storytelling'. In the scientific context, a pointer would be a fair assumption.

I'd expect the next gen interactive visualisation framework to support both modes.

MilesMcBain commented 6 years ago

@cjyetman This thread on community.rstudio.com has some useful discussion on ggvis if you haven't seen it yet. Hadley has said it is 'lying fallow'.