tdaverse / ggtda

ggplot2 extension to visualize persistent homology
https://tdaverse.github.io/ggtda/
GNU General Public License v3.0

Added basic shiny functionality #10

Closed: rrrlw closed this 4 years ago

rrrlw commented 4 years ago

Complements #6. This allows users (particularly those new to persistent homology) to see how persistent homology works for point clouds in 2 dimensions. Here's sample code that should let you see the Shiny visualization:

devtools::install_github("rrrlw/ggtda@shiny/rrrlw")
library("ggtda")
library("TDAstats")
data("unif2d")
interact_phom(as.data.frame(unif2d))
corybrunson commented 4 years ago

Aha, I see what's happening here, and it works fine on my machine.

I've put off thinking about these facilities, so I have several questions; some might belong in an eventual vignette:

Also: I put together R code to construct VR and Čech complexes in a not-at-all optimized way, on the assumption that the plots produced would be static and the code could be improved later! But that was the naïveté of yesterday. I don't know how best to speed that process up—it can't be passed to TDAstats::calculate_homology, which only returns persistence data—but in the meantime it adds a lot to an already-high runtime.
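For what it's worth, here is a minimal sketch of one way the Vietoris-Rips part could avoid looping over all point triples. It assumes the points are rows of a numeric matrix and that two points are joined once their distance is at most the chosen diameter (twice the slider radius). `vr_skeleton()` is a hypothetical helper, not part of ggtda or TDAstats, and it does not cover the Čech complex.

```r
# Hypothetical helper (not part of ggtda/TDAstats): build the 2-skeleton of a
# Vietoris-Rips complex at a given diameter from the pairwise distance matrix.
vr_skeleton <- function(points, diameter) {
  d <- as.matrix(stats::dist(points))
  # Adjacency: two points are joined when their distance is at most `diameter`
  adj <- d <= diameter
  diag(adj) <- FALSE
  # Edges (i < j) read off the upper triangle in one vectorized call
  edges <- which(adj & upper.tri(adj), arr.ind = TRUE)
  dimnames(edges) <- list(NULL, c("i", "j"))
  # Triangles (i < j < k): for each edge, common neighbours with index above j
  tris <- lapply(seq_len(nrow(edges)), function(e) {
    i <- edges[e, "i"]
    j <- edges[e, "j"]
    ks <- which(adj[i, ] & adj[j, ])
    ks <- unname(ks[ks > j])
    if (length(ks) > 0) cbind(i = i, j = j, k = ks) else NULL
  })
  triangles <- do.call(rbind, tris)
  if (is.null(triangles)) {
    # Keep a consistent 0-row matrix when no triangles exist
    triangles <- matrix(integer(0), ncol = 3,
                        dimnames = list(NULL, c("i", "j", "k")))
  }
  list(edges = edges, triangles = triangles)
}

pts <- rbind(c(0, 0), c(1, 0), c(0, 1), c(5, 5))
skel <- vr_skeleton(pts, diameter = 1.5)
```

This still stores the full distance matrix (quadratic memory), so it is only a starting point for small point clouds like the ones an interactive would show.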

rrrlw commented 4 years ago
  • What are some real-world use cases for functions like interact_phom() that, however many options they are given, will presumably only produce a limited variety of Shiny environments? That is, this is not about customized dashboard development; is it primarily about quick data exploration?

I see interact_phom as mostly of educational value for those new to persistent homology. It can also be used for data exploration, but since it's limited to 2-dimensional datasets, it might be worth incorporating common dimension reduction methods (e.g. PCA, ICA, t-SNE, UMAP) into the Shiny app so that it could accept higher-dimensional datasets. Of course, this would only provide basic data exploration, and non-Shiny functions would have to be used for further exploration (analogous to the rattle package for R).
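As a rough sketch of the dimension-reduction idea, assuming the app only needs a two-column data frame: the hypothetical helper `to_2d_pca()` below projects a point cloud onto its first two principal components with `stats::prcomp()`. ICA, t-SNE, and UMAP would need other packages and are not shown.

```r
# Hypothetical helper: project a higher-dimensional point cloud onto its
# first two principal components so a 2-D viewer can display it.
to_2d_pca <- function(x) {
  x <- as.matrix(x)
  stopifnot(is.numeric(x), ncol(x) >= 2)
  pcs <- stats::prcomp(x, center = TRUE, scale. = FALSE)
  out <- as.data.frame(pcs$x[, 1:2, drop = FALSE])
  # Illustrative column names; interact_phom()'s expectations are not confirmed
  names(out) <- c("x", "y")
  out
}

set.seed(1)
cloud3d <- matrix(stats::rnorm(300), ncol = 3)
cloud2d <- to_2d_pca(cloud3d)
# interact_phom(cloud2d)  # function from the shiny/rrrlw branch above
```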

Other Shiny functions could definitely cover other parts of the TDA/visualization pipeline (e.g. @peekxc's code in #6), but I haven't given it much thought yet. Since we're not aiming for Shiny modules in the current ggtda submission to CRAN, there's a lot of room for change and planning here.

  • Should the user have the option of manipulating the data within the interactive, e.g. by subsetting, which would require re-computation of PH?

It seems like this would add a lot of complexity to the shiny UI, but I might be misunderstanding. Would an acceptable alternative be to have a dropdown list w/ all the data frames/tibbles in the user's environment and allow the user to pick which one they would like to visualize? (maybe they can even interactively pick which 2 columns they would like to visualize or which dimension reduction method to use, see above)
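A minimal sketch of how such a dropdown could be populated, assuming "data-frame-like" means anything for which `is.data.frame()` is true (which includes tibbles). `find_data_frames()` is a hypothetical helper.

```r
# Hypothetical helper: names of all data-frame-like objects (data frames,
# tibbles, data.tables) in an environment, e.g. to fill a dropdown.
find_data_frames <- function(env = globalenv()) {
  nms <- ls(envir = env)
  is_df <- vapply(nms, function(nm) is.data.frame(get(nm, envir = env)),
                  logical(1))
  nms[is_df]
}

# In a Shiny UI this could feed, e.g.,
# shiny::selectInput("dataset", "Data set", choices = find_data_frames())
```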

  • Could some of the critical values of the radius (obtained from the persistence data), e.g. the births and deaths of the 6 most persistent features, be flagged in some way on the slider? Alternatively, could the slider increment not by regular values of the radius but instead by critical values? (I'm sure this could be done, but I wonder if it should be an option.)

We could have buttons in the Shiny UI that set the slider to critical values. For example, field 1 asks for a dimension, field 2 asks for start vs. end, and pressing the button sets the slider to the start or end of the most persistent feature in that dimension. That's just the first thing that came to mind and can definitely be improved on, but is it along the lines of what you were thinking?
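A minimal sketch of both ideas, assuming persistence data shaped like the output of `TDAstats::calculate_homology()` (columns `dimension`, `birth`, `death`) on the same scale as the slider. `critical_values()` returns the births and deaths of the `k` most persistent features as candidate slider stops, and `most_persistent_endpoint()` returns the value a dimension/start-vs-end button could jump to. Both helpers, and the input IDs in the commented Shiny snippet, are hypothetical.

```r
# Hypothetical helpers; `phom` is assumed to have columns dimension, birth, death.

# Sorted birth and death values of the k most persistent features,
# usable as stops for a slider that moves between critical values.
critical_values <- function(phom, k = 6) {
  phom <- as.data.frame(phom)
  persistence <- phom$death - phom$birth
  top <- order(persistence, decreasing = TRUE)[seq_len(min(k, nrow(phom)))]
  sort(unique(c(phom$birth[top], phom$death[top])))
}

# Start (birth) or end (death) of the most persistent feature in one dimension;
# returns NA when that dimension has no features.
most_persistent_endpoint <- function(phom, dim, endpoint = c("start", "end")) {
  endpoint <- match.arg(endpoint)
  phom <- as.data.frame(phom)
  sub <- phom[phom$dimension == dim, , drop = FALSE]
  if (nrow(sub) == 0) return(NA_real_)
  best <- sub[which.max(sub$death - sub$birth), ]
  if (endpoint == "start") best$birth else best$death
}

# Inside a Shiny server, a button could then do something like:
# observeEvent(input$jump, {
#   updateSliderInput(session, "radius",
#     value = most_persistent_endpoint(phom, input$dim, input$endpoint))
# })
```

Features with an infinite death, if the persistence data contain any, would need to be dropped before using these values as slider stops.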

  • This is a question more for geom_barcode(): Should the user be able to conveniently put their choice of radius or diameter on the continuous axis? For the interactive, radius might be a better choice, for consistency with the slider (or not; it's a useful reminder to interpret cautiously).

IMO, offering a choice between radius and diameter would be overkill (it's a straightforward transformation); I think we should pick whichever makes the most sense (I'm unsure which one I put in the interactive; happy to switch!).

Also: I put together R code to construct VR and Čech complexes in a not-at-all optimized way, on the assumption that the plots produced would be static and the code could be improved later! But that was the naïveté of yesterday. I don't know how best to speed that process up—it can't be passed to TDAstats::calculate_homology, which only returns persistence data—but in the meantime it adds a lot to an already-high runtime.

Tbh, I don't think it's too bad! But yeah, it'd be cool to optimize the code for the next release w/ shiny, etc.

Also, I should've mentioned above: I'm merging this into shiny/main, which will hopefully hold all the Shiny TDA visualization modules we include in the next release (and can then be merged into master afterward). Thanks!

peekxc commented 4 years ago

Just some of my thoughts:

Should the user have the option of manipulating the data within the interactive, e.g. by subsetting, which would require re-computation of PH?

I think that could be very computationally expensive for even modest-sized data sets. Maybe a button to recompute manually would give the user control. Or maybe a checkbox or something similar that toggles between automatic and manual recomputation...

Would an acceptable alternative be to have a dropdown list w/ all the data frames/tibbles in the user's environment and allow the user to pick which one they would like to visualize

Or maybe the user gives a list of names of data frames in the current environment (or possibly any findable object on the search() path), and any data.frame-like objects among them get added? Or maybe an option to load all the data frames in a given environment... so many options...
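A minimal sketch of the name-list variant, using base R's `mget()` with `inherits = TRUE` so that names are also looked up in enclosing environments (for the global environment, that chain is the search() path). `collect_data_frames()` is a hypothetical helper; objects that are missing or are not data frames are silently dropped.

```r
# Hypothetical helper: look up user-supplied object names in `envir` and its
# enclosing environments, keeping only the data-frame-like ones.
collect_data_frames <- function(names, envir = parent.frame()) {
  # Missing names resolve to NULL instead of raising an error
  found <- mget(names, envir = envir, inherits = TRUE,
                ifnotfound = rep(list(NULL), length(names)))
  Filter(is.data.frame, found)
}

# e.g. collect_data_frames(c("my_points", "mtcars"))
# (mtcars would be found on the search() path via the attached datasets package)
```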

corybrunson commented 4 years ago

Thanks! I agree with most of what's been said in reply.

If the primary purpose is educational, then I don't think many additional features are called for. I like the idea of searching the environment for suitable data sets, perhaps only when certain conditions are met, in case there are too many. I might still advocate for the slider options, since it would be nice to be able to use the arrow keys to step through the resulting filtration.

As for geom_barcode()—of course, one can just do

library(ggplot2)
library(ggtda)
ggplot(data) +
  geom_barcode(aes(start = birth * 2, end = death * 2))

No need for a shortcut.

eashwarsoma commented 4 years ago

@rrrlw The app looks great and would be excellent as an educational tool! One thing that might help aesthetically is to keep a constant scale for the persistence diagram regardless of which Rips diameter is chosen.
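One way to do this, sketched under the assumption that the full persistence data are computed once up front: take the axis limits from all finite births and deaths, then pass the same limits to every redrawn diagram via `ggplot2::coord_fixed()`. `diagram_limits()` and `plot_diagram()` are hypothetical helpers, and which features the app shows at each diameter is left to the caller.

```r
# Hypothetical helper: common axis limits for every persistence diagram drawn
# in the app, computed once from the full (unfiltered) persistence data.
diagram_limits <- function(phom) {
  phom <- as.data.frame(phom)
  values <- c(phom$birth, phom$death)
  values <- values[is.finite(values)]  # drop infinite deaths, if any
  c(0, max(values))
}

# Draw whatever subset is currently shown, always on the same fixed limits.
plot_diagram <- function(phom_shown, lims) {
  ggplot2::ggplot(as.data.frame(phom_shown),
                  ggplot2::aes(x = birth, y = death)) +
    ggplot2::geom_point() +
    ggplot2::coord_fixed(xlim = lims, ylim = lims)
}
```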

rrrlw commented 4 years ago

moved here

For now, ggtda won't have Shiny functionality, per our previous Zoom call; it will remain strictly a ggplot2 extension. However, Shiny functionality would be quite valuable in an R package, so this issue has been moved to this repo (name tentative; please suggest better ones).