An R package for using discrete Morse theory to analyze a data set using the Mapper algorithm described in:
G. Singh, F. Memoli, G. Carlsson (2007). Topological Methods for the Analysis of High Dimensional Data Sets and 3D Object Recognition, Point Based Graphics 2007, Prague, September 2007.
To install the stable version of this R package from CRAN:
install.packages("TDAmapper", dependencies=TRUE)
To install the latest version of this R package directly from github:
install.packages("devtools")
library(devtools)
devtools::install_github("paultpearson/TDAmapper")
library(TDAmapper)
To install from Github you might need:
# The fastcluster package is not necessary. By loading the
# fastcluster package, the fastcluster::hclust() function
# automatically replaces the slower stats::hclust() function
# whenever hclust() is called.
install.packages("fastcluster")
require(fastcluster)
m1 <- mapper1D(
distance_matrix = dist(data.frame( x=2*cos(0.5*(1:100)), y=sin(1:100) )),
filter_values = 2*cos(0.5*(1:100)),
num_intervals = 10,
percent_overlap = 50,
num_bins_when_clustering = 10)
# The igraph package is necessary to view simplicial complexes
# (undirected graph) resulting from mapper1D().
install.packages("igraph")
library(igraph)
g1 <- graph.adjacency(m1$adjacency, mode="undirected")
plot(g1, layout = layout.auto(g1) )
m2 <- mapper2D(
distance_matrix = dist(data.frame( x=2*cos(1:100), y=sin(1:100) )),
filter_values = list( 2*cos(1:100), sin(1:100) ),
num_intervals = c(5,5),
percent_overlap = 50,
num_bins_when_clustering = 10)
g2 <- graph.adjacency(m2$adjacency, mode="undirected")
plot(g2, layout = layout.auto(g2) )
# sample points from two intertwined spirals
set.seed("1")
t <- runif(100, min=1, max=6.3) # theta
X <- data.frame( x = c( t*cos(t), -t*cos(t) ), y = c( t*sin(t), -t*sin(t) ) )
d <- dist(X)
plot(X[,1], X[,2])
filter <- X[,2] # height projection
num_intervals <- 10
percent_overlap <- 50
num_bins_when_clustering <- 10
m3 <- mapper1D(
distance_matrix = d,
filter_values = filter,
# num_intervals = 10, # use default
# percent_overlap = 50, # use default
# num_bins_when_clustering = 10 # use default
)
g3 <- graph.adjacency(m3$adjacency, mode="undirected")
plot(g3, layout = layout.auto(g3) )
As of July 2015, this example and the next example will only work if you have installed TDAmapper from the github repository. It should be bug free, but is still undergoing testing.
# parametrize a trefoil knot
n <- 100
t <- 2*pi*(1:n)/n
X <- data.frame(x = sin(t)+2*sin(2*t),
y = cos(t)-2*cos(2*t),
z = -sin(3*t))
f <- X
library(rgl)
plot3d(X$x, X$y, X$z)
# library(igraph)
m4 <- mapper(dist(X), f[,1], 5, 50, 5)
g4 <- graph.adjacency(m4$adjacency, mode="undirected")
plot(g4, layout = layout.auto(g4) )
m4$points_in_vertex
library(igraph)
tkplot(g4)
The networkD3 graphs have mouseover effects.
# use the github version so that vertices stay on the canvas
library(devtools)
devtools::install_github("christophergandrud/networkD3")
library(networkD3)
# parametrize a trefoil knot
n <- 100
t <- 2*pi*(1:n)/n
X <- data.frame(x = sin(t)+2*sin(2*t),
y = cos(t)-2*cos(2*t),
z = -sin(3*t))
f <- X
m5 <- mapper(dist(X), f, c(3,3,3), c(30,30,30), 5)
g5 <- graph.adjacency(m5$adjacency, mode="undirected")
plot(g5, layout = layout.auto(g5) )
tkplot(g5)
# create data frames for vertices and edges with the right variable names
MapperNodes <- mapperVertices(m5, 1:dim(f)[1] )
MapperLinks <- mapperEdges(m5)
# interactive plot
forceNetwork(Nodes = MapperNodes, Links = MapperLinks,
Source = "Linksource", Target = "Linktarget",
Value = "Linkvalue", NodeID = "Nodename",
Group = "Nodegroup", opacity = 0.8,
linkDistance = 10, charge = -400)
These are some notes by the package author to himself about the package creation process. Locally installing, building, and checking the package can be done with:
R>setwd("C:/research/TDAmapper")
R>devtools::document()
C:\research>"C:\Program Files\R\R-devel\bin\x64\R.exe" CMD build TDAmapper
C:\research>"C:\Program Files\R\R-devel\bin\x64\R.exe" CMD check TDAmapper_1.0.tar.gz --as-cran
C:\research>"C:\Program Files\R\R-3.2.0\bin\R.exe" CMD INSTALL TDAmapper_0.0.0.9000.tar.gz
C:\research>"C:\Program Files\R\R-3.2.0\bin\R.exe" CMD Rd2pdf TDAmapper
The mapper1D function by Paul Pearson is a cleaned-up, modified, and ported version of the Mapper code by Daniel Muellner and Gurjeet Singh originally written for Matlab. What follows is the copyright notice included in the Matlab code written by Muellner, which was based on code written by Singh.
This is a cleaned-up and modified version of the Mapper code by
Gurjeet Singh. It also corrects two bugs which are present in the
original Mapper code.
(c) 2010 Daniel Muellner, muellner@math.stanford.edu
Copyright: As far as Daniel Muellner's contributions are concerned,
this code is published under the GNU General Public License v3.0
(see http://www.gnu.org/licenses/gpl.html). For scientific citations,
please refer to my home page http://math.stanford.edu/~muellner. If
you visit this page in the future, chances are high that you will find
a Python library with improved, largely extended and freely
distributable Mapper code there.
Since the present code is based on Gurjeet Singh's original code,
please also respect his copyright message.
Below is the original copyright message:
Mapper code -- (c) 2007-2009 Gurjeet Singh
This code is provided as is, with no guarantees except that
bugs are almost surely present. Published reports of research
using this code (or a modified version) should cite the
article that describes the algorithm:
G. Singh, F. Memoli, G. Carlsson (2007). Topological Methods for
the Analysis of High Dimensional Data Sets and 3D Object
Recognition, Point Based Graphics 2007, Prague, September 2007.
Comments and bug reports are welcome. Email to
gurjeet@stanford.edu.
I would also appreciate hearing about how you used this code,
improvements that you have made to it, or translations into other
languages.
You are free to modify, extend or distribute this code, as long
as this copyright notice is included whole and unchanged.