mtsch / Ripserer.jl

Flexible and efficient persistent homology computation.
https://mtsch.github.io/Ripserer.jl/dev/
MIT License
63 stars 7 forks source link

Obtaining the simplicial complex for a given threshold #172

Open davibarreira opened 3 months ago

davibarreira commented 3 months ago

First of all. Thanks for this package! I'm trying to learn a bit of TDA, and I'm trying to understand the outputs of the package. One thing I was wondering was whether it is possible to obtain the simplicial complex for a given threshold. For example, given a distance matrix and a "radius" of 0.1, I wanted to obtain the simplicial complex in order to visualize connections as the radius increased. Is it possible with the package? I saw that there is a Simplex data structure, but it does not seem to have the actual connections for the given input points.

mtsch commented 3 months ago

Hey! While you can't really construct a complex explicitly, you can enumerate the simplices in a given filtration. This is a bit hacky, but it works reliably. I'll demonstrate it with an example.

Start by creating some data and a filtration. Note that here, I've set the threshold in the filtration itself. If you want to compare different thresholds it might be better to leave it and filter the vector of simplices later (I show how to do this later).

julia> using Ripserer

# Create some data and a filtration on it
julia> data = [(rand(), rand()) for _ in 1:100]
julia> flt = Rips(data; threshold=0.1)

To get simplices, we'll use the (maybe not the most intuitively-named) Ripserer.columns_to_reduce. It takes a filtration and vector of all its d-simplices, and returns a vector of (d+1)-simplices. To get the starting point, use edges.

julia> edges = Ripserer.edges(flt)
julia> triangles = Ripserer.columns_to_reduce(flt, edges)
julia> tetrahedra = Ripserer.columns_to_reduce(flt, triangles)

You can keep doing this until you get to the dimension you're interested in. columns_to_reduce returns an iterator. Use collect to materialize it in a vector. Be careful here, these can get extremely large very quickly and could eat up all your memory.

julia> simplices = collect(tetrahedra)
25-element Vector{Simplex{3, Float64, Int64}}:
 -Simplex{3}((66, 34, 8, 1), 0.07873421536359702)
 -Simplex{3}((72, 47, 30, 2), 0.09496424105080373)
 -Simplex{3}((72, 47, 43, 2), 0.08662338431217294)
 -Simplex{3}((77, 70, 38, 6), 0.0879042651018931)
 -Simplex{3}((77, 44, 38, 6), 0.09089322446008455)
 -Simplex{3}((70, 44, 38, 6), 0.09089322446008455)
 -Simplex{3}((77, 70, 44, 6), 0.09089322446008455)
 -Simplex{3}((90, 76, 34, 8), 0.09562660381589932)
 -Simplex{3}((90, 66, 34, 8), 0.08434831089518131)
 -Simplex{3}((76, 66, 34, 8), 0.0986279317036634)
 ⋮
 -Simplex{3}((82, 51, 25, 24), 0.09448718717896652)
 -Simplex{3}((90, 76, 66, 34), 0.0986279317036634)
 -Simplex{3}((82, 71, 61, 35), 0.08840635543753256)
 -Simplex{3}((71, 65, 61, 35), 0.054352593410610384)
 -Simplex{3}((77, 70, 44, 38), 0.0625889837788639)
 -Simplex{3}((78, 75, 69, 49), 0.09192425486943887)
 -Simplex{3}((78, 74, 69, 49), 0.09709293326452263)
 -Simplex{3}((75, 74, 69, 49), 0.09709293326452263)
 -Simplex{3}((78, 75, 74, 49), 0.09709293326452263)
 -Simplex{3}((78, 75, 74, 69), 0.05476974181818138)

A simplex in Ripserer holds three pieces information: the sign, vertices and birth value. The sign returned by columns_to_reduce is arbitrary. You may want to use abs on the simplex to make it positive. You can use the vertices (or the simplex itself) to index back in to your data. As an example, lets see which tetrahedron here has the largest birth value:

julia> largest_birth_simplex = abs(argmax(birth, simplices))
3-dimensional Simplex(index=1259666, birth=0.0986279317036634):
   +(76, 66, 34, 8)

julia> [data[v] for v in Ripserer.vertices(largest_birth_simplex)]
4-element Vector{Tuple{Float64, Float64}}:
 (0.05901024586093817, 0.8981589899299338)
 (0.15531409254671724, 0.8768742999941058)
 (0.1546038616769584, 0.8956474221760998)
 (0.13120308962828497, 0.9069856484444736)

julia> data[largest_birth_simplex]
4-element StaticArraysCore.SizedVector{4, Tuple{Float64, Float64}, Vector{Tuple{Float64, Float64}}} with indices SOneTo(4):
 (0.05901024586093817, 0.8981589899299338)
 (0.15531409254671724, 0.8768742999941058)
 (0.1546038616769584, 0.8956474221760998)
 (0.13120308962828497, 0.9069856484444736)

If you want to reduce the threshold at this point, use filter like so:

julia> filter(sx -> birth(sx) < 0.09, simplices)
10-element Vector{Simplex{3, Float64, Int64}}:
 -Simplex{3}((66, 34, 8, 1), 0.07873421536359702)
 -Simplex{3}((72, 47, 43, 2), 0.08662338431217294)
 -Simplex{3}((77, 70, 38, 6), 0.0879042651018931)
 -Simplex{3}((90, 66, 34, 8), 0.08434831089518131)
 -Simplex{3}((55, 16, 15, 13), 0.08733924486750108)
 -Simplex{3}((95, 52, 46, 22), 0.0801360237769744)
 -Simplex{3}((82, 71, 61, 35), 0.08840635543753256)
 -Simplex{3}((71, 65, 61, 35), 0.054352593410610384)
 -Simplex{3}((77, 70, 44, 38), 0.0625889837788639)
 -Simplex{3}((78, 75, 74, 69), 0.05476974181818138)

As for plotting these I suggest collecting triangles and plotting the connections between all pairs in the simplex, or just collecting the edges and drawing a line for each of them.

Hope this is helpful! Let me know if you have any other questions.