mtsch / Ripserer.jl

Flexible and efficient persistent homology computation.
https://mtsch.github.io/Ripserer.jl/dev/
MIT License
66 stars 8 forks source link

Performance advice #159

Closed FedeClaudi closed 5 months ago

FedeClaudi commented 2 years ago

Hi,

Thanks for this awesome package.

I'm trying to use it on some fairly large set of data points (~10k data points of 5-10 dimensions). This is already after substantial dimensionality reduction and downsampling, but it still computationally very demanding to run the algorithm on.

image

Would you have any advice on things to try to speed it up? Would downsampling (e.g. dropping X% of the data points) would be a valid approach?

Thank you, Federico

mtsch commented 2 years ago

Hey, I didn't get the notification for this issue, sorry about that.

If you don't need it to be a Rips complex, you can try Alpha, which should work OK in the 5d case. For higher dimensions, it might take forever to build the filtration, but things should be pretty quick once that's done.

Another option is EdgeCollapsedRips, which may or might not work well. How long it takes to build depends a lot on the shape of the data. The situation there is similar - once the complex is built, things should be pretty fast, but it may take forever.

There is also a very cool algorithm for handling large data sets, but I haven't implemented it because I don't understand it. It can be found here. Maybe you can find a way to use it, it's a pretty old and unmaintained library.

On the downsampling: I have done something similar before. Below is a function that downsamples data such that each point's nearest neighbour is at least r away. This works well if your sample is uneven or if you don't care about very small structures in your data.

using NearestNeighbors
using Distances
using StaticArrays

function downsample(points::Vector{<:Union{NTuple{D,T},SVector{D,T}}}, r) where {D,T}
    points = shuffle(points)
    points_matrix = reshape(reinterpret(T, points), (D,length(points))) # KDTree wants points in a matrix

    keep = fill(true, length(points))
    tree = KDTree(points_matrix, Euclidean()) # for other metrics, you may want to use a BallTree instead

    for i in 1:length(points)
        k = keep[i]
        p = points[i]
        if k
            keep[inrange(tree, SVector(p), r, false)] .= false
            keep[i] = true # i-th point was removed, so we have to add it back
        end
    end
    return points[keep]
end

Hope this helps and let me know if you have any other questions!