Hey, I didn't get the notification for this issue, sorry about that.
If you don't need it to be a Rips complex, you can try Alpha, which should work OK in the 5d case. For higher dimensions, it might take forever to build the filtration, but things should be pretty quick once that's done.
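Something like this should do it (a sketch; the `rand` points are just a stand-in for your data):

```julia
using Ripserer, StaticArrays

# Stand-in data: 10k random points in 5 dimensions.
points = [SVector{5}(rand(5)) for _ in 1:10_000]

# Alpha builds the filtration from a Delaunay triangulation, which is much
# smaller than the full Rips complex in low dimensions.
result = ripserer(Alpha(points))
```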
Another option is EdgeCollapsedRips, which may or may not work well. How long it takes to build depends a lot on the shape of the data. The situation there is similar: once the complex is built, things should be pretty fast, but building it may take forever.
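Usage should look much the same as with Rips (again a sketch with stand-in data):

```julia
using Ripserer, StaticArrays

points = [SVector{5}(rand(5)) for _ in 1:10_000]

# Collapsing edges shrinks the complex before persistence is computed.
result = ripserer(EdgeCollapsedRips(points))
```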
There is also a very cool algorithm for handling large data sets, but I haven't implemented it because I don't understand it. It can be found here. Maybe you can find a way to use it, though it's a pretty old and unmaintained library.
On the downsampling: I have done something similar before. Below is a function that downsamples data so that each point's nearest neighbour is at least r away. This works well if your sample is uneven, or if you don't care about very small structures in your data.
```julia
using NearestNeighbors
using Distances
using StaticArrays
using Random

function downsample(points::Vector{<:Union{NTuple{D,T},SVector{D,T}}}, r) where {D,T}
    # Shuffle so the result doesn't depend on the input order.
    points = shuffle(points)
    # KDTree wants the points in a D×N matrix.
    points_matrix = reshape(reinterpret(T, points), (D, length(points)))
    keep = fill(true, length(points))
    # For other metrics, you may want to use a BallTree instead.
    tree = KDTree(points_matrix, Euclidean())
    for i in 1:length(points)
        if keep[i]
            p = points[i]
            # Remove all points within r of the i-th point. This includes the
            # i-th point itself, so we have to add it back.
            keep[inrange(tree, SVector(p), r, false)] .= false
            keep[i] = true
        end
    end
    return points[keep]
end
```
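For example (the radius 0.1 here is arbitrary; tune it to your data's scale):

```julia
using StaticArrays

points = [SVector{5}(rand(5)) for _ in 1:10_000]

# Keep a subsample in which all pairwise distances are greater than r = 0.1.
sparse = downsample(points, 0.1)

# The smaller sample can then be fed to e.g. ripserer(Rips(sparse)).
```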
Hope this helps and let me know if you have any other questions!
Hi,
Thanks for this awesome package.
I'm trying to use it on a fairly large set of data points (~10k data points of 5-10 dimensions). This is already after substantial dimensionality reduction and downsampling, but it is still computationally very demanding to run the algorithm on.
Would you have any advice on things to try to speed it up? Would downsampling (e.g. dropping X% of the data points) be a valid approach?
Thank you, Federico