jakipatryk / spark-persistent-homology

(WIP, not fast enough for any production usage yet) Library for persistent homology computations in Apache Spark.
Apache License 2.0

Don't use join in VR filtration generation #33

Closed jakipatryk closed 1 year ago

jakipatryk commented 1 year ago

The current implementation of the Vietoris-Rips filtration is only a POC, and it is extremely inefficient. To generate the filtration, it uses cross joins to enumerate all combinations of points, which is very expensive.

The task is to remove the use of joins entirely. This should be possible by first computing a distance matrix (the matrix of all pairwise distances between points) and broadcasting it to each Spark executor. Then, instead of joining points to generate each combination, map the combination's ordinal number (its index among all combinations of that size) to the set of points that form the combination, and compute the maximum distance among these points by looking it up in the distance matrix.
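
A rough sketch of the idea in plain Python, without Spark, to make the index-to-combination mapping concrete. This is not the library's code: the function names (`distance_matrix`, `unrank_combination`, `simplex_diameter`, `vr_simplices`) are made up for illustration, Euclidean distance is an assumption, and the ordering used is the standard combinatorial number system, which may differ from whatever ordering ends up in the actual implementation.

```python
from itertools import combinations
from math import comb, dist


def distance_matrix(points):
    """All pairwise Euclidean distances (assumed metric) as a nested list."""
    n = len(points)
    return [[dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def unrank_combination(rank, k):
    """Map an index to a k-combination of point indices (descending order),
    using the combinatorial number system:
    rank = C(c_k, k) + C(c_{k-1}, k-1) + ... + C(c_1, 1)."""
    vertices = []
    for size in range(k, 0, -1):
        # Find the largest c such that C(c, size) <= rank.
        c = size - 1
        while comb(c + 1, size) <= rank:
            c += 1
        vertices.append(c)
        rank -= comb(c, size)
    return vertices


def simplex_diameter(vertices, dm):
    """Max pairwise distance among the vertices, read from the matrix."""
    return max((dm[i][j] for i, j in combinations(vertices, 2)), default=0.0)


def vr_simplices(points, dim):
    """All simplices of the given dimension with their filtration values.
    No join: each simplex is derived from its index alone."""
    dm = distance_matrix(points)
    k = dim + 1
    result = []
    for rank in range(comb(len(points), k)):
        vertices = unrank_combination(rank, k)
        result.append((vertices, simplex_diameter(vertices, dm)))
    return result


points = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]
edges = vr_simplices(points, 1)
triangles = vr_simplices(points, 2)
```

In a Spark version, the loop over `range(comb(n, k))` would presumably become a distributed range of indices, with the matrix shipped to executors as a broadcast variable, so each task only needs its index range and the broadcast matrix.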