rrrlw / TDAstats

R pipeline for computing persistent homology in topological data analysis. See https://doi.org/10.21105/joss.00860 for more details.
https://rrrlw.github.io/TDAstats
GNU General Public License v3.0

calculate_homology results #23

Open · SPRADA1 opened this issue 3 years ago

SPRADA1 commented 3 years ago

Hi! When I use calculate_homology on a graph with 7 vertices (for example), I only obtain 6 features at dimension 0 that start at filtration weight 0. Why is that? Shouldn't there be 7 features? I couldn't find the reason in your guidelines or vignettes. Thanks!

rrrlw commented 3 years ago

Thank you for the question. You are correct: for a graph with 7 vertices, there should be 7 zero-dimensional features that start at a filtration weight of 0. However, the 7th feature would be redundant. Let's assume that the closest pair of points within this set of 7 is 0.1 units apart. Then the 2 features for the closest pair (one feature per point in the pair) would both start at 0 and both end at 0.1. This generalizes to all graphs (or point clouds): the closest pair of vertices (or points) would result in 2 identical zero-dimensional features that start at 0 and end at the distance between the vertices (or points). Since the repeated feature does not add any new information, it can be safely excluded when calculating and returning the graph's (or point cloud's) persistent homology. If you'd prefer to include this feature, you can duplicate the zero-dimensional feature with the shortest persistence in the results that calculate_homology returns.

The above is the best explanation I have for the unexpected behavior (one fewer zero-dimensional feature than expected). However, keep in mind that the TDAstats R package uses Ripser as its underlying calculation engine. If this answer is unsatisfactory, the question can be raised as an issue in the Ripser GitHub repo.

This is the first time I have noticed the issue you brought up. First, thank you for meticulously checking the results. Second, I would like to make sure that anyone else with the same question can easily find the answer in this repo. I think either the function documentation for calculate_homology or a vignette would be an appropriate place for it. As such, I will keep this issue open until the explanation is included somewhere in the package.

I hope this helps, @SPRADA1. Please let me know if there's anything else.

SPRADA1 commented 3 years ago

Thanks for your detailed answer! I see your point. Is it related to the concept of "reduced homology"? I will open the same issue in the Ripser repo anyway so we have all the information :) Thanks again!
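The reduced-homology question can be checked against the standard way zero-dimensional persistence is computed: every vertex is born at 0, edges are added in order of weight, and each edge that joins two separate components ends one of their bars (a union-find pass, as in Kruskal's algorithm). The Python sketch below is a minimal illustration under that assumption, not TDAstats's or Ripser's code; the helper `zero_dim_barcode` and the 7-vertex graph with its edge weights are hypothetical. For a connected graph it yields 7 bars born at 0, of which 6 have finite deaths and 1 never dies. Dropping that unbounded bar leaves 6 bars, matching reduced homology in degree 0, whose rank is one less than the number of components.

```python
import math


def zero_dim_barcode(n_vertices, weighted_edges):
    """H0 barcode as (birth, death) pairs for a graph filtered by edge weight.

    Hypothetical helper, not part of TDAstats or Ripser. Every vertex is
    born at 0. Edges are processed in increasing weight; an edge joining
    two different components merges them, and one bar ends at that weight.
    Components still separate after the last edge get death = math.inf.
    """
    parent = list(range(n_vertices))

    def find(v):
        # Follow parent links to the component root (with path halving).
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    bars = []
    for u, v, w in sorted(weighted_edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:
            # All births are 0, so it does not matter which root survives.
            parent[rv] = ru
            bars.append((0.0, w))
    # One unbounded bar per component that never merged.
    roots = {find(v) for v in range(n_vertices)}
    bars.extend((0.0, math.inf) for _ in roots)
    return bars


# Hypothetical connected graph on 7 vertices: (u, v, weight).
edges = [(0, 1, 0.1), (1, 2, 0.4), (2, 3, 0.3), (3, 4, 0.7),
         (4, 5, 0.2), (5, 6, 0.5), (6, 0, 0.9), (1, 4, 0.6)]

bars = zero_dim_barcode(7, edges)
finite = [b for b in bars if math.isfinite(b[1])]
print(len(bars), len(finite))  # prints: 7 6
```

Whether calculate_homology omits exactly this unbounded bar is not something the sketch verifies; it only shows where a count of n - 1 finite bars naturally comes from.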

rosswsweet commented 3 years ago

I'm not convinced that the issue is entirely due to the two closest points. The homology class associated with one of those points vanishes, but the other does not.

I see this issue as related to the threshold. In a toy example with 12 points (see below), setting the threshold to 5 should yield a homology class with a birth of 0 and a "death" at 5. Indeed, when this data is entered in Ripser Live (live.ripser.org) with the settings "point cloud," dimensions "0 to 1," and threshold "5," the expected barcode is returned: 12 zero-dimensional bars, the longest extending to the threshold value of 5. However, using TDAstats, the longest bar we get is just larger than 4.

Thank you for your work on this package!

x,y
3,4
1,3
2.4,2.8
1.2,2.5
2.3,1.6
1.22,1.48
5,1
6.5,6
7,6.4
7.4,5.6
8.5,6
8.2,5.2
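As a cross-check on the 12-point example, the sketch below computes the zero-dimensional Vietoris-Rips barcode of the posted data with a union-find pass over pairwise distances, ignoring edges longer than a threshold of 5. It assumes Euclidean distances, and `rips_h0` is a hypothetical helper, not part of TDAstats or Ripser. It finds 11 finite bars, the longest dying at sqrt(16.25) ≈ 4.031 (where the two clusters merge via the points (3, 4) and (6.5, 6)), plus one bar that never dies. That is consistent with the observation above: the longest finite bar is just larger than 4, and a viewer that draws the unbounded bar out to the threshold would show it ending at 5.

```python
import math
from itertools import combinations

# The 12 (x, y) points posted above.
POINTS = [
    (3, 4), (1, 3), (2.4, 2.8), (1.2, 2.5), (2.3, 1.6), (1.22, 1.48),
    (5, 1), (6.5, 6), (7, 6.4), (7.4, 5.6), (8.5, 6), (8.2, 5.2),
]


def rips_h0(points, threshold):
    """H0 barcode of a Vietoris-Rips filtration truncated at `threshold`.

    Hypothetical helper: union-find over Euclidean pairwise distances in
    increasing order (Kruskal). Components still alive at the end are
    reported with death = math.inf.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the component root (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(p, q), i, j)
        for (i, p), (j, q) in combinations(enumerate(points), 2)
    )
    bars = []
    for d, i, j in edges:
        if d > threshold:
            break  # edges longer than the threshold never enter the filtration
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri
            bars.append((0.0, d))
    alive = {find(i) for i in range(len(points))}
    bars.extend((0.0, math.inf) for _ in alive)
    return bars


bars = rips_h0(POINTS, threshold=5)
finite = [d for _, d in bars if math.isfinite(d)]
print(len(bars), len(finite), round(max(finite), 3))  # prints: 12 11 4.031
```

Changing `threshold` to a value below 4.031 would leave the two clusters unmerged, giving two unbounded bars instead of one.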

rrrlw commented 3 years ago

I apologize for the late response. I will take a look shortly. Thank you for bringing this up!