scikit-tda / ripser.py

A Lean Persistent Homology Library for Python
http://ripser.scikit-tda.org
Other
276 stars 61 forks source link

Threshold gives unexpected result. #51

Closed mbr085 closed 5 years ago

mbr085 commented 5 years ago

Please see https://github.com/Ripser/ripser/issues/20

mbr085 commented 5 years ago

In short the problem is that the thresh option produces results that are different from the results of the classic command line ripser.

from ripser import ripser, plot_dgms
import itertools
import numpy as np

n=3
segment = [np.linspace(0,1,5)]
endpoints = [np.linspace(0,1,2)]
face = segment * (n - 1) + endpoints
vertices = []
for k in range(n):
    vertices.extend(itertools.product(*(face[k:] + face[:k])))
    coords = np.array(corners)

rips = ripser(coords, maxdim=2, thresh=.8)
plot_dgms(rips['dgms'])

exporting to a file

np.savetxt('box.txt', X=coords)

and running the Bauer version of ripser from the command line gives a different result.

ctralie commented 5 years ago

Sorry for the trouble, and thank you for the detailed report! I will have a look tomorrow

ctralie commented 5 years ago

Hello, Here's where I am so far, comparing this to GUDHI:

ripserpy_vs_gudhi

Code below. 0D and 1D persistent homology looks the same, but GUDHI is missing the 2D dot. I should note that in our ripser.py, the 2D dot is born when all of the 1D dots die. This makes sense from the point cloud you setup, so that seems correct to me. This is also the case from ripser in the command line, but the result is different, as you noted.

I'll dig into this more deeply in a few hours when I get back from some meetings. For now, here's the code I wrote:

from ripser import ripser, plot_dgms
import gudhi
import itertools
import numpy as np
import matplotlib.pyplot as plt

def convertGUDHIPD(pers):
    """
    Convert GUDHI persistence diagrams into a list of arrays
    """
    Is = {}
    for i in range(len(pers)):
        (dim, (b, d)) = pers[i]
        if not dim in Is:
            Is[dim] = []
        Is[dim].append([b, d])
    Is = [np.array(Is[dim]) for dim in Is]
    return Is

n=3
segment = [np.linspace(0,1,5)]
endpoints = [np.linspace(0,1,2)]
face = segment * (n - 1) + endpoints
vertices = []
for k in range(n):
    vertices.extend(itertools.product(*(face[k:] + face[:k])))
coords = np.array(vertices)
np.savetxt('box.txt', X=coords)

rips = ripser(coords, maxdim=2)
dgmsrips = rips['dgms']

rips_complex = gudhi.RipsComplex(points=coords,max_edge_length=1e9)
simplex_tree = rips_complex.create_simplex_tree(max_dimension=2)
dgmsgudhi = simplex_tree.persistence(homology_coeff_field=2, min_persistence=0)
dgmsgudhi = convertGUDHIPD(dgmsgudhi)

plt.figure(figsize=(10, 5))
plt.subplot(121)
plot_dgms(dgmsrips, xy_range=[-0.2, 1.5, 0, 1.5])
plt.title("Ripser")
plt.subplot(122)
plot_dgms(dgmsgudhi, xy_range=[-0.2, 1.5, 0, 1.5])
plt.show()
mbr085 commented 5 years ago

Using GUDHI you need to add simplices up to dimension 3 in order to compute homology in dimension 2. I have taken the liberty to modify your script so that it contains both the GUDHI and the ripser results. You will notice that they agree.

%matplotlib inline
from ripser import ripser, plot_dgms
import gudhi
import itertools
import numpy as np
import matplotlib.pyplot as plt
def convertGUDHIPD(pers):
    """
    Convert GUDHI persistence diagrams into a list of arrays
    """
    Is = {}
    for i in range(len(pers))[::-1]:
        (dim, (b, d)) = pers[i]
        if not dim in Is:
            Is[dim] = []
        Is[dim].append([b, d])
    Is = [np.array(Is[dim]) for dim in Is]
    return Is

n=3
segment = [np.linspace(0,1,5)]
endpoints = [np.linspace(0,1,2)]
face = segment * (n - 1) + endpoints
vertices = []
for k in range(n):
    vertices.extend(itertools.product(*(face[k:] + face[:k])))
coords = np.array(vertices)
np.savetxt('box.txt', X=coords)

rips = ripser(coords, maxdim=2)
dgmsrips = rips['dgms']

rips_complex = gudhi.RipsComplex(points=coords) #,max_edge_length=1e9)
simplex_tree = rips_complex.create_simplex_tree(max_dimension=3)
dgmsgudhi2 = simplex_tree.persistence(homology_coeff_field=2)
dgmsgudhi = convertGUDHIPD(dgmsgudhi2)

plt.figure(figsize=(10, 5))
plt.subplot(121)
plot_dgms(dgmsrips, xy_range=[-0.2, 1.5, 0, 1.5])
plt.title("Ripser")
plt.subplot(122)
plot_dgms(dgmsgudhi, xy_range=[-0.2, 1.5, 0, 1.5])
plt.title('GUDHI')
plt.show()

image

This is in contrast to the situation where a threshold is introduced:

%matplotlib inline
from ripser import ripser, plot_dgms
import gudhi
import itertools
import numpy as np
import matplotlib.pyplot as plt

def convertGUDHIPD(pers):
    """
    Convert GUDHI persistence diagrams into a list of arrays
    """
    Is = {}
    for i in range(len(pers))[::-1]:
        (dim, (b, d)) = pers[i]
        if not dim in Is:
            Is[dim] = []
        Is[dim].append([b, d])
    Is = [np.array(Is[dim]) for dim in Is]
    return Is

n=3
segment = [np.linspace(0,1,5)]
endpoints = [np.linspace(0,1,2)]
face = segment * (n - 1) + endpoints
vertices = []
for k in range(n):
    vertices.extend(itertools.product(*(face[k:] + face[:k])))
coords = np.array(vertices)
np.savetxt('box.txt', X=coords)

rips = ripser(coords, maxdim=2, thresh=1.4)
dgmsrips = rips['dgms']

rips_complex = gudhi.RipsComplex(points=coords) #,max_edge_length=1e9)
simplex_tree = rips_complex.create_simplex_tree(max_dimension=3)
dgmsgudhi2 = simplex_tree.persistence(homology_coeff_field=2)
dgmsgudhi = convertGUDHIPD(dgmsgudhi2)

plt.figure(figsize=(10, 5))
plt.subplot(121)
plot_dgms(dgmsrips, xy_range=[-0.2, 1.5, 0, 1.5])
plt.title("Ripser")
plt.subplot(122)
plot_dgms(dgmsgudhi, xy_range=[-0.2, 1.5, 0, 1.5])
plt.title('GUDHI')
plt.show()

image

ctralie commented 5 years ago

Interesting, thanks for the update! I will have a look.

By the way, if you're looking to do 2D homology efficiently in low dimensions, you might check out alpha filtrations: http://gudhi.gforge.inria.fr/doc/latest/group__alpha__complex.html

I still definitely want to fix this bug though, so thanks for pointing it out!

ctralie commented 5 years ago

@ubauer I could use your help on this. Do you have any idea why the command line version gives something completely different? (Output below) It doesn't make any sense, because it only has one connected component, but it has a bunch of 1D classes that are born at 0?

I have also noticed that in our wrapper, it seems to be a problem in the sparse matrix class. When I use a dense matrix, it seems to work correctly with all thresholds. Though in the command line version, the answer is the same regardless. Any ideas?

value range: [0,1]
sparse distance matrix with 30 points and 310/449 entries
persistence intervals in dim 0:
 [0, )
persistence intervals in dim 1:
 [0,0.25)
 [0,0.25)
 [0,0.25)
 [0,0.25)
 [0,0.25)
 [0,0.25)
 [0,0.25)
 [0,0.25)
 [0,0.25)
 [0,0.25)
 [0,0.25)
 [0,0.25)
 [0,0.25)
persistence intervals in dim 2:
 [0.25,0.5)
 [0.25,0.5)
 [0.25,0.75)
 [0.25,0.5)
 [0.25,0.5)
 [0.25,0.5)
 [0.25,0.5)
 [0.25,0.5)
 [0.25,0.5)
 [0.25,0.5)
 [0.25,0.5)
 [0,0.5)
ubauer commented 5 years ago

Hi Chris,

It seems that you didn’t specify the input format as --point-cloud. With that, you should get the expected results.

I don’t know what’s happening in the wrapper, but if i remember correctly you're not using Ripser to compute the distance matrix for a point cloud, but some extra Python code. Maybe the problem is related to that?

ctralie commented 5 years ago

Hi Uli, Thanks for the quick response. Ah okay that completely explains it with what's happening in the console.

Yeah I also thought that about the custom code, but the same bug occurs with a sparse distance matrix. Something is going wrong with how I'm interfacing with the sparse distance matrix class in ripser....

ctralie commented 5 years ago

Hi all, My sincerest apologies for the delay on this.

I believe I figured out the bug. As I said before, I managed to narrow it down to code that ran through the sparse distance matrix reduction. It was a very helpful hint that this problem didn't exist in the command line version (thank you for your detailed reporting @mbr085). So I carefully cross-referenced every sparse_distance_matrix template in ripser/ripser master with what's in our master. @ubauer somehow when I integrated your sparse matrix branch last spring, there was a line missing:

https://github.com/scikit-tda/ripser.py/commit/d6a68f4435e136b3f1ffd706010d15a03b303d9a

The result was that a single bar splits into a bunch of little bars. So for example, instead of

[0.35355339 1. ]

We'll end up with

[[0.35355338 0.5 ] [0.5 0.559017 ] [0.559017 0.61237246] [0.61237246 0.70710677] [0.70710677 0.75 ] [0.75 0.79056942] [0.79056942 0.82915622] [0.82915622 0.86602539] [0.86602539 0.90138781] [0.90138781 0.93541437] [0.93541437 1. ]]

@ubauer, does it makes sense that we would get such a result with that line missing?

@mbr085 I have confirmed that this fixes the example you provided. If you would pull on the "threshH2bugfix" branch, do a python setup.py install, and confirm that this fixes things: https://github.com/scikit-tda/ripser.py/tree/threshH2bugfix

And if you have any other examples that you found, it would be great if you could test them too. But I think it's solid now. Thanks, Chris

mbr085 commented 5 years ago

I tried to test it but saw no difference. Is there an easy way to confirm that I am running the updated version of the code?

ctralie commented 5 years ago

Try pip uninstall ripser first. I had a similar issue before.

On Tue, Nov 13, 2018 at 3:58 AM Morten Brun notifications@github.com wrote:

I tried to test it but saw no difference. Is there an easy way to confirm that I am running the updated version of the code?

— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub https://github.com/scikit-tda/ripser.py/issues/51#issuecomment-438187440, or mute the thread https://github.com/notifications/unsubscribe-auth/ABNh5rD2iX-_y1wQUGp-wvIKz_wJAZNIks5uuonJgaJpZM4YD5rz .

sauln commented 5 years ago

If you reinstall the source from the branch now, you should be able to confirm you're running the updated code by checking the version:

python -c "import ripser; print(ripser.__version__)"
>>> 0.3.0.dev0

The output should be 0.3.0.dev0 rather than 0.3.0 as is on the master branch.

sauln commented 5 years ago

A fix has been issued to master. We'll roll out a new version to Pypi asap. If it persists for you @mbr085 , please let us know and reopen the issue.

Thank you.

mbr085 commented 5 years ago

I confirm that the issue is resolved.

ctralie commented 5 years ago

Great! Thank you so much for the detailed report! This was a subtle one because few people use H2 and when I use it I don't use thresholds

On Wed, Nov 14, 2018, 2:14 AM Morten Brun <notifications@github.com wrote:

I confirm that the issue is resolved.

— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub https://github.com/scikit-tda/ripser.py/issues/51#issuecomment-438561839, or mute the thread https://github.com/notifications/unsubscribe-auth/ABNh5s0FFvZ0U9B6KPZB4l1Yx6hLYbXxks5uu8LtgaJpZM4YD5rz .

ubauer commented 5 years ago

Thanks to everyone! Chris, you're right, the sorting is essential, and the behavior we saw is expected.