mrzv / dionysus

Library for computing persistent homology
http://mrzv.org/software/dionysus2
Other
144 stars 31 forks source link

Representative cycles in zigzag persistence homology for all cycles #62

Closed vkarthik095 closed 4 months ago

vkarthik095 commented 4 months ago

Hi @mrzv

In the documentation for zigzag persistence homology, it is shown how to extract the representation of the alive cycles -

for z in zz:
    print(' + '.join("%d * (%s)" % (x.element, f[cells[x.index]]) for x in z))

Is there a way to find the representative cycles of all the cycles in the persistence diagram? A possibility is to use zz, dgms, cells = d.zigzag_homology_persistence(f, times, callback = detail) as in the documentation, but I was wondering if there was a utility that gives (birth, death, representative cycle) for the zigzag persistence homology.

Thank you in advance!

mrzv commented 4 months ago

Zigzag persistence doesn't maintain representative cycles from the past, specifically, after they died via a simplex being removed. So using that callback and accumulating them along the way is the only way to do it.

vkarthik095 commented 3 months ago

Hi @mrzv,

I've implemented a callback to track representative cycles and their birth and death, but it's not functioning as expected due to my incorrect assumption: that the representations of representative cycles don't change during their lifetime. This assumption is flawed when a simplex is being removed, and I would like to know if there are changes to the code below to consistently track cycles in Python. If there's a more efficient method, I'm open to suggestions. Below is the detail function for reference:

    alive_cycles = []
    alive_cycles_birth_index = [] # Keeps track of the birth time of alive cycles
    dgm_reps = []

    def detail(i, t, d, zz, cells):
        global previous_cycles  
        current_cycles = set() # Keeps track of the current alive cycles
        for z in zz:
            cycle = tuple([cells[x.index] for x in z])
            current_cycles.add(cycle)

        # Calculating the cycles that are created or destroyed in 
        # this step
        diff_birth = current_cycles.difference(previous_cycles)
        diff_death = previous_cycles.difference(current_cycles)

        # Add to alive cycles if a cycle is born
        for item in diff_birth:
            alive_cycles.append(item)
            alive_cycles_birth_index.append(t)

        # Store the cycles dying in dgm_reps along with
        # the birth and death.
        for item in diff_death:
            idx = alive_cycles.index(item)
            birth, death = alive_cycles_birth_index[idx], t
            dgm_reps.append([birth, death, item])
            del alive_cycles[idx]
            del alive_cycles_birth_index[idx]

        previous_cycles = current_cycles
mrzv commented 3 months ago

Sorry, I misunderstood what you were after. You want ("compatible") representatives for the entire bars. Dionysus doesn't compute that. It only gives you access to its internal representation, which is representative cycles of the individual homology groups that it tracks in order to compute the barcode. Those are not the same; you are right.

There is work on computing barcode representatives, but it's not implemented in Dionysus, and I don't know of a public implementation.

vkarthik095 commented 3 months ago

Thank you for getting back and I'm sorry for not being clear before!

As you mention Dionysus doesn't directly compute the barcode representatives. However, I was wondering if it is possible to trace the barcode representatives using the internal representation of the alive cycles after each step provided by Dionysus. I was able to track the barcode representatives using the above code for some barcodes and the incorrect representatives occurred due to case 4 (explained below).

We have the following cases as we add/delete a simplex,

  1. If a cycle survives from the previous step we keep things as they are.
  2. If there is a new cycle- we add it to the list of the alive cycles. This acts as the representative for this barcode.
  3. If a cycle from the previous step disappears - we add it to the persistence diagram with the representative.
  4. The problem occurs if a cycle 'c' in the previous step changes basis in the current step which happens only when a simplex is deleted (I think). If it is possible to identify 'c' with an alive cycle in the current step, we can keep track of the representatives in a compatible manner.

If we can consistently match the cycles between two consecutive lists of alive cycles, it could be possible to find the barcode representative using the callback. Let me know if there is a way to resolve this.

Thank you in advance!

mrzv commented 3 months ago

You are right that the main problem comes from simplex deletions, although you also need to do extra work when cycles die with simplex additions, depending on how you define things.

I'm not sure I understand your step 4. What happens is that when a simplex is removed, the entire cycle basis needs to be updated to get rid of it. So some cycle gets added to other cycles and then that cycle gets removed (this is in the case of death; birth doesn't require any special attention). The problem is that as you are changing basis at the current step, you have to go back and change the basis accordingly in every other step of the zigzag. This can get expensive (cubic per step). The unpublished work I mentioned is about how to do this more efficiently.

There is a slightly more efficient way to do this "back-fixing", but I'm not sure there is a way to get the necessary information out of Dionysus. You could implement the inefficient algorithm above, but that seems like a lot of work.

vkarthik095 commented 3 months ago

Thank you for the elaborate answer!

In case 4, the problem was about the cycle basis being updated and I wasn't sure how to do this consistently. I am not fully familiar with the details of the computation of the zigzag persistence and I was hoping that this update of cycle basis could be done in a linear/constant time but I think I understand the problem better now, thanks to your explanation.

I think I'll go with the easier option and use the public implementation once it's available.