mrzv / dionysus

Library for computing persistent homology
http://mrzv.org/software/dionysus2
Other
147 stars 31 forks source link

Wasserstein Distance for Diagrams Seems to Hang #6

Closed tgebhart closed 7 years ago

tgebhart commented 7 years ago

I'm trying to compute the Wasserstein distance between two persistence diagrams computed using the Dionysus library. I realize that the Wasserstein computation is actually part of the hera library, but I know @mrzv worked on that code as well, so hopefully this is an acceptable place to post this issue.

When computing the Wasserstein distance, I find that the computation hangs during the auction phase. I dug into the hera code a bit, and it looks like runAuctionPhase() (line here) never empties its unassignedBidders object. Experimenting with different parameter configurations (delta, epsilon, etc.) I've found that I can get unassignedBidders.size() to 2, but no lower. I would think it has something to do with the replacement of unassignedBidders but I'm struggling to understand the auction algorithm enough to diagnose the problem completely. Any thoughts on this?

My diagrams look something like this, so their unusual structure may be causing problems:

image

I generate the diagrams via:

Fltr ff;
...
Field q;
Persistence persistence(q);
StandardReduction<Persistence> reduce(persistence);
reduce(ff);

auto dgmsf = init_diagrams(persistence, ff, [&](const Smplx& s) -> float {  return s.data(); },
                    [](Persistence::Index i) { return i; });

where Fltr is a Filtration<Simplex> type and Field is of Q<float>. I then compare the diagrams like:

double wasserstein = geom_ws::wassersteinDist(dgmsf[0], dgmss[0], 2.0);

but I've also tried different configurations of epsilon, delta, etc. and find the hanging persists. Any ideas you have on this issues would be greatly appreciated.

Also, thanks for the excellent rewrite. I've found it pretty easy to migrate to this second version from Dionysus 1. As well, computing Wasserstein distances in Dionysus 1 was not fun performance-wise, so I look forward to using this version.

mrzv commented 7 years ago

First of all, Q<float> fascinates me. What is that supposed to represent? (And a question to self: why does that work at all?)

I'll look into this problem, but I'm going to need a way to reproduce it. Can you share the diagrams that you are comparing? Alternatively, can you share the code that produces them?

tgebhart commented 7 years ago

Wow I didn't even notice that (clearly). Seems to not make a difference whether or not <float> is included.

Are CSV files alright? If so, here are two diagrams to try (birth,death).

diagrams.zip

mrzv commented 7 years ago

CSV works. Thanks for the diagrams. We (other authors of Hera and I) will look into what's going on.

Do you get the same diagrams with Q<float> and Q<>? If so, that's kind of amazing. I never planned for Q<...> to be used by itself at all, but I guess there is no reason why it can't be.

mrzv commented 7 years ago

Wait, the two diagrams in the zip, persistence_diagram{1,2}.csv, are identical. Is that on purpose?

tgebhart commented 7 years ago

No that's my fault. Try these: second_try.zip

To answer your previous question, changing Q<> to Q<float> does not affect the diagrams.

mrzv commented 7 years ago

Thanks. I'll take a look.

mrzv commented 7 years ago

Ok, it seems the main problem are the points at infinity in your diagrams. It's something we'll need to fix in Hera, but for now you can just remove them manually and run the computation on what remains. You can add the distance between the points at infinity by hand, since they can only be matched to each other.

There is an unrelated issue with your diagrams. They have points both above and below the diagonal. If that's by design (e.g., you are computing extended persistence), then it's fine. But if not, then it most likely indicates a bug in how the filtration is setup.

tgebhart commented 7 years ago

Okay sounds good. Thank you for looking into it. I should be able to work around that now that I know what's generating the issue.

I am aware of the below-diagonal points. Thank you for checking in on that though.

mrzv commented 7 years ago

The points at infinity issue should be fixed in 2eaaceb8d97303d070d71d21b810319ffb78705c (after a Hera update).