mrzv / dionysus

Library for computing persistent homology
http://mrzv.org/software/dionysus2
Other
144 stars 31 forks source link

Wasserstein hangup, same number of points at infinity #39

Open jacleveland opened 4 years ago

jacleveland commented 4 years ago

dgm1.txt dgm2.txt My program is running a q=5 wasserstein but it keeps getting stuck for these specific diagrams, there are other diagrams I do where they do not get stuck, any ideas? The txt files are comma separated, each line is birth, death "\n"

mrzv commented 4 years ago

The issue is almost surely with the high multiplicity of the points in both diagrams. You could try manually perturbing them by a tiny epsilon and see if that solves the problem.

For a proper diagnosis and solution, we need @grey-narn.

jacleveland commented 4 years ago

Yeah unfortunately I'm not sure exactly if I can get rid of the multiplicity, I am trying use persistence to compute a distance between artificial neural networks, but we are using a home brewed filtration and it only really works currently for 0th homology, so it is basically just persistence for a weighted graph. The multiplicity is because the neural network architecture we're studying right this second is convolutional networks. Ours has 25 weights in the filter and these are basically pasted all over the graph, so there should be 25 groups of multiplicities.

Anyway the issue never came up on the computer I used this summer, I have a different setup I'm using now.

Thanks for your time anyway, I am really impressed with the package overall.

jacleveland commented 4 years ago

Any way the package could be written to utilize cuda? Or has anyone done that before? Seems to me like the wasserstein computations could be pretty parallel.

mrzv commented 4 years ago

Well, you can always manually perturb the diagrams by epsilon. This will fix the multiplicity problem and will introduce arbitrarily small error to the Wasserstein distance.

As for parallelization, we investigated it a couple of summers ago, but it's not nearly as simple as it sounds. There are long serial bottlenecks in the algorithm that we are using.

anigmetov commented 4 years ago

Interesting. I ran Hera code with q = 5 on the attached diagrams and it worked for default relative error 0.01. I understand that these files were generated from a python code, right? @mrzv Does Dionysus use float or double when calling Hera?

вс, 15 сент. 2019 г., 19:57 Dmitriy notifications@github.com:

The issue is almost surely with the high multiplicity of the points in both diagrams. You could try manually perturbing them by a tiny epsilon and see if that solves the problem.

For a proper diagnosis and solution, we need @grey-narn https://github.com/grey-narn.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/mrzv/dionysus/issues/39?email_source=notifications&email_token=AALP2ZF43KEWHDJ7BZRUGALQJZZO5A5CNFSM4IW3MSU2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD6XVXWI#issuecomment-531586009, or mute the thread https://github.com/notifications/unsubscribe-auth/AALP2ZFRSXMF5IJV4GZV2O3QJZZO5ANCNFSM4IW3MSUQ .

mrzv commented 4 years ago

Dionysus uses float for simplex data, so the points in the diagram have float coordinates.

anigmetov commented 4 years ago

@mrzv If I replace double with float, my executable gets stuck as well. It seems to be the familiar numerical issue: the epsilon in epsilon-scaling gets too small to guarantee the increase of prices, and auction makes no progress.

mrzv commented 4 years ago

@grey-narn So what's the right strategy? If I switch simplex data to double, then the coordinates of the points in the diagram become double, and I'm guessing we can trigger numerical issues again, even if not with this input. I can keep everything as float in Dionysus, but convert it to double when passing to Hera. Are we guaranteed to avoid numerical issues then? I'm not sure if we ever figured this out. Do you know?

anigmetov commented 4 years ago

Could we do something similar to numpy, where functions take additional dtype argument? By default it uses float, but we can also support double and long double, with the same names as in numpy. Of course, there will always be cases where even long double will not save us from this issue. If this is a simple change, it still can be worth implementing.

On Mon, 16 Sep 2019 at 19:23, Dmitriy notifications@github.com wrote:

@grey-narn https://github.com/grey-narn So what's the right strategy? If I switch simplex data to double, then the coordinates of the points in the diagram become double, and I'm guessing we can trigger numerical issues again, even if not with this input. I can keep everything as float in Dionysus, but convert it to double when passing to Hera. Are we guaranteed to avoid numerical issues then? I'm not sure if we ever figured this out. Do you know?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/mrzv/dionysus/issues/39?email_source=notifications&email_token=AALP2ZDJTJK7LM3HAJP3NGTQJ66HJA5CNFSM4IW3MSU2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD6Z4GEI#issuecomment-531874577, or mute the thread https://github.com/notifications/unsubscribe-auth/AALP2ZH7G5JBYNEJLQTZZMDQJ66HJANCNFSM4IW3MSUQ .

-- Best regards, Arnur Nigmetov

anigmetov commented 4 years ago

I just realized what you were asking. Well, having input in float and doing computations in double cannot help in all situations, because in the equation for epsilon we need to divide by the number of bidders and multiply with the cost of the current matching. An example that requires an arbitrary small value of epsilon can be constructed by taking two diagrams of high cardinality, but having a small distance to each other. Remember that if we want q>1, then we cannot remove duplicate points, so we can just take two points which are very close to each other and add as many duplicate points as we want.

mrzv commented 4 years ago

@grey-narn Yeah, the general solution would require a much more complex yoga with exact predicates and such. We are not going there any time soon. I guess the real question before me is whether to switch all simplex data to double, or keep it as float, but convert to double when passing to Hera. Neither solution is perfect, it's just the question of what makes the most sense. I'm leaning towards the latter, since it's the least disruptive.