rivetTDA / rivet

RIVET is a tool for Topological Data Analysis, in particular two-parameter persistent homology.
GNU General Public License v3.0

RIVET plots of metric data #139

Status: Closed (closed by delooper 5 years ago)

delooper commented 5 years ago

When I ask RIVET to compute the persistent homology of a finite metric space with a secondary parameter, I notice one odd feature. In the Rips complex direction (in the line selection window; let me call this parameter epsilon), the minimum epsilon value chosen appears to be the minimum distance between all distinct pairs of points in my point cloud.

Wouldn't it be more natural to start at epsilon=0? Then along the epsilon=0 line you would see the number of points in your point cloud. With the way RIVET is set up right now, if I compute the PH of a 400-point point cloud, the min(epsilon) line often has ranks much smaller than 400, even when the secondary parameter includes all the points.
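To make the rank count concrete, here is a minimal sketch that counts connected components (the rank of degree-0 homology) of a Rips graph at a given scale. It is not RIVET code: the function name `count_components`, the random 400-point cloud, and the scale 0.05 are all illustrative assumptions, and the secondary parameter is ignored by assuming it already includes every point.

```python
import itertools
import math
import random


def count_components(points, eps):
    """Number of connected components of the Rips graph at scale eps.

    Vertices are born at scale 0; an edge joins two distinct points once
    their distance is <= eps. Components are tracked with union-find.
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    components = len(points)
    for i, j in itertools.combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= eps:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj
                components -= 1
    return components


random.seed(0)
cloud = [(random.random(), random.random()) for _ in range(400)]  # hypothetical data
print(count_components(cloud, 0.0))   # 400: one component per point
print(count_components(cloud, 0.05))  # fewer, since nearby points have merged
```

At epsilon=0 every point is its own component, so the rank equals the size of the point cloud. At any positive starting scale that already admits edges, the rank is smaller, which matches the behavior described above.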

mlesnick commented 5 years ago

Hi Ryan, my understanding is that when no coarsening is done, RIVET generally takes the minimum parameter value to be the minimum among that of all simplices in the filtration. In this case, that minimum should be 0, since vertices are understood to be born at scale parameter 0. However, when coarsening is used, RIVET rounds upward (see https://rivet.readthedocs.io/en/latest/preliminaries.html#coarsening-a-persistence-module), so that the awkwardness of rounding non-zero distances to 0 (which you would see if you rounded downward) is avoided. This means that when coarsening is done, you may see a non-zero minimum distance.

Is this what you are seeing? We were also concerned about this phenomenon, and I suppose still are, though we haven't thought actively about it recently. One alternative strategy for the binning made it into an issue (https://github.com/rivetTDA/rivet/issues/84) as a todo, but was never implemented. I think the solution proposed in the comments there is probably the right way to go.
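The effect of the rounding direction can be sketched with a toy model. This is an assumed simplification, not RIVET's implementation: the range from 0 to the largest value is cut into equal-width bins, and each value is replaced by the upper endpoint of its bin (rounding up) or the lower endpoint (rounding down). The function `coarsen` and the sample scales are hypothetical.

```python
import math


def coarsen(values, bins, round_up=True):
    """Snap nonnegative values onto a uniform grid with `bins` grid values.

    Assumed model (not RIVET's code): [0, max(values)] is cut into `bins`
    equal-width bins. Rounding up sends a value to the upper endpoint of
    its bin; rounding down sends it to the lower endpoint.
    Requires max(values) > 0.
    """
    step = max(values) / bins
    snapped = []
    for v in values:
        if round_up:
            k = max(1, math.ceil(v / step))          # 0 lands in the first bin
        else:
            k = min(bins - 1, math.floor(v / step))  # max lands in the last bin
        snapped.append(k * step)
    return snapped


# 0.0 stands for the vertices; the rest are pairwise distances.
scales = [0.0, 0.3, 1.2, 2.5, 4.0]
print(coarsen(scales, 4))                  # [1.0, 1.0, 2.0, 3.0, 4.0]
print(coarsen(scales, 4, round_up=False))  # [0.0, 0.0, 1.0, 2.0, 3.0]
```

In this model, rounding up makes the smallest grid value non-zero and places the vertices at the same scale as the shortest edges, while rounding down sends the non-zero distance 0.3 to 0.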

(In reply to Ryan Budney's message of Sat, Oct 6, 2018, 9:50 PM.)

delooper commented 5 years ago

Thanks. Yes, I'm coarsening.

I'll try re-running my computations without coarsening, but I was having some memory issues, so it might not work.

delooper commented 5 years ago

Okay, I see why I was coarsening. Without it, my laptop freezes from swapping.

delooper commented 5 years ago

If I make the plot using rivet_python (calling rivet_console with the --betti option) I can plot the epsilon=0 line, even with coarsening. This suits my needs better than using the RIVET GUI.
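As a rough sketch of that workflow, the command line can be assembled in Python before handing it to `subprocess.run`. Only the `--betti`, `-x`, and `-y` flags come from this thread; the helper name `betti_command`, the input file name `points.txt`, and the position of the input file on the command line are assumptions that should be checked against the rivet_console help output.

```python
import shlex


def betti_command(input_file, xbins, ybins, executable="rivet_console"):
    """Build an argument list for a coarsened Betti-number computation.

    Assumption: the input file is passed as the first positional argument.
    """
    return [executable, input_file, "--betti", "-x", str(xbins), "-y", str(ybins)]


cmd = betti_command("points.txt", 400, 400)  # hypothetical input file
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, capture_output=True, text=True, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems if the input path contains spaces.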

mlesnick commented 5 years ago

For data sets of non-negligible size, we do expect the RIVET augmented arrangement computation to eat too much memory if it is run without coarsening. For a computation run with the --betti or --minpres flags, the situation should be better. However, see issue #104.

delooper commented 5 years ago

I suppose I don't understand the internals well enough, but I have found one thing curious. With no coarsening, RIVET tends to use much more memory than with coarsening on an extremely fine mesh, e.g. -x 400 -y 400 compared with no coarsening at all.

I imagine what's going on is that my data perhaps has a fairly irregular set of pairs of distances, so the grid (with no coarsening) is likely enormous, perhaps more like -x 40000 -y 40000.
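A quick count supports that estimate. The sketch below assumes that, without coarsening, each distinct pairwise distance contributes its own grid value in the distance direction; that is an assumption about the internals, and the random 400-point cloud and the helper `distinct_distances` are hypothetical.

```python
import itertools
import math
import random


def distinct_distances(points):
    """Set of distinct pairwise distances in a point cloud."""
    return {math.dist(p, q) for p, q in itertools.combinations(points, 2)}


random.seed(0)
cloud = [(random.random(), random.random()) for _ in range(400)]  # hypothetical data
n_distinct = len(distinct_distances(cloud))
upper_bound = 400 * 399 // 2  # 79,800 pairs of points
print(n_distinct, "distinct distances, at most", upper_bound, "versus 400 bins")
```

For irregular data nearly every pair of points gives a different distance, so the uncoarsened grid in this direction can have tens of thousands of values, far more than the 400 bins of the coarsened run.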

mlesnick commented 5 years ago

That's consistent with what I'd expect. You usually get a very large grid with no coarsening, and that leads to huge line arrangements. Did you also have memory problems with no coarsening for your examples, even using --betti?

mlesnick commented 5 years ago

This is Issue #84.