Persistent homology - Githubissues

Quenouillaume commented 3 years ago

We add support for filtered complexes and persistent homology.

The algorithm implemented is described in 'Computing Persistent Homology' by A. Zomorodian and G. Carlsson.

Component: algebraic topology

Keywords: homology, persistent-homology

Author: Guillaume Rousseau

Branch/Commit: a70ea5b

Reviewer: John Palmieri, Travis Scrimshaw

Issue created by migration from https://trac.sagemath.org/ticket/31861

Quenouillaume commented 3 years ago

Branch: u/gh-Quenouillaume/persistent_homology

Quenouillaume commented 3 years ago

Description changed:

--- 
+++ 
@@ -1 +1,3 @@
+AUTHOR: Guillaume Rousseau

+A module for filtered complexes and persistent homology. The algorithm implemented is described in 'Computing Persistent Homology' by A. Zomorodian and G. Carlsson.

Quenouillaume commented 3 years ago

Commit: 6aa4ece

Quenouillaume commented 3 years ago

Author: Guillaume Rousseau

mkoeppe commented 3 years ago

comment:3

There's no code on this branch

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 3 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

`621d748`	`formatted code for sage standards and added comments`
`c7f1f74`	`added more doc and reference to Zomorodian-Carlsson`
`2d862d6`	`corrected format for documentation`
`4c97200`	`finished up docstrings`
`a7635a9`	`added copyright`

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 3 years ago

Changed commit from 6aa4ece to a7635a9

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 3 years ago

Changed commit from a7635a9 to ef3b2ab

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 3 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

ef3b2ab corrected indentation in references

Quenouillaume commented 3 years ago

comment:6

Apologies for the first ticket that had no code on its branch, this is my first time using trac !

Quenouillaume commented 3 years ago

Changed keywords from none to homology, persistent-homology

mkoeppe commented 3 years ago

comment:8

I would suggest not to create a new top-level package sage.persistent_homology but rather to nest it

edd8e884-f507-429a-b577-5d554626c0fe commented 3 years ago

comment:9

I am not sure, but since this code seems already packaged at https://github.com/Quenouillaume/persil, why not let it be part of the task #31164 ? Also, some 10 years ago i saw some optimized code for persistent homology (by Edelsbrunner iirc), so i wonder whether there is room for interfacing such code within Sage ?

edd8e884-f507-429a-b577-5d554626c0fe commented 3 years ago

comment:10

See https://en.wikipedia.org/wiki/Persistent_homology#Computation

Quenouillaume commented 3 years ago

comment:11

The code at https://github.com/Quenouillaume/persil was a first draft, in particular I used my own homemade versions of the Simplex and FreeModule classes ! The code on this ticket's branch replaces those with built-in Sage functionalities.

Replying to @sagetrac-tmonteil:

I am not sure, but since this code seems already packaged at https://github.com/Quenouillaume/persil, why not let it be part of the task #31164 ? Also, some 10 years ago i saw some optimized code for persistent homology (by Edelsbrunner iirc), so i wonder whether there is room for interfacing such code within Sage ?

tscrim commented 3 years ago

comment:12

Thank you for adding this code. However, there are a few things that will need to be addressed before this can be included.

All methods and functions need to be doctested.
There isn't a need to define a sublevel of the documentation for this one additional file. Just adding it to the usual homology documentation is fine. In a related note, I would put your files under the sage/homology folder.
I wouldn't use __foo methods. There is no need to name-mangle. Just _foo is fine. (This will also make it easier to doctest; note that this does not apply to special methods like __getitem__.)

Avoid useless checks here with short-circuiting:

-if j%1000 == 0 and self._verbose:
+if self._verbose and j%1000 == 0:

Please follow PEP8 when reasonable. In particular, use Python's naming convention for variables. For example, change _numSimplices -> _num_simplices.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 3 years ago

Changed commit from ef3b2ab to 96b91f0

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 3 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

`014e7d4`	`moved into homology folder`
`5243b3c`	`adapted to PEP8`
`96b91f0`	`all methods doctested`

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 3 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

805ba8e almost 100% PEP8 compatible

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 3 years ago

Changed commit from 96b91f0 to 805ba8e

Quenouillaume commented 3 years ago

comment:15

Added doctests for all methods, and adapted to PEP8. A few lines are still more than 80 characters long, most of them being in the doctests.

Also moved files and documentation to the homology folders.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 3 years ago

Changed commit from 805ba8e to 8b7666f

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 3 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

8b7666f fixed documentation

slel commented 3 years ago

Description changed:

--- 
+++ 
@@ -1,3 +1,9 @@
-AUTHOR: Guillaume Rousseau
+We add support for filtered complexes and persistent homology.

-A module for filtered complexes and persistent homology. The algorithm implemented is described in 'Computing Persistent Homology' by A. Zomorodian and G. Carlsson.
+The algorithm implemented is described in 'Computing Persistent Homology' by A. Zomorodian and G. Carlsson.
+
+Related:
+
+- [Wikipedia: Persistent homology computation](https://en.wikipedia.org/wiki/Persistent_homology#Computation)
+- [sage-support discussion on persistent homology](https://groups.google.com/g/sage-support/c/a2MS9WALjvo)
+

jhpalmieri commented 3 years ago

comment:18

Some comments:

It might be a good idea to have ways of converting to and from simplicial complexes: for instance, if X is a filtered complex, then a method that takes a value n and returns the complex made up of all simplices with filtration at most n. Also a _simplicial_ method so that SimplicialComplex(X) works. In addition, take a simplicial complex and make it a filtered complex with everything in filtration 0.
Related to that, I was wondering if it made sense for FilteredComplex to inherit from SimplicialComplex. I don't think so, but I thought I'd ask.
Also related to that, would it make sense to have a way of modifying the filtration value for a given simplex? I would suggest renaming get_value to filtration. Maybe it could take two arguments: a required argument s, a simplex, and if that is the only argument, then return its filtration. If an optional argument v is present, then instead set its filtration to v.
Throughout, I would change "filtered complex" to "filtered simplicial complex", and I would change the name of the file and the name of the class accordingly. When I see "filtered complex," I tend to think of a filtered chain complex, and then I think that there should be spectral sequences lurking somewhere. (That would be a great task for someone to take on.) To make things unambiguous, we should have FilteredChainComplex (I hope eventually) and FilteredSimplicialComplex (from this ticket).
Regarding

    This implementation requires filtration values to be positive. Most
    features will not work with negative values.

Should we have a warning or error if the user uses negative filtration values? I would also add this comment to the docstring for the __init__ method, so that users will see if when they do FilteredComplex?.

In the __init__ method, I like the comments about _vertices, _simplices, and _degrees_dict. Can you add a similar comment about _warnings? (Or document the warnings argument.) Is degree the same as filtration value? I might change that language, at least in the docstring and names of the user-viewable variables (for example simplex_degree_list could perhaps just be renamed to simplices).

tscrim commented 3 years ago

comment:19

+1 on making this a subclass of SimplicialComplex and changing the name to FilteredSimplicialComplex.

jhpalmieri commented 3 years ago

comment:20

Replying to @tscrim:

+1 on making this a subclass of SimplicialComplex and changing the name to FilteredSimplicialComplex.

This might require a lot of work: various methods for SimplicialComplex would have to be modified to work with FilteredSimplicialComplex, like product, wedge, join, ..., unless we just forget the filtration. At least FilteredSimplicialComplex could inherit from GenericCellComplex, and maybe there could be a new intermediate class that both it and SimplicialComplex could inherit from.

tscrim commented 3 years ago

comment:21

Hmm...yea, I guess so. Well, I am okay with including this class as-is for now since it will be essentially all implementation details changed by such a refactoring. The only thing I would want would be to change compute_homology() to persistent_homology() so it is unambiguous about what type of homology is being computed. (The Betti numbers are fine because of the extra parameters.)

Quenouillaume commented 3 years ago

comment:22

I'm working on implementing all those suggestions, thank you for the very insightful feedback !

I also think that having FilteredSimplicialComplex inherit from SimplicialComplex is a bit odd, for all the reasons mentionned above, but I'm not sure it should inherit from GenericCellComplex either, since they don't really have the same homology type...

Maybe a simple first solution would be, as jhpalmieri suggests, to have a _simplicial_() method for FilteredSimplicialComplex, which returns the maximal simplicial complex associated to the filtered complex, as well as a prune(max_value) method which returns a copy of the filtered complex, where simplices of value more than max_value have been removed.

jhpalmieri commented 3 years ago

comment:23

Replying to @tscrim:

The only thing I would want would be to change compute_homology() to persistent_homology() so it is unambiguous about what type of homology is being computed. (The Betti numbers are fine because of the extra parameters.)

+1

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 3 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

`fe67d65`	fixed class name, added `_simplicial_` and prune, removed negative constraint
`d2bc95f`	`removed mention of degree, fused insert and get_value into filtration`
`83b78fd`	`fixed docstring error`
`c8dede0`	`changed file name, added doc for filtration method`

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 3 years ago

Changed commit from 8b7666f to c8dede0

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 3 years ago

Changed commit from c8dede0 to 8bd967d

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 3 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

8bd967d PEP8

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 3 years ago

Changed commit from 8bd967d to 9301aa1

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 3 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

9301aa1 added more doc

tscrim commented 3 years ago

comment:27

I don't really like the fact that compute_persistent_homology() in some sense fixes the field you are computing the homology over and that it needs to be called before getting the intervals without a meaningful error message. Plus the result is only partially cached as when you run this again, you obtain new data that overwrites the old computed data. So my proposal is to change it as follows:

Make compute_persistent_homology() hidden by _compute_persistent_homology() and then have it return the resulting intervals.
Have a cached method intervals() (perhaps call that persistence_intervals) that has field and strict as part of its key. So it will be decorated by @cached_method(key=lambda self,f,s,v: (f,s)).
Remove the now unused attributes such as intervals.

Also, why can't you just pass _filtration_dict as a list of simplicies to SimplicialComplex in _simplicial_? Related, I also don't see the need for the _simplicies list as you can get it as the keys of _filtration_dict. The sorting is only needed during the computation of the persistent homology, which you could construct in there.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 3 years ago

Changed commit from 9301aa1 to 47403e2

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 3 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

47403e2 cached computation of homology, made `_simplicial_` simpler, removed some useless attributes

Quenouillaume commented 3 years ago

comment:29

Replying to @tscrim:

I don't really like the fact that compute_persistent_homology()...

I agree, I didn't know about the cached_method decorator and it can probably be very useful here. However there is a problem: since FilteredSimplicialComplex objects can have their simplices inserted or changed, the cache key needs to take into account whether the complex has changed since last time persistent homology was computed or not. I was thinking of just counting the number of times that the complex has been modified and have that also be part of the cache key, but maybe there is a simpler / less hacky solution... In any case, I think that the _compute_persistent_homology method should be cached rather than the persistence_intervals method, since it can be called when computing intervals or betti numbers.

Also, why can't you just pass _filtration_dict as a list of simplicies to SimplicialComplex in _simplicial_? Related, ...

Yes this is much simpler, thank you !

New commits:

47403e2 cached computation of homology, made `_simplicial_` simpler, removed some useless attributes

tscrim commented 3 years ago

comment:30

I would just call foo.cached_func.clear_cache() to clear the cache when the complex is mutated.

Addendum: The way you have it implemented now is effectively a memory leak since you are always adding more stuff to the cache every time you run it.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 3 years ago

Changed commit from 47403e2 to bafca26

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 3 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

bafca26 simplified cache

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 3 years ago

Changed commit from bafca26 to 354b23d

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 3 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

354b23d _persistent_homology now properly returns intervals

mkoeppe commented 3 years ago

comment:33

Setting a new milestone for this ticket based on a cursory review.

tscrim commented 3 years ago

comment:34

Sorry for loosing track of this; I didn't get any notifications from trac that you pushed changes.

I think all of my and John's current comments have been taken into account, correct?

Can you provide a rebase onto the latest beta? I will continue my review at that point.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 3 years ago

Changed commit from 354b23d to 98f2a1d

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 3 years ago

Branch pushed to git repo; I updated commit sha1. Last 10 new commits:

`3e39330`	fixed class name, added `_simplicial_` and prune, removed negative constraint
`66616b7`	`removed mention of degree, fused insert and get_value into filtration`
`1fd4564`	`fixed docstring error`
`09aada4`	`changed file name, added doc for filtration method`
`bfcbc93`	`PEP8`
`eb621ff`	`added more doc`
`d0811c7`	cached computation of homology, made `_simplicial_` simpler, removed some useless attributes
`16a8d9b`	`simplified cache`
`bdfd1e9`	`_persistent_homology now properly returns intervals`
`98f2a1d`	`Merge branch 'u/gh-Quenouillaume/persistent_homology' of trac.sagemath.org:sage into t/31861/persistent_homology`

Quenouillaume commented 3 years ago

comment:36

I believe I've taken all your comments into account ! And I've just finished rebasing if you wish to keep reviewing.

sagemath / sage

Persistent homology #31861