rivetTDA / rivet

RIVET is a tool for Topological Data Analysis, in particular two-parameter persistent homology.
GNU General Public License v3.0
73 stars 24 forks source link

Make FI Rep Construction More Memory Efficent #122

Open mlesnick opened 6 years ago

mlesnick commented 6 years ago

The FI Rep construction and specifically, the way simplices in the bifiltration are stored, is not as memory efficient as it could be.

If we are computing homology in degree j, our code only stores simplices in dimensions j-1 (low), j (mid), and j+1 (high). The most obvious inefficiency is that the same struct is used to represent mid and high simplices. We designed things this way to keep a certain sorting function simple. But this leads to an avoidable memory efficiency, for the following reason: When working with multi-critical simplices, our algorithmic approach requires a mid simplex to be represented by a richer, larger object than a high simplex. But usually there are far more high simplices than mid simplices. So it would be better to use a representation for high simplices that is as small as possible. Fixing this will make a small but non-negligible difference in memory consumption for one-critical filtration computations (something like a factor of 2 improvement in the size of the BifiltrationData object). (However, For multicritical bifiltrations with many grades of appearance per simplex, the difference will be negligible.)

Commit 3d8bf51 in my fork makes some progress towards this change, but this still needs a lot of work.

Beyond this, to further reduce memory consumption in this part of the pipeline for the one-critical case, a natural next step would be introduce separate representations of simplices in the one-critical and multi-critical case. With the optimization above, in the current approach a high simplex would store a std::vector of bigrades. This vector would contain exactly one bigrade in the one-critical case. In an optimization for the one-critical case, a high simplex should instead just directly store the bigrade, doing away with the C++ vector. This would give another small but non-negligible improvement.

As Uli Bauer has suggested to us, using std::arrays instead of std::vectors and using smaller integer types to represent simplex indices could also help to further reduce memory consumption for this part of code.