salimandre / unsupervised-image-segmentation-persistent-homology

Unsupervised image segmentation by applying topological data analysis techniques.
28 stars 1 forks source link
covering-tree gudhi image-segmentation persistence-diagrams persistent-homology rips-vietoris-complexes tda traversal-algorithm unsupervised-learning

Unsupervised image segmentation using persistent homology theory

Topological Data Analysis

In the early of 20th century Algebraic topology provided, thanks to Poincaré, a general framework to classify shapes. Indeed the Euler characteristic equal to the alternating sum of the Betti numbers is a topological invariant. Roughly these numbers count the number of distinct objects in the domain, the number of holes and the number of voids they contain etc...

Topological Data Analysis (TDA) is the field which apply these theorical tools in order to proceed data analysis. But these latter characteristics cannot be used straight forward because of the uncertainty of the datas and because of the sensitivity of Betti numbers to minor outliers in the data set. Therefore to tackle this issue the main tool TDA is using is persistent homology, in which the invariants are in the form of persistence diagrams also called barcodes. Topological invariants will then quantify the stability of geometric features with respect to degradation such as noise or artefacts

Credits @ Frédéric Chazal, Bertrand Michel

Our Method

We used TDA framework to perform unsupervised image segmentation. The set of images provided by the skimage library from python and the python library Gudhi @INRIA to produce simplicial complexes and persistence diagrams.

The main procedure was as followed :

Please note that in practice to compute most relevant and persistent 0 homology groups we followed two more steps we don't develop:

Our Results

Note: since computation was heavy for our computer we had to use only 5000 to 10000 superpixels that is 2% to 4% of all pixels. It is interesting to see that we still managed to get decent results while TDA researchers actually use all pixels into their computation.

Here is one example of image segmentation produced using the naive procedure. It produced 250 most persistent 0th homology groups but they are irrelevant. These are isolated pixels.

Here we removed isolated pixels. We then applied our sampling method to recover labels for these as well. We have a more parsimonious segmentation with 21 segments but they are not homogeneous as a large part of image remains as one segment.

Here are our final results for 0th homology groups where we applied full procedure. We used the same parameters for all images. Results are parsimonious as we get between 10 and 30 segments and they are more homogenous.

Here are our final results for 1th homology groups where we applied full procedure. Actually we noticed that most often only the most persistent cycle may be relevant to the image. Although the main drawback with 1th homology groups is that a relevant cycle in 3D or 5D RV complex may become irrelevant when projected into 2D.

To conclude we found that topological persistence of 0 dimension elements is an effective and robust (no need of parameter tuning) method for image segmentation but more generally for unsupervised data processing. Also it is interesting to see that this method is close to Felzenswalb's algorithm. It provides a theorical framework which explains why this algorithm is so powerful. In the contrary we found that topological persistence of 1 dimension elements is not useful in the case of images due to projection issue.