scikit-tda / kepler-mapper

Kepler Mapper: A flexible Python implementation of the Mapper algorithm.
https://kepler-mapper.scikit-tda.org
MIT License
624 stars 180 forks source link

Projection functions #173

Open alalehrz opened 5 years ago

alalehrz commented 5 years ago

Thanks for the effort. I was checking the lens/projection functions, I could not find the functions that are used in Singh original paper such as gaussian density or eccentricity. May I ask if these are available in the current projection set or not.

MLWave commented 4 years ago

These are not in the current project set, but contributions to the lens library are very welcome!

I think you can manually do Gaussian Kernel Density Estimation to build a lens with this code:

import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from sklearn.datasets import load_iris
from scipy import stats # Relevant import

data, y = load_iris().data, load_iris().target

# Transpose the data, fit Kernel Density Estimation, calculate density lens
values = data.T
kde = stats.gaussian_kde(values)
density_lens = kde(values)

fig, ax = plt.subplots(subplot_kw=dict(projection='3d'))
x, y, w, z = values
ax.scatter(x, y, z, c=density_lens)
plt.savefig("density.png")

density plot

Then use like:

graph = mapper.map(density_lens, data, cover=km.Cover(n_cubes=10))

For eccentricity I am not sure. Going by the mathematical definition here:

http://danifold.net/mapper/filters.html#mathematical-definition

Maybe someone else can contribute code for this.

I'll look at adding density estimation to the filter function/lens library, or you are free to contribute (if the example I gave works for you).

pulquero commented 3 years ago

Here, dists is distance matrix calculated by sklearn.metrics.pairwise_distances(X). This can also be re-used in the mapper with precomputed=True.


def density(dists, epsilon=0.1):
  rho = np.sum(np.exp(-dists**2/epsilon), axis=1)
  N = len(dists)
  C = np.sum(rho)/N
  return rho/C

def eccentricity(dists, p=2):
  if p == np.inf:
    return np.max(dists, axis=1)
  else:
    N = len(dists)
    E = (np.sum(dists**p, axis=1)/N)**(1/p)
    return E