scikit-tda / kepler-mapper

Kepler Mapper: A flexible Python implementation of the Mapper algorithm.
https://kepler-mapper.scikit-tda.org
MIT License

Mapper long term direction #11

Closed by sauln 6 years ago

sauln commented 6 years ago

I'd like to talk about future directions of kepler-mapper and some work I'd like to do. Before I get too far ahead of myself, I want to make sure you (@MLWave) agree with the directions so a permanent fork won't be necessary.

Immediate steps are

Do you have a vision or direction for kepler-mapper? There is considerable new research on the method and I think kepler-mapper would be a great platform to introduce some of these ideas.

MLWave commented 6 years ago

I don't have a vision for this. Kepler-Mapper was created as just a personal learning project, but it kind of blew up in popularity.

I'd absolutely love to see the introduction of those 3 ideas. Especially multi-scale mapping (and generating barcodes) would have my interest.

Getting Kepler-mapper onto PyPI and better testing both sound really great. The project deserves a better programmer for this.

I'll give you full edit capabilities, so you can push and pull like you please.

MLWave commented 6 years ago

Since you are now a full co-author, maybe we should look at updating or removing things like the Disclaimer? Anything else we should look at?

Aside: I'll write documentation and examples for everything. I'll work on the output (have a mode for paper-friendly output, custom color functions).

I'm preparing a proper spec for an ML algorithm based on Mapper: treat the nodes of a random set of networks (random scales, random functions, random clusterings) as leaf nodes in a decision tree, then combine these in an ensemble with a weighted average (something like a Topological Random Forest). I have already noticed that the current API is not adequate for this (for one, it cannot transform unseen data into a network node). But it is a very interesting algorithm, because it works with both supervised and unsupervised lenses and can beat very powerful models (since a single lens can be an MLP or XGBoost). In the end, training another set of models on the "leaf node" representations works even better, so it may be better classified as a data preprocessing technique.
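A minimal sketch of the idea in this comment, under heavy assumptions and not MLWave's actual spec: each "network" is reduced to a random linear lens with a random overlapping interval cover, the per-interval clustering step is omitted, and the ensemble uses a plain (unweighted) average. The function names `fit_random_covers` and `predict` are hypothetical and are not part of kepler-mapper. The second function illustrates the missing piece mentioned above, mapping unseen data onto existing nodes.

```python
import numpy as np

def fit_random_covers(X, y, n_networks=25, seed=0):
    """Build random 1-D interval covers; each interval acts as a 'leaf'."""
    rng = np.random.default_rng(seed)
    networks = []
    for _ in range(n_networks):
        # Random lens: projection onto a random unit direction.
        direction = rng.normal(size=X.shape[1])
        direction /= np.linalg.norm(direction)
        lens = X @ direction
        # Random scale: number of intervals and fractional overlap.
        n_intervals = int(rng.integers(3, 10))
        overlap = rng.uniform(0.1, 0.5)
        lo, hi = lens.min(), lens.max()
        width = (hi - lo) / n_intervals
        starts = lo + width * np.arange(n_intervals) - overlap * width
        ends = starts + width * (1 + 2 * overlap)
        # Mean training label per interval; NaN marks an empty leaf.
        leaf_means = np.full(n_intervals, np.nan)
        for j, (s, e) in enumerate(zip(starts, ends)):
            mask = (lens >= s) & (lens <= e)
            if mask.any():
                leaf_means[j] = y[mask].mean()
        networks.append((direction, starts, ends, leaf_means))
    return networks

def predict(networks, X_new, default=0.5):
    """Map unseen points onto existing leaves and average their label means."""
    votes = []
    for direction, starts, ends, leaf_means in networks:
        lens = X_new @ direction
        # member[i, j] is True when point i falls in non-empty interval j.
        member = (
            (lens[:, None] >= starts)
            & (lens[:, None] <= ends)
            & ~np.isnan(leaf_means)
        )
        sums = np.where(member, leaf_means, 0.0).sum(axis=1)
        counts = member.sum(axis=1)
        # Points outside every interval fall back to the default score.
        votes.append(np.where(counts > 0, sums / np.maximum(counts, 1), default))
    return np.mean(votes, axis=0)

# Toy data: two well-separated Gaussian blobs with labels 0 and 1.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(loc=-3.0, size=(50, 2)),
               rng.normal(loc=3.0, size=(50, 2))])
y = np.array([0.0] * 50 + [1.0] * 50)

networks = fit_random_covers(X, y)
scores = predict(networks, X)
```

The per-network leaf memberships computed inside `predict` are the kind of "leaf node" representation that could be fed to a second model, which is the preprocessing use the comment describes.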

I locally tried Vietoris-Rips complexes instead of a cubical covering, but my implementation is extremely slow (and I lack the formal maths education required to effectively work with these things), so there is a risk I may not be able to follow what you are doing (you seem to be a more experienced programmer and mathematician). As long as you realize this, it is not a problem.
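For readers unfamiliar with the term, a cubical covering splits the range of the lens values into overlapping axis-aligned boxes and records which points land in each box. The sketch below is a plain-Python illustration of that idea only; the function name `cubical_cover` and its parameters are hypothetical and do not mirror kepler-mapper's API.

```python
from itertools import product

def cubical_cover(points, n_cubes=2, overlap=0.2):
    """Return {cube index tuple: [point indices]} for overlapping cubes."""
    dims = len(points[0])
    lows = [min(p[d] for p in points) for d in range(dims)]
    highs = [max(p[d] for p in points) for d in range(dims)]
    widths = [(hi - lo) / n_cubes for lo, hi in zip(lows, highs)]
    cover = {}
    for cube in product(range(n_cubes), repeat=dims):
        members = []
        for i, p in enumerate(points):
            # Each side of the cube is widened by `overlap` times its width.
            inside = all(
                lows[d] + (cube[d] - overlap) * widths[d]
                <= p[d]
                <= lows[d] + (cube[d] + 1 + overlap) * widths[d]
                for d in range(dims)
            )
            if inside:
                members.append(i)
        if members:  # keep only non-empty cubes
            cover[cube] = members
    return cover

points = [(0.0, 0.0), (1.0, 1.0), (0.5, 0.5)]
cover = cubical_cover(points)
```

The centre point lands in all four cubes because of the overlap. In Mapper, points shared between cubes are what produce edges between nodes.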

sauln commented 6 years ago

I think the disclaimer is fine. I'll keep an eye out for anything that needs changing.

First off, I am stoked about this implementation. I love that it implements the scikit-learn interface, and because of that it is a great platform to build on. I am currently a graduate student studying Mapper. I'd rather not build a new competing tool and instead focus on strengthening the tools that are available. I think this would be the best place to implement Mapper advances so they are accessible.

Paper-friendly output of the graph? That sounds awesome! Have you used Jinja2 before? It might be helpful to use a template engine for the HTML.
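As a rough illustration of the template-engine suggestion, the sketch below renders a small HTML page from a list of graph nodes with Jinja2. The template, the `render_graph_page` helper, and the node fields (`id`, `size`) are assumptions made for this example and are not kepler-mapper's actual output format.

```python
from jinja2 import Template

# Hypothetical page template: a title plus one list item per graph node.
PAGE = Template("""<!DOCTYPE html>
<html>
<head><title>{{ title }}</title></head>
<body>
<h1>{{ title }}</h1>
<ul>
{% for node in nodes %}  <li>{{ node.id }}: {{ node.size }} points</li>
{% endfor %}</ul>
</body>
</html>""")

def render_graph_page(title, nodes):
    """Fill the template with a title and a list of node dictionaries."""
    return PAGE.render(title=title, nodes=nodes)

html = render_graph_page(
    "Mapper graph",
    [{"id": "cube0_cluster0", "size": 12}, {"id": "cube1_cluster0", "size": 7}],
)
```

Keeping the markup in a template separates presentation from the graph-building code, so a paper-friendly layout could be a second template rather than a second code path. A bare `Template` does not escape HTML, so untrusted titles or labels would call for an `Environment(autoescape=True)`.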

I'm not sure I follow how the new ML algorithm would work, but I'm interested in learning more. Is there a preliminary writeup? It looks like you have a much better grasp of numpy, scikit-learn, and industry use cases than I do, so it will be a learning process for me too. :100:

My immediate goals are to cut up the main map method into a few extra helper objects that provide more customizability:

Would either of these objects help the ML algorithm you're thinking of?

I'm going to submit a new issue for my work on 'packagizing' kepler-mapper. A merge would force some cascading changes to documentation and examples, so feedback would be greatly appreciated!

Thank you,

MLWave commented 6 years ago

I'll add the previous version of km.py to the deprecated directory, because I forgot about a valuable use case of KeplerMapper: teaching. Very short, simple, well-documented code (without the current concessions to speed, optimization, and modularity) can help students understand the Mapper algorithm better.