scikit-tda / kepler-mapper

Kepler Mapper: A flexible Python implementation of the Mapper algorithm.

https://kepler-mapper.scikit-tda.org

MIT License

629 stars 182 forks source link

added contains() method to Cover class. #209

Closed pulquero closed 3 years ago

pulquero commented 3 years ago

See #208

codecov[bot] commented 3 years ago

Codecov Report

Merging #209 (8e4a94c) into master (6d983e6) will increase coverage by 0.19%. The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #209      +/-   ##
==========================================
+ Coverage   78.42%   78.61%   +0.19%     
==========================================
  Files          11       11              
  Lines         774      781       +7     
  Branches      159      161       +2     
==========================================
+ Hits          607      614       +7     
  Misses        138      138              
  Partials       29       29

Impacted Files	Coverage Δ
kmapper/cover.py	`88.42% <100.00%> (+0.92%)`	:arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 6d983e6...e351e60. Read the comment docs.

deargle commented 3 years ago

Low-effort comment warning: the semantics of contains() suggests to me that it would accept a datapoint and return a boolean if any of the nodes within any of the covers' hypercubes contains that exact node. But of course that doesn't make sense, because covers don't know about points that might be within its hypercubes. And of course, knowing which hypercube contains a datapoint doesn't on its own, reach the ultimate goal of knowing which node a point is closest to. Is the intention that, once the hypercube is known any nodes/clusters within those hypercubes count as "closest"? Hmm.

pulquero commented 3 years ago

Should I rename it to find_cubes()? Note, this PR is an incremental step towards the goal, rather than having one PR with lots of changes to review.

deargle commented 3 years ago

I like find_cubes(), @sauln?

:+1: for it being incremental. Thinking ahead, what's the algorithm?

For a datapoint, find hypercubes
for each hypercube, rerun the clustering algorithm, including the new point, return clusters that contain the new point?

The question is whether previous nodes would have remained in tact in order for them to be flagged in the previous graph.

pulquero commented 3 years ago

For each cube, find the nearest neighbors (data points that is), then lookup the clusters of the nearest neighbors (code in #208). I'm not trying to do incremental/updateable tda, but rather using the tda result to cheaply tell me something about new data.

pulquero commented 3 years ago

Or maybe find_sets (~ open set) to be shape agnostic.

deargle commented 3 years ago

Or maybe find_sets (~ open set) to be shape agnostic.

Reviewing the lit, yes, find_open_sets would be more generic. There's several other references to "cubes" in the current cover class which could be changed later. Unless we argue that covers only ever contain open sets, in which case, just find() would suffice.

sauln commented 3 years ago

Looks pretty good to me! Thanks for the contribution :D