Filtration of ASC - Githubissues

senchromatic commented 3 years ago

(Done)

Added filtration class
Unit tests on synthetic examples
Tested on real data (small sample)
Face/parent simplex counts optimization
Diameter sampling: partition by minimum number of edges or pass in custom list (requires merging edge lists across a subsequence of diameter thresholds)

(Todo)

Runtime profile on filtration computation
Document asymptotic complexities

(Future work)

Persistent homologies
Birth/death plots

senchromatic commented 3 years ago

Unit tests for the filtration module. Test 1 Points: A(-1, 0), B(1, 0), C(0, 1) Illustration: . . .

Test 2 Points: A(0, -2), B(0, -1), C (0, 1), D(0, 2) Illustration: . .

. .

Output: $python3 filtration_test.py

Test 1

Ordered points:
{0: A, 1: B, 2: C}

Distance matrix:
[[0.         2.         1.41421356]
 [2.         0.         1.41421356]
 [1.41421356 1.41421356 0.        ]]

Pairs of indices by ordered distance:
[(1.4142135623730951, [(0, 2), (1, 2)]), (2.0, [(0, 1)])]

Diameter = 0
ASC:
{<A>; <B>; <C>}

Diameter = 1.4142135623730951
ASC:
{<A>; <B>; <C>

<A, C>; <B, C>}

Diameter = 2.0
ASC:
{<A>; <B>; <C>

<A, B>; <A, C>; <B, C>

<A, B, C>}

Test 2

Ordered points:
{0: A, 1: B, 2: C, 3: D}

Distance matrix:
[[0. 1. 3. 4.]
 [1. 0. 2. 3.]
 [3. 2. 0. 1.]
 [4. 3. 1. 0.]]

Pairs of indices by ordered distance:
[(1.0, [(0, 1), (2, 3)]), (2.0, [(1, 2)]), (3.0, [(0, 2), (1, 3)]), (4.0, [(0, 3)])]

Diameter = 0
ASC:
{<A>; <B>; <C>; <D>}

Diameter = 1.0
ASC:
{<A>; <B>; <C>; <D>

<A, B>; <C, D>}

Diameter = 2.0
ASC:
{<A>; <B>; <C>; <D>

<A, B>; <B, C>; <C, D>}

Diameter = 3.0
ASC:
{<A>; <B>; <C>; <D>

<A, B>; <A, C>; <B, C>; <B, D>; <C, D>

<A, B, C>; <B, C, D>}

Diameter = 4.0
ASC:
{<A>; <B>; <C>; <D>

<A, B>; <A, C>; <A, D>; <B, C>; <B, D>; <C, D>

<A, B, C>; <A, B, D>; <A, C, D>; <B, C, D>

<A, B, C, D>}

senchromatic commented 3 years ago

Rough runtime for latest version @ max_dimension = 2: TEST_SAMPLE_SIZE = 2000, # of boxes = 44, time = 3 seconds TEST_SAMPLE_SIZE = 2500, # of boxes = 82, time = 26 seconds TEST_SAMPLE_SIZE = 3000, # of boxes = 114, time = 94 seconds

Rough runtime for latest version @ max_dimension = 3: TEST_SAMPLE_SIZE = 2000, # of boxes = 44, time = 15 seconds TEST_SAMPLE_SIZE = 2500, # of boxes = 82, time = 238 seconds TEST_SAMPLE_SIZE = 3000, # of boxes = 114, time = [not tested - estimate: 1800 seconds?]

These times include printing some small amount of metadata and generating the filtration, but not printing all of the ASCs for every single diameter (which takes too long and isn't practically useful IMHO).

I tried limiting the maximum diameter to 0.99, although it didn't make any perceptible difference.

senchromatic commented 3 years ago

(New performance benchmarks after adding in diameter sampling, with about the same number of diameters as Matt had chosen in the former version of project_first_hack.py)

Rough runtime for latest version @ max_dimension = 2: TEST_SAMPLE_SIZE = 2000, # of boxes = 44, time = 5 seconds TEST_SAMPLE_SIZE = 2500, # of boxes = 82, time = 9 seconds TEST_SAMPLE_SIZE = 3000, # of boxes = 114, time = 17 seconds TEST_SAMPLE_SIZE = 3500, # of boxes = 152, time = 36 seconds TEST_SAMPLE_SIZE = 4000, # of boxes = 190, time = 70 seconds

Rough runtime for latest version @ max_dimension = 3: TEST_SAMPLE_SIZE = 2000, # of boxes = 44, time = 12 seconds TEST_SAMPLE_SIZE = 2500, # of boxes = 82, time = 95 seconds TEST_SAMPLE_SIZE = 3000, # of boxes = 114, time = [estimate: 12-13 minutes]

senchromatic commented 3 years ago

I experimented with what would happen if we don't copy the ASC to create new ones at each diameter of the filtration. It leads to a modest improvement but not by a large factor. For that reason, I'll postpone the deepcopy optimization for now.

aepereira commented 3 years ago

Looks great. Merging.

senchromatic / topological-data-analysis

Filtration of ASC #27