senchromatic / topological-data-analysis

Applied topology using abstract simplicial complexes
0 stars 0 forks source link

Filtration of ASC #27

Closed senchromatic closed 3 years ago

senchromatic commented 3 years ago

(Done)

(Todo)

(Future work)

senchromatic commented 3 years ago

Unit tests for the filtration module. Test 1 Points: A(-1, 0), B(1, 0), C(0, 1) Illustration: . . .

Test 2 Points: A(0, -2), B(0, -1), C (0, 1), D(0, 2) Illustration: . .

. .

Output: $python3 filtration_test.py

Test 1

Ordered points:
{0: A, 1: B, 2: C}

Distance matrix:
[[0.         2.         1.41421356]
 [2.         0.         1.41421356]
 [1.41421356 1.41421356 0.        ]]

Pairs of indices by ordered distance:
[(1.4142135623730951, [(0, 2), (1, 2)]), (2.0, [(0, 1)])]

Diameter = 0
ASC:
{<A>; <B>; <C>}

Diameter = 1.4142135623730951
ASC:
{<A>; <B>; <C>

<A, C>; <B, C>}

Diameter = 2.0
ASC:
{<A>; <B>; <C>

<A, B>; <A, C>; <B, C>

<A, B, C>}

Test 2

Ordered points:
{0: A, 1: B, 2: C, 3: D}

Distance matrix:
[[0. 1. 3. 4.]
 [1. 0. 2. 3.]
 [3. 2. 0. 1.]
 [4. 3. 1. 0.]]

Pairs of indices by ordered distance:
[(1.0, [(0, 1), (2, 3)]), (2.0, [(1, 2)]), (3.0, [(0, 2), (1, 3)]), (4.0, [(0, 3)])]

Diameter = 0
ASC:
{<A>; <B>; <C>; <D>}

Diameter = 1.0
ASC:
{<A>; <B>; <C>; <D>

<A, B>; <C, D>}

Diameter = 2.0
ASC:
{<A>; <B>; <C>; <D>

<A, B>; <B, C>; <C, D>}

Diameter = 3.0
ASC:
{<A>; <B>; <C>; <D>

<A, B>; <A, C>; <B, C>; <B, D>; <C, D>

<A, B, C>; <B, C, D>}

Diameter = 4.0
ASC:
{<A>; <B>; <C>; <D>

<A, B>; <A, C>; <A, D>; <B, C>; <B, D>; <C, D>

<A, B, C>; <A, B, D>; <A, C, D>; <B, C, D>

<A, B, C, D>}
senchromatic commented 3 years ago

Rough runtime for latest version @ max_dimension = 2: TEST_SAMPLE_SIZE = 2000, # of boxes = 44, time = 3 seconds TEST_SAMPLE_SIZE = 2500, # of boxes = 82, time = 26 seconds TEST_SAMPLE_SIZE = 3000, # of boxes = 114, time = 94 seconds

Rough runtime for latest version @ max_dimension = 3: TEST_SAMPLE_SIZE = 2000, # of boxes = 44, time = 15 seconds TEST_SAMPLE_SIZE = 2500, # of boxes = 82, time = 238 seconds TEST_SAMPLE_SIZE = 3000, # of boxes = 114, time = [not tested - estimate: 1800 seconds?]

These times include printing some small amount of metadata and generating the filtration, but not printing all of the ASCs for every single diameter (which takes too long and isn't practically useful IMHO).

I tried limiting the maximum diameter to 0.99, although it didn't make any perceptible difference.

senchromatic commented 3 years ago

(New performance benchmarks after adding in diameter sampling, with about the same number of diameters as Matt had chosen in the former version of project_first_hack.py)

Rough runtime for latest version @ max_dimension = 2: TEST_SAMPLE_SIZE = 2000, # of boxes = 44, time = 5 seconds TEST_SAMPLE_SIZE = 2500, # of boxes = 82, time = 9 seconds TEST_SAMPLE_SIZE = 3000, # of boxes = 114, time = 17 seconds TEST_SAMPLE_SIZE = 3500, # of boxes = 152, time = 36 seconds TEST_SAMPLE_SIZE = 4000, # of boxes = 190, time = 70 seconds

Rough runtime for latest version @ max_dimension = 3: TEST_SAMPLE_SIZE = 2000, # of boxes = 44, time = 12 seconds TEST_SAMPLE_SIZE = 2500, # of boxes = 82, time = 95 seconds TEST_SAMPLE_SIZE = 3000, # of boxes = 114, time = [estimate: 12-13 minutes]

senchromatic commented 3 years ago

I experimented with what would happen if we don't copy the ASC to create new ones at each diameter of the filtration. It leads to a modest improvement but not by a large factor. For that reason, I'll postpone the deepcopy optimization for now.

aepereira commented 3 years ago

Looks great. Merging.