Closed senchromatic closed 3 years ago
Unit tests for the filtration module. Test 1 Points: A(-1, 0), B(1, 0), C(0, 1) Illustration: . . .
Test 2 Points: A(0, -2), B(0, -1), C (0, 1), D(0, 2) Illustration: . .
. .
Output: $python3 filtration_test.py
Test 1
Ordered points:
{0: A, 1: B, 2: C}
Distance matrix:
[[0. 2. 1.41421356]
[2. 0. 1.41421356]
[1.41421356 1.41421356 0. ]]
Pairs of indices by ordered distance:
[(1.4142135623730951, [(0, 2), (1, 2)]), (2.0, [(0, 1)])]
Diameter = 0
ASC:
{<A>; <B>; <C>}
Diameter = 1.4142135623730951
ASC:
{<A>; <B>; <C>
<A, C>; <B, C>}
Diameter = 2.0
ASC:
{<A>; <B>; <C>
<A, B>; <A, C>; <B, C>
<A, B, C>}
Test 2
Ordered points:
{0: A, 1: B, 2: C, 3: D}
Distance matrix:
[[0. 1. 3. 4.]
[1. 0. 2. 3.]
[3. 2. 0. 1.]
[4. 3. 1. 0.]]
Pairs of indices by ordered distance:
[(1.0, [(0, 1), (2, 3)]), (2.0, [(1, 2)]), (3.0, [(0, 2), (1, 3)]), (4.0, [(0, 3)])]
Diameter = 0
ASC:
{<A>; <B>; <C>; <D>}
Diameter = 1.0
ASC:
{<A>; <B>; <C>; <D>
<A, B>; <C, D>}
Diameter = 2.0
ASC:
{<A>; <B>; <C>; <D>
<A, B>; <B, C>; <C, D>}
Diameter = 3.0
ASC:
{<A>; <B>; <C>; <D>
<A, B>; <A, C>; <B, C>; <B, D>; <C, D>
<A, B, C>; <B, C, D>}
Diameter = 4.0
ASC:
{<A>; <B>; <C>; <D>
<A, B>; <A, C>; <A, D>; <B, C>; <B, D>; <C, D>
<A, B, C>; <A, B, D>; <A, C, D>; <B, C, D>
<A, B, C, D>}
Rough runtime for latest version @ max_dimension = 2: TEST_SAMPLE_SIZE = 2000, # of boxes = 44, time = 3 seconds TEST_SAMPLE_SIZE = 2500, # of boxes = 82, time = 26 seconds TEST_SAMPLE_SIZE = 3000, # of boxes = 114, time = 94 seconds
Rough runtime for latest version @ max_dimension = 3: TEST_SAMPLE_SIZE = 2000, # of boxes = 44, time = 15 seconds TEST_SAMPLE_SIZE = 2500, # of boxes = 82, time = 238 seconds TEST_SAMPLE_SIZE = 3000, # of boxes = 114, time = [not tested - estimate: 1800 seconds?]
These times include printing some small amount of metadata and generating the filtration, but not printing all of the ASCs for every single diameter (which takes too long and isn't practically useful IMHO).
I tried limiting the maximum diameter to 0.99, although it didn't make any perceptible difference.
(New performance benchmarks after adding in diameter sampling, with about the same number of diameters as Matt had chosen in the former version of project_first_hack.py)
Rough runtime for latest version @ max_dimension = 2: TEST_SAMPLE_SIZE = 2000, # of boxes = 44, time = 5 seconds TEST_SAMPLE_SIZE = 2500, # of boxes = 82, time = 9 seconds TEST_SAMPLE_SIZE = 3000, # of boxes = 114, time = 17 seconds TEST_SAMPLE_SIZE = 3500, # of boxes = 152, time = 36 seconds TEST_SAMPLE_SIZE = 4000, # of boxes = 190, time = 70 seconds
Rough runtime for latest version @ max_dimension = 3: TEST_SAMPLE_SIZE = 2000, # of boxes = 44, time = 12 seconds TEST_SAMPLE_SIZE = 2500, # of boxes = 82, time = 95 seconds TEST_SAMPLE_SIZE = 3000, # of boxes = 114, time = [estimate: 12-13 minutes]
I experimented with what would happen if we don't copy the ASC to create new ones at each diameter of the filtration. It leads to a modest improvement but not by a large factor. For that reason, I'll postpone the deepcopy optimization for now.
Looks great. Merging.
(Done)
(Todo)
(Future work)