Closed: mfleduc closed this pull request 3 years ago
project_v2 is old code that I never updated; I just started using project_kuroshio instead. The print statement was there because at one point I was getting weird errors that made it seem like numpy wasn't loading.
As for your suggestion with sym_kl, we probably should do that but I just wanted to get something that worked. -Matt
On Sun, Apr 25, 2021 at 9:36 PM senchromatic @.***> wrote:
@.**** approved this pull request.
Thank you for sharing the new version. Is project_v2.py based on project_kuroshio.py?
In metrics.py https://github.com/senchromatic/topological-data-analysis/pull/35#discussion_r619919259 :
@@ -21,13 +21,18 @@ def close_enough(a, b, metric):
Kullback-Leibler divergence, modified to be symmetric
Input: two CDFs of identical shape, e.g. generated from the ecdf function
Note: This is a semi-metric as it doesn't satisfy the triangle inequality
-def sym_kl(cdf1 , cdf2, dx):
+def sym_kl(cdf1 , cdf2):
Thanks for patching up this function!
Here's an optimization suggestion: It looks like we're reading in from file each time this function is called. Could we store a local variable (initialized to None), and check whether it's already initialized before loading the data?
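A minimal sketch of the caching idea, assuming the file being re-read is a one-column CSV such as depths.csv. The names `_DEPTHS_CACHE` and `load_depths` are hypothetical and are not taken from metrics.py:

```python
import numpy as np

# Hypothetical module-level cache; stays None until the first successful load.
_DEPTHS_CACHE = None


def load_depths(path='depths.csv'):
    """Return the depth grid, reading the file only on the first call."""
    global _DEPTHS_CACHE
    if _DEPTHS_CACHE is None:
        # Only reached once per process; later calls reuse the array.
        _DEPTHS_CACHE = np.genfromtxt(path)
    return _DEPTHS_CACHE
```

sym_kl could then call load_depths() instead of np.genfromtxt directly. One caveat with this simple version: the cache ignores `path` after the first call, so it only fits the case where a single file is ever loaded.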
In project_first_hack.py https://github.com/senchromatic/topological-data-analysis/pull/35#discussion_r619919453 :
# TODO: investigate why masked_cdfs returned by compute_boxed_cdfs has 1 extra dimension compared to local variable in function
masked_cdfs = masked_cdf[:, 0, :]
+
depths = np.genfromtxt('depths.csv')
coordvals = np.genfromtxt( 'C:/Users/Matt/Desktop/Masters coursework/topology/project/results/sca depth/2000 pts/'
+'boxcoords.csv', delimiter = ',')
- #
masked_latitudes = coordvals[0,:]
masked_longitudes = coordvals[1,:]
- print('There are ' + str(len(masked_latitudes)) + ' lat/lon boxes')
+1 Very useful... I had thought of printing this earlier but forgot haha
In project_v2.py https://github.com/senchromatic/topological-data-analysis/pull/35#discussion_r619919601 :
@@ -0,0 +1,244 @@
+## LeDuc, Pereira, Zhang
+# This is a first hack at working with the project data using the KL divergence to measure the distance
+# between two probability distributions of the depth of minimum sound speed.
+import numpy as np
+import pandas as pd
+import pylab as pl # This gets used a lot I promise
+from abstract_simplicial_complex import Point, Simplex, vietoris_rips
+from metrics import ks_test
+from random import sample, seed
+from scipy.interpolate import interp1d
+from statfuncs import ecdf
+print('AAAAAAAAAA')
?
In project_v2.py https://github.com/senchromatic/topological-data-analysis/pull/35#discussion_r619919855 :
@@ -0,0 +1,244 @@
+## LeDuc, Pereira, Zhang
If this file contains the same functions as the other one, could we import the functions to minimize code duplication (for sake of maintenance)?
In project_v2.py https://github.com/senchromatic/topological-data-analysis/pull/35#discussion_r619920562 :
+ geographic_names = generate_geographic_names(masked_latitudes, masked_longitudes)
+ point_cloud = create_point_cloud(geographic_names, masked_cdfs, ks_test)
+ a = MIN_SIGNIFICANCE_LEVEL
+ c_a = np.sqrt(-np.log(a/2)*0.5)
+ # Value the metric needs to exceed to reject the ks test null hypothesis at the given significance level
+ critical_value = c_a * np.sqrt(2 / len(depths))
+ dr = 0.01
+ radii = np.arange(minRadius, 0.46, dr)
+ # After ~ r = 0.29 the KS test rejects F_1=F_2 at the .05 level. Do we care much about what happens past there?
+ # At that point the dists are statistically dissimilar so grouping them may not be meaningful.
+ # First pass it appears that all the cool stuff happens around there
+ # so maybe we do
+ homologies = np.zeros( [2, len(radii)] )
+ for rndx in range(len(radii)):#want the index so we can store the dims of homologies
Why not use filtration instead of vietoris_rips?
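For reference, the critical value computed in the hunk above is the usual large-sample approximation for the two-sample KS test, specialised to two samples of equal size. A small sketch of the general form follows; the function name `ks_critical_value` is hypothetical and does not exist in the repo:

```python
import numpy as np


def ks_critical_value(alpha, n, m=None):
    """Approximate two-sample KS critical value at significance level alpha.

    n and m are the two sample sizes; m defaults to n, which reduces to
    c_a * sqrt(2 / n) as in the hunk above.
    """
    if m is None:
        m = n
    # c(alpha) = sqrt(-ln(alpha / 2) / 2), the asymptotic coefficient
    c_alpha = np.sqrt(-0.5 * np.log(alpha / 2))
    return c_alpha * np.sqrt((n + m) / (n * m))
```

Radii above this value correspond to pairs of distributions for which the test rejects F_1=F_2 at level alpha, so it could serve as a natural upper bound for the radii range instead of a hard-coded constant.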
Updated KL divergence: the calculation needs a derivative, and the points are not evenly spaced, so I changed the code to handle the uneven spacing accurately.
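A rough sketch of one way to do this, assuming the derivative in question is that of each CDF (giving a density) and that the grid coordinates are passed in as `x`. The name `sym_kl_sketch`, the `x` and `eps` parameters, and the choice of the summed form KL(p||q) + KL(q||p) are assumptions for illustration, not the code in this PR:

```python
import numpy as np


def sym_kl_sketch(cdf1, cdf2, x, eps=1e-12):
    """Symmetric KL divergence between two CDFs sampled on an uneven grid x."""
    # np.gradient accepts the coordinate array, so uneven spacing is handled.
    p = np.gradient(cdf1, x)
    q = np.gradient(cdf2, x)
    # Clip to avoid log(0) and division by zero where a density vanishes.
    p = np.clip(p, eps, None)
    q = np.clip(q, eps, None)
    # KL(p||q) + KL(q||p) collapses to the integral of (p - q) * log(p / q).
    integrand = (p - q) * np.log(p / q)
    # Trapezoid rule written out so it also respects the uneven spacing.
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(x)))
```

Calling it with two CDFs evaluated on the same depth grid returns a non-negative number that is zero when the CDFs are identical and unchanged when the arguments are swapped.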