xintong-ai / VecHist

Vector Field Histogram
MIT License

Performance Testing... #31

Open datahead8888 opened 9 years ago

datahead8888 commented 9 years ago

The application flow must be performance tested with a large data set (6 GB). A machine with about 16 GB of memory should be able to run this without a problem.

datahead8888 commented 9 years ago

This tool is very powerful for Python memory analysis: http://fa.bianp.net/blog/2012/line-by-line-report-of-memory-usage/
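
The tool described at that link is the memory_profiler package, which produced the reports below. As a minimal sketch of how it is used (the file name, array size, and driver code here are only illustrative, not part of the repository):

# profile_pca.py -- hypothetical driver; assumes memory_profiler is installed
# (pip install memory-profiler). Run with:
#     python -m memory_profiler profile_pca.py
import numpy as np
from memory_profiler import profile

@profile
def build_mirrored_data(d):
    # Toy stand-in for the kind of array work profiled below: stacks the data
    # with a negated copy, roughly doubling the memory footprint.
    return np.concatenate((d, -d))

if __name__ == "__main__":
    data = np.random.rand(1000000, 3)   # illustrative data set size
    build_mirrored_data(data)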

Memory report for the pre-existing code (using the moderate-sized Plume data set):

Line #    Mem usage     Increment    Line Contents
==================================================
   526    327.023 MiB     0.000 MiB   @profile
   527                                def PCA(d):
   528    327.023 MiB     0.000 MiB       d2 = np.zeros((d.shape[0] * 2, 3));
   529    363.352 MiB    36.328 MiB       d2[:d.shape[0], :] = copy.deepcopy(d)
   530    511.973 MiB   148.621 MiB       d2[d.shape[0]:, :] = copy.deepcopy(d*(-1))
   531    512.820 MiB     0.848 MiB       cov_mat = np.cov([d2[:, 0], d2[:, 1], d2[:, 2]])
   532    513.145 MiB     0.324 MiB       eig_val, eig_vec = np.linalg.eig(cov_mat)
   533    513.145 MiB     0.000 MiB       eig_pairs = [(np.abs(eig_val[i]), eig_vec[:, i]) for i in range(len(eig_val))]
   534    513.145 MiB     0.000 MiB       eig_pairs.sort()
   535    513.145 MiB     0.000 MiB       eig_pairs.reverse()
   536    513.145 MiB     0.000 MiB       return [eig_pairs[0][0], eig_pairs[1][0], eig_pairs[2][0]], np.array([eig_pairs[0][1], eig_pairs[1][1], eig_pairs[2][1]])
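
So on this data set the two explicit copies on report lines 529-530 account for roughly 36.3 + 148.6 ≈ 185 MiB of the ~186 MiB (513.1 − 327.0) that the call adds on top of the baseline; that duplicated copy of the input is what the revision in the next comment removes.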

datahead8888 commented 9 years ago

Memory report for the revised code (same moderate-sized Plume data set), which drops the redundant deep copy of the original data:

Line #    Mem usage     Increment    Line Contents
==================================================
   526    327.008 MiB     0.000 MiB   @profile
   527                                #This function finds the Principal Components Analysis for the data (using both the original data and a second copy scaled by -1)
   528                                #This is used to find the principal directions and their magnitudes
   529                                def PCA(d):
   530    327.027 MiB     0.020 MiB       scaledCopy = copy.deepcopy(d*(-1))
   531                                    #Here we use a concatenation of the original array and a scaled version - it acts as a view to each array in Python
   532                                    #We do not do a deep copy of the original data, since this would waste memory
   533    437.652 MiB   110.625 MiB       d2 = np.concatenate((d, scaledCopy))
   534    438.496 MiB     0.844 MiB       cov_mat = np.cov([d2[:, 0], d2[:, 1], d2[:, 2]])
   535    438.820 MiB     0.324 MiB       eig_val, eig_vec = np.linalg.eig(cov_mat)
   536    438.820 MiB     0.000 MiB       eig_pairs = [(np.abs(eig_val[i]), eig_vec[:, i]) for i in range(len(eig_val))]
   537    438.820 MiB     0.000 MiB       eig_pairs.sort()
   538    438.820 MiB     0.000 MiB       eig_pairs.reverse()
   539    438.820 MiB     0.000 MiB       return [eig_pairs[0][0], eig_pairs[1][0], eig_pairs[2][0]], np.array([eig_pairs[0][1], eig_pairs[1][1], eig_pairs[2][1]])
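
For convenience, here is the revised routine from the report above pulled out into a self-contained, runnable sketch (the imports, the explicit sort key, and the driver at the bottom are additions for illustration; the actual repository file may differ). On this run the peak drops from about 513 MiB to about 439 MiB. One caveat on the inline comment: np.concatenate allocates a new array rather than a view, which is why report line 533 still shows a ~110 MiB increment, but it does avoid the second full copy of the original data that the old version made.

import copy
import numpy as np
from memory_profiler import profile   # optional; only needed to reproduce the report

@profile
def PCA(d):
    # Mirror the data about the origin: concatenate the original points with a
    # copy scaled by -1, then take the covariance of the combined point set.
    scaledCopy = copy.deepcopy(d * (-1))
    d2 = np.concatenate((d, scaledCopy))
    cov_mat = np.cov([d2[:, 0], d2[:, 1], d2[:, 2]])
    eig_val, eig_vec = np.linalg.eig(cov_mat)
    # Pair each eigenvalue magnitude with its eigenvector and sort from largest
    # to smallest; the explicit key avoids comparing the eigenvector arrays on ties.
    eig_pairs = [(np.abs(eig_val[i]), eig_vec[:, i]) for i in range(len(eig_val))]
    eig_pairs.sort(key=lambda p: p[0], reverse=True)
    return ([eig_pairs[0][0], eig_pairs[1][0], eig_pairs[2][0]],
            np.array([eig_pairs[0][1], eig_pairs[1][1], eig_pairs[2][1]]))

if __name__ == "__main__":
    pts = np.random.rand(2000000, 3)   # stand-in for the Plume data set
    magnitudes, directions = PCA(pts)
    print(magnitudes)
    print(directions)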