nathanieljevans closed this 4 years ago
I suspect it fails because there are more observations than the size constraint allows: k is set by the number of R0 components, and each cluster is capped at size_max = num_of_rounds.
Number of clusters in each round: (78, 80, 81)
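A quick way to test that suspicion is to compare the total number of observations with the total capacity the constraint allows. This is only a sketch: it assumes R0 is the first entry of the tuple (78), that X contains every observation from all three rounds, and that size_min is left at 0. The helper name `constrained_assignment_is_feasible` is made up for illustration.

```python
def constrained_assignment_is_feasible(n_points, n_clusters, size_max, size_min=0):
    """Necessary condition for a size-constrained assignment to exist.

    Every point must land in exactly one cluster, so the point count has to
    fit between the total minimum and total maximum cluster capacity.
    """
    return n_clusters * size_min <= n_points <= n_clusters * size_max


# Counts reported in this issue; treating the first entry as R0 is an assumption.
counts_per_round = (78, 80, 81)

n_points = sum(counts_per_round)      # all observations passed to fit()
n_clusters = counts_per_round[0]      # n_clusters=num_R0_components
size_max = len(counts_per_round)      # size_max=num_of_rounds
capacity = n_clusters * size_max      # most points the clusters can hold

feasible = constrained_assignment_is_feasible(n_points, n_clusters, size_max)
print(f"points={n_points}, capacity={capacity}, feasible={feasible}")
```

Under those assumptions there are more points than slots, so the min cost flow problem has no feasible solution, which would be consistent with the exception below.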
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-55-152602d503a2> in <module>
----> 1 cluster_labels = match.match_cores_across_rounds(res)
      2 res = res.assign(cluster = cluster_labels)

/home/exacloud/lustre1/NGSdev/evansna/cyclicIF/cyclicIF_registration/workflow/libs/match.py in match_cores_across_rounds(info)
     64     # https://github.com/joshlk/k-means-constrained
     65     clus = KMeansConstrained(n_clusters=num_R0_components, init=seeds, size_max=num_of_rounds, n_init=1, tol=1e-8, max_iter=1000)
---> 66     _ = clus.fit( X )
     67
     68     return clus.labels_ + 1

/home/exacloud/lustre1/NGSdev/evansna/external/anaconda3/lib/python3.8/site-packages/k_means_constrained/k_means_constrained_.py in fit(self, X, y)
    629
    630         self.cluster_centers_, self.labels_, self.inertia_, self.n_iter_ = \
--> 631             k_means_constrained(
    632                 X, n_clusters=self.n_clusters,
    633                 size_min=self.size_min, size_max=self.size_max,

/home/exacloud/lustre1/NGSdev/evansna/external/anaconda3/lib/python3.8/site-packages/k_means_constrained/k_means_constrained_.py in k_means_constrained(X, n_clusters, size_min, size_max, init, n_init, max_iter, verbose, tol, random_state, copy_x, n_jobs, return_n_iter)
    173     for it in range(n_init):
    174         # run a k-means once
--> 175         labels, inertia, centers, n_iter_ = kmeans_constrained_single(
    176             X, n_clusters,
    177             size_min=size_min, size_max=size_max,

/home/exacloud/lustre1/NGSdev/evansna/external/anaconda3/lib/python3.8/site-packages/k_means_constrained/k_means_constrained_.py in kmeans_constrained_single(X, n_clusters, size_min, size_max, max_iter, init, verbose, x_squared_norms, random_state, tol)
    325         # labels assignment is also called the E-step of EM
    326         labels, inertia = \
--> 327             _labels_constrained(X, centers, size_min, size_max, distances=distances)
    328
    329         # computation of the means is also called the M-step of EM

/home/exacloud/lustre1/NGSdev/evansna/external/anaconda3/lib/python3.8/site-packages/k_means_constrained/k_means_constrained_.py in _labels_constrained(X, centers, size_min, size_max, distances)
    396
    397     edges, costs, capacities, supplies, n_C, n_X = minimum_cost_flow_problem_graph(X, C, D, size_min, size_max)
--> 398     labels = solve_min_cost_flow_graph(edges, costs, capacities, supplies, n_C, n_X)
    399
    400     # cython k-means M step code assumes int32 inputs

/home/exacloud/lustre1/NGSdev/evansna/external/anaconda3/lib/python3.8/site-packages/k_means_constrained/k_means_constrained_.py in solve_min_cost_flow_graph(edges, costs, capacities, supplies, n_C, n_X)
    483     # Find the minimum cost flow between node 0 and node 4.
    484     if min_cost_flow.Solve() != min_cost_flow.OPTIMAL:
--> 485         raise Exception('There was an issue with the min cost flow input.')
    486
    487     # Assignment

Exception: There was an issue with the min cost flow input.
Check for exception and fall back on normal k-means?
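One possible shape for that fallback, as a rough sketch rather than what match.py ended up doing. `fit_with_fallback` is a hypothetical helper; it only assumes both estimators follow the scikit-learn convention of `fit(X)` plus a `labels_` attribute. It catches a broad `Exception` because the traceback shows the library raising a bare `Exception`.

```python
import logging

logger = logging.getLogger(__name__)


def fit_with_fallback(primary, fallback, X):
    """Fit `primary` on X; if it raises, fit `fallback` instead.

    Returns (labels, used_fallback) so the caller can tell which
    estimator produced the labels.
    """
    try:
        primary.fit(X)
        return primary.labels_, False
    except Exception as exc:  # the constrained solver raises a bare Exception
        logger.warning("Constrained fit failed (%s); using fallback.", exc)
        fallback.fit(X)
        return fallback.labels_, True


# Possible usage (k, n_rounds and X are placeholders, not taken from match.py):
#
#   from k_means_constrained import KMeansConstrained
#   from sklearn.cluster import KMeans
#
#   labels, used_fallback = fit_with_fallback(
#       KMeansConstrained(n_clusters=k, size_max=n_rounds, n_init=1),
#       KMeans(n_clusters=k, n_init=1),
#       X,
#   )
```

Note that plain k-means drops the size cap, so a fallback cluster may hold more points than there are rounds; the `used_fallback` flag is there so that case can be handled downstream.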
Switched to using DBSCAN - this allows outliers and works quite a bit better.
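For reference, a minimal sketch of the DBSCAN approach using scikit-learn. The coordinates, `eps`, and `min_samples` below are invented for illustration and are not the values used in match.py; the point is only that an observation with no nearby partner is labelled -1 (noise) instead of being forced into a cluster.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Toy centroids for three cores seen in three rounds (made-up values).
round0 = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
round1 = np.array([[0.2, 0.1], [10.1, -0.2], [0.1, 10.2]])
round2 = np.array([[-0.1, 0.2], [9.9, 0.1], [-0.2, 9.8]])
# A spurious detection that matches no core in any other round.
outlier = np.array([[50.0, 50.0]])

X = np.vstack([round0, round1, round2, outlier])

# eps is the matching radius; min_samples=2 means a core needs at least
# one partner from another observation to form a cluster.
db = DBSCAN(eps=1.0, min_samples=2).fit(X)
labels = db.labels_

n_clusters = len(set(labels) - {-1})
n_noise = int(np.sum(labels == -1))
print(f"clusters={n_clusters}, noise points={n_noise}")
```

Unlike the constrained k-means setup, nothing here caps cluster size or fixes k, so extra detections in later rounds no longer make the problem infeasible. The noise label -1 needs separate handling if the labels are shifted by +1 as in the original return statement.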