The data frame has shape (125789, 122) and the target has shape (125789, 6).
So far I've:
- Checked for NaNs
- Tried the "hamming", "jaccard" and "euclidean" distance functions
- Converted the data frame into a NumPy array
- Cast the target to float, int, and bool
But the error persists. However, the fit works fine if I drop the target.
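For reference, the checks listed above amount to roughly the following sketch. The arrays are small synthetic stand-ins (the names `X` and `y`, the row count, and the random values are illustrative assumptions, not the real data), and the `UMAP(...).fit(...)` call itself is left out.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: same column counts as the real data, far fewer rows.
X = rng.normal(size=(200, 122))
y = rng.integers(0, 2, size=(200, 6))  # one binary column per label

# Check for NaNs in both the features and the target.
has_nan_X = bool(np.isnan(X).any())
has_nan_y = bool(np.isnan(y.astype(float)).any())

# Cast the target to each dtype that was tried; the shape stays 2-D every time.
targets = {
    name: y.astype(dtype)
    for name, dtype in [("float", float), ("int", int), ("bool", bool)]
}
shapes = {name: t.shape for name, t in targets.items()}

print(has_nan_X, has_nan_y)
print(shapes)
```

None of these casts changes the fact that the target has two dimensions, which may be why they make no difference to the error.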
Below is the full error:
{
"name": "ValueError",
"message": "The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()",
"stack": "---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Input In [46], in <cell line: 2>()
1 umap_y_train_samples = y_train[X_train_preprocessing.index.isin(train_samples.index)]
----> 2 emb = UMAP(metric=\"euclidean\").fit(train_samples.fillna(0).to_numpy(),umap_y_train_samples.astype(int))
File ~/mambaforge/envs/umap-env/lib/python3.10/site-packages/umap/umap_.py:2682, in UMAP.fit(self, X, y, force_all_finite)
2680 else:
2681 far_dist = 1.0e12
-> 2682 self.graph_ = discrete_metric_simplicial_set_intersection(
2683 self.graph_, y_, far_dist=far_dist
2684 )
2685 elif self.target_metric in dist.DISCRETE_METRICS:
2686 if self.target_weight < 1.0:
File ~/mambaforge/envs/umap-env/lib/python3.10/site-packages/umap/umap_.py:841, in discrete_metric_simplicial_set_intersection(simplicial_set, discrete_space, unknown_dist, far_dist, metric, metric_kws, metric_scale)
831 fast_metric_intersection(
832 simplicial_set.row,
833 simplicial_set.col,
(...)
838 metric_scale,
839 )
840 else:
--> 841 fast_intersection(
842 simplicial_set.row,
843 simplicial_set.col,
844 simplicial_set.data,
845 discrete_space,
846 unknown_dist,
847 far_dist,
848 )
850 simplicial_set.eliminate_zeros()
852 return reset_local_connectivity(simplicial_set)
File ~/mambaforge/envs/umap-env/lib/python3.10/site-packages/numba/np/arrayobj.py:5540, in impl()
5537 else:
5538 msg = (\"The truth value of an array with more than one element \"
5539 \"is ambiguous. Use a.any() or a.all()\")
-> 5540 raise ValueError(msg)
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()"
}
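The message itself is what NumPy raises whenever an array with more than one element is used where a single True/False is required. A minimal sketch of that mechanism, independent of UMAP: with a 1-D target each sample's label is a scalar, so a comparison yields one boolean, whereas with a 2-D target each sample's label is a row, so the same comparison yields an array. The `-1` sentinel and the helper name `count_unlabelled` are illustrative assumptions, not UMAP's actual code.

```python
import numpy as np


def count_unlabelled(target):
    """Count samples whose label equals the sentinel -1 (illustrative helper)."""
    n = 0
    for label in target:
        # 1-D target: `label` is a scalar, so the comparison is a plain bool.
        # 2-D target: `label` is a row, the comparison is an array, and `if`
        # cannot reduce it to a single truth value.
        if label == -1:
            n += 1
    return n


y_1d = np.array([0, 1, -1, 2])
y_2d = np.array([[0, 1], [1, 0], [-1, -1]])

n_unlabelled = count_unlabelled(y_1d)

try:
    count_unlabelled(y_2d)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(n_unlabelled)
print(error_message)
```

If something similar happens inside `fast_intersection`, it would be consistent with the fit succeeding once the target is dropped, but I have not confirmed that against the UMAP source.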
I'm performing dimensionality reduction on a multilabel dataset, and I've been using the "target_metric" option for the supervised multilabel reduction.
I have a data frame on one hand and a NumPy array on the other.