lmcinnes / umap

Uniform Manifold Approximation and Projection

Problem in supervised multilabel reduction, "truth value of an array is ambiguous" #1132

Closed jb-chaudron closed 3 months ago

jb-chaudron commented 3 months ago

I'm performing a supervised dimension reduction on a multilabel dataset, and I've been using the "target_metric" option to do so.

The input data is a data frame and the target is a NumPy array.

emb = UMAP(metric="hamming").fit(train_samples, target)

The data frame has shape (125789, 122) and the target array has shape (125789, 6). I've

But the error persists. However, the fit works fine if I drop the target. Below is the full error:

{
    "name": "ValueError",
    "message": "The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()",
    "stack": "---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [46], in <cell line: 2>()
      1 umap_y_train_samples = y_train[X_train_preprocessing.index.isin(train_samples.index)]
----> 2 emb = UMAP(metric=\"euclidean\").fit(train_samples.fillna(0).to_numpy(),umap_y_train_samples.astype(int))

File ~/mambaforge/envs/umap-env/lib/python3.10/site-packages/umap/umap_.py:2682, in UMAP.fit(self, X, y, force_all_finite)
   2680     else:
   2681         far_dist = 1.0e12
-> 2682     self.graph_ = discrete_metric_simplicial_set_intersection(
   2683         self.graph_, y_, far_dist=far_dist
   2684     )
   2685 elif self.target_metric in dist.DISCRETE_METRICS:
   2686     if self.target_weight < 1.0:

File ~/mambaforge/envs/umap-env/lib/python3.10/site-packages/umap/umap_.py:841, in discrete_metric_simplicial_set_intersection(simplicial_set, discrete_space, unknown_dist, far_dist, metric, metric_kws, metric_scale)
    831     fast_metric_intersection(
    832         simplicial_set.row,
    833         simplicial_set.col,
   (...)
    838         metric_scale,
    839     )
    840 else:
--> 841     fast_intersection(
    842         simplicial_set.row,
    843         simplicial_set.col,
    844         simplicial_set.data,
    845         discrete_space,
    846         unknown_dist,
    847         far_dist,
    848     )
    850 simplicial_set.eliminate_zeros()
    852 return reset_local_connectivity(simplicial_set)

File ~/mambaforge/envs/umap-env/lib/python3.10/site-packages/numba/np/arrayobj.py:5540, in impl()
   5537 else:
   5538     msg = (\"The truth value of an array with more than one element \"
   5539            \"is ambiguous. Use a.any() or a.all()\")
-> 5540     raise ValueError(msg)

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()"
}
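
A note on what the traceback suggests, offered as a sketch rather than a confirmed diagnosis: the failing call in the traceback does not pass `target_metric`, and the error is raised inside `fast_intersection`, which sits before the `elif self.target_metric in dist.DISCRETE_METRICS` branch. That is consistent with the target being treated as a single categorical label per sample. With a target of shape (125789, 6), indexing one sample gives a length-6 array, and using a comparison on that array as a condition raises exactly this `ValueError`. The NumPy-only example below reproduces the message on a small random stand-in for the target (the data, the array size, and the `-1` sentinel are assumptions for illustration) and shows one possible workaround: collapsing each distinct label combination into a single integer id.

```python
import numpy as np

# Hypothetical stand-in for the (125789, 6) multilabel target from the report.
rng = np.random.default_rng(0)
target = rng.integers(0, 2, size=(1000, 6))

# Comparing one row of a 2-D target with a scalar yields an array of booleans;
# using that array as an `if` condition raises the ValueError from the report.
message = None
try:
    if target[0] == -1:  # -1 is an assumed "unlabelled" sentinel
        pass
except ValueError as err:
    message = str(err)

# Label-powerset encoding: give each distinct row of labels one integer id,
# so the target becomes a 1-D array with a single value per sample.
unique_rows, flat_labels = np.unique(target, axis=0, return_inverse=True)
flat_labels = flat_labels.reshape(-1)  # some NumPy versions return a 2-D inverse

print(message)
print(target.shape, "->", flat_labels.shape)
```

The resulting `flat_labels` could then be passed as `y` in place of the 2-D array. Note that this treats two samples as matching only when all six labels agree, which may be stricter than intended; a partial-overlap notion of similarity would need a different target metric instead.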