slowkow / harmonypy

🎼 Integrate multiple high-dimensional datasets with fuzzy k-means and locally linear adjustments.
https://portals.broadinstitute.org/harmony/
GNU General Public License v3.0

run harmonypy true_divide error #15

Closed: gmoore5 closed this issue 2 years ago

gmoore5 commented 2 years ago

When running harmonypy I got an error I have not seen before, and I'm not sure what is wrong or how to fix it.

import scanpy.external as sce  # standard scanpy alias for external tools
import harmonypy
sce.pp.harmony_integrate(adata, 'SINGLECELL_TYPE', max_iter_harmony=10)
2021-11-09 14:10:44,301 - harmonypy - INFO - Iteration 1 of 10
/PHShome/hm604/.conda/envs/HM_Py3.9_try2/lib/python3.9/site-packages/harmonypy/harmony.py:295: RuntimeWarning: invalid value encountered in true_divide
  self.R[:,b] = self.R[:,b] / np.linalg.norm(self.R[:,b], ord=1, axis=0)
2021-11-09 14:14:11,826 - harmonypy - INFO - Iteration 2 of 10

It keeps running until it's done.
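The warning comes from the L1 normalization of the soft cluster-assignment matrix `R` in harmony.py: if every entry in a column is zero, the column's L1 norm is zero and the division produces NaN. A minimal sketch of that step in plain NumPy (not harmonypy itself):

```python
import numpy as np

# Toy stand-in for harmonypy's R matrix: column 1 is all zeros,
# so its L1 norm is 0 and dividing by it yields NaN.
R = np.array([[0.2, 0.0],
              [0.8, 0.0]])

with np.errstate(invalid="ignore"):  # same warning the log shows
    normalized = R / np.linalg.norm(R, ord=1, axis=0)

print(np.isnan(normalized[:, 1]).all())  # the zero column became NaN
```

Those NaN values can then propagate into the corrected embedding, which would explain the downstream error below.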

Then I try this:

sc.pp.neighbors(adata,  use_rep='X_pca_harmony', n_neighbors=10, n_pcs=50)
computing neighbors
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_54497/579152319.py in <module>
----> 1 sc.pp.neighbors(adata, 
      2                 use_rep='X_pca_harmony',
      3                 n_neighbors=10, n_pcs=50)
      4 sc.tl.umap(adata)
      5 sc.pl.umap(adata,color=['SINGLECELL_TYPE'])

~/.conda/envs/HM_Py3.9_try2/lib/python3.9/site-packages/scanpy/neighbors/__init__.py in neighbors(adata, n_neighbors, n_pcs, use_rep, knn, random_state, method, metric, metric_kwds, key_added, copy)
    137         adata._init_as_actual(adata.copy())
    138     neighbors = Neighbors(adata)
--> 139     neighbors.compute_neighbors(
    140         n_neighbors=n_neighbors,
    141         knn=knn,

~/.conda/envs/HM_Py3.9_try2/lib/python3.9/site-packages/scanpy/neighbors/__init__.py in compute_neighbors(self, n_neighbors, knn, n_pcs, use_rep, method, random_state, write_knn_indices, metric, metric_kwds)
    789                 X = pairwise_distances(X, metric=metric, **metric_kwds)
    790                 metric = 'precomputed'
--> 791             knn_indices, knn_distances, forest = compute_neighbors_umap(
    792                 X, n_neighbors, random_state, metric=metric, metric_kwds=metric_kwds
    793             )

~/.conda/envs/HM_Py3.9_try2/lib/python3.9/site-packages/scanpy/neighbors/__init__.py in compute_neighbors_umap(X, n_neighbors, random_state, metric, metric_kwds, angular, verbose)
    303     random_state = check_random_state(random_state)
    304 
--> 305     knn_indices, knn_dists, forest = nearest_neighbors(
    306         X,
    307         n_neighbors,

~/.conda/envs/HM_Py3.9_try2/lib/python3.9/site-packages/umap/umap_.py in nearest_neighbors(X, n_neighbors, metric, metric_kwds, angular, random_state, low_memory, use_pynndescent, n_jobs, verbose)
    326         n_iters = max(5, int(round(np.log2(X.shape[0]))))
    327 
--> 328         knn_search_index = NNDescent(
    329             X,
    330             n_neighbors=n_neighbors,

~/.conda/envs/HM_Py3.9_try2/lib/python3.9/site-packages/pynndescent/pynndescent_.py in __init__(self, data, metric, metric_kwds, n_neighbors, n_trees, leaf_size, pruning_degree_multiplier, diversify_prob, n_search_trees, tree_init, init_graph, random_state, low_memory, max_candidates, n_iters, delta, n_jobs, compressed, verbose)
    684         self.verbose = verbose
    685 
--> 686         data = check_array(data, dtype=np.float32, accept_sparse="csr", order="C")
    687         self._raw_data = data
    688 

~/.conda/envs/HM_Py3.9_try2/lib/python3.9/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
    790 
    791         if force_all_finite:
--> 792             _assert_all_finite(array, allow_nan=force_all_finite == "allow-nan")
    793 
    794     if ensure_min_samples > 0:

~/.conda/envs/HM_Py3.9_try2/lib/python3.9/site-packages/sklearn/utils/validation.py in _assert_all_finite(X, allow_nan, msg_dtype)
    112         ):
    113             type_err = "infinity" if allow_nan else "NaN, infinity"
--> 114             raise ValueError(
    115                 msg_err.format(
    116                     type_err, msg_dtype if msg_dtype is not None else X.dtype

ValueError: Input contains NaN, infinity or a value too large for dtype('float32').
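One way to narrow this down is to confirm that the harmony output itself is finite before handing it to `sc.pp.neighbors`. A hedged check, with a small hypothetical array standing in for `adata.obsm['X_pca_harmony']`:

```python
import numpy as np

# Hypothetical stand-in for adata.obsm['X_pca_harmony']; with real data,
# pass that array to the same check.
embedding = np.array([[0.1, 0.5],
                      [np.nan, 0.3]])

# Rows (cells) containing any NaN or infinite value
bad_cells = ~np.isfinite(embedding).all(axis=1)
print(bad_cells.sum())  # count of cells that would trip check_array
```

If the count is nonzero, the problem is upstream of the neighbors step.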

I am using:

python == 3.9
harmonypy == 0.0.5
scanpy == 1.8.2
anndata == 0.7.6
umap == 0.5.2
numpy == 1.20.3
scipy == 1.7.1
pandas == 1.3.4
scikit-learn == 1.0.1
statsmodels == 0.13.0
pynndescent == 0.5.5
gmoore5 commented 2 years ago

Okay, so it seems to be an anndata issue, since it only happens with this .h5ad file and not others, but I don't know what is different about this dataset versus the others. The data that works is literally a subset of this larger set.

slowkow commented 2 years ago

> The data that works is literally a subset of this larger set.

So it seems likely that the larger dataset contains invalid values, and taking a subset of the data happens to avoid them.

Did you check to see what is inside the adata expression matrix that you're passing to harmony?

Did you check the values of SINGLECELL_TYPE? If there are unused categories, that can be an issue.

If you can provide us with data and code to run on our machines, then we can try to replicate your error and then find a fix.
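As a sketch of the second check, with hypothetical stand-in data in place of `adata.obs['SINGLECELL_TYPE']` (subsetting can leave a category with zero cells, which is the "unused categories" situation mentioned above):

```python
import pandas as pd

# Hypothetical batch column after subsetting: category "C" has zero cells
# left, but is still registered on the categorical dtype.
batch = pd.Series(["A", "B", "A"], dtype=pd.CategoricalDtype(["A", "B", "C"]))

print(list(batch.cat.categories))        # still includes the empty "C"
batch = batch.cat.remove_unused_categories()
print(list(batch.cat.categories))        # only categories with cells remain
```

With real data, the equivalent would be inspecting `adata.obs['SINGLECELL_TYPE'].value_counts()` for zero-count or missing labels before running harmony.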

gmoore5 commented 2 years ago

> The data that works is literally a subset of this larger set.
>
> So, it seems likely that the larger dataset may have invalid values, and we can avoid the invalid values by taking a subset of the data.
>
> Did you check to see what is inside the adata expression matrix that you're passing to harmony?
>
> Did you check the values of SINGLECELL_TYPE? If there are unused categories, that can be an issue.
>
> If you can provide us with data and code to run on our machines, then we can try to replicate your error and then find a fix.

Hi @slowkow, thank you for your response. It was an issue on my end: I didn't realize I was using an older file that contained missing values. After removing the missing values, I am no longer getting the error. Sorry about the confusion, and I appreciate your willingness to help!
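For anyone landing here with the same error: the resolution above amounts to dropping observations with missing values before integrating. A minimal sketch, with a toy table standing in for `adata.obs` (the column name is the one used in this thread; with a real AnnData object you would subset with `adata[keep.values]`):

```python
import pandas as pd

# Toy stand-in for adata.obs with one missing batch label.
obs = pd.DataFrame({"SINGLECELL_TYPE": ["A", None, "B"]})

keep = obs["SINGLECELL_TYPE"].notna()  # mask of cells with a valid label
obs_clean = obs[keep]
print(len(obs_clean))  # cells remaining after dropping missing labels
```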