scverse / pertpy

Perturbation Analysis in the scverse ecosystem.
https://pertpy.readthedocs.io/en/latest/
MIT License

OverflowError: cannot convert float infinity to integer with ms.perturbation_signature Method in Mixscape #605

Closed Zethson closed 4 weeks ago

Zethson commented 1 month ago

Discussed in https://github.com/theislab/pertpy/discussions/604

Originally posted by **benayedi** May 23, 2024

Hi! I'm encountering an issue when trying to run the ms.perturbation_signature method on my AnnData object. Specifically, I'm using the following command:

`ms.perturbation_signature(adata, "gene.compact", "0", "replicate")`

gene.compact represents the actual gene. replicate is the preprocessed column from guide.compact where it represents the four guides, and 0 is for unassigned guide. Here is the value count for the replicate column:

`print(adata.obs["replicate"].value_counts())`

```
replicate
0    15903
3     5724
1     5606
4     4875
2     4163
Name: count, dtype: int64
```

The command results in the following error:

```
/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/pynndescent/pynndescent_.py:703: RuntimeWarning: divide by zero encountered in log2
  n_iters = max(5, int(round(np.log2(data.shape[0]))))
---------------------------------------------------------------------------
OverflowError                             Traceback (most recent call last)
Cell In[15], line 1
----> 1 ms.perturbation_signature(adata, "gene.compact", "unassigned", "replicate")

File /Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/pertpy/tools/_mixscape.py:110, in Mixscape.perturbation_signature(self, adata, pert_key, control, split_by, n_neighbors, use_rep, n_pcs, batch_size, copy, **kwargs)
    107 from pynndescent import NNDescent
    109 eps = kwargs.pop("epsilon", 0.1)
--> 110 nn_index = NNDescent(R_control, **kwargs)
    111 indices, _ = nn_index.query(R_split, k=n_neighbors, epsilon=eps)
    113 X_control = np.expm1(adata.X[control_mask_split])

File /Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/pynndescent/pynndescent_.py:703, in NNDescent.__init__(self, data, metric, metric_kwds, n_neighbors, n_trees, leaf_size, pruning_degree_multiplier, diversify_prob, n_search_trees, tree_init, init_graph, init_dist, random_state, low_memory, max_candidates, max_rptree_depth, n_iters, delta, n_jobs, compressed, parallel_batch_queries, verbose)
    701     n_trees = min(32, n_trees)  # Only so many trees are useful
    702 if n_iters is None:
--> 703     n_iters = max(5, int(round(np.log2(data.shape[0]))))
    705 self.n_trees = n_trees
    706 self.n_trees_after_update = max(1, int(np.round(self.n_trees / 3)))

OverflowError: cannot convert float infinity to integer
```

I have ensured that the data does not include any NaN values. Interestingly, the following command works without any issues:

`ms.perturbation_signature(adata, "gene.compact", "unassigned", "batch")`

but this one causes the same error above:

`ms.perturbation_signature(adata, "gene.compact", "unassigned", "guide.compact")`

Here are the value counts for the batch column:

```
batch
0     3495
1     2601
2     2529
4     2516
3     2474
7     2434
14    2432
13    2314
8     2306
5     2270
10    2239
9     2216
11    2215
12    2147
6     2083
Name: count, dtype: int64
```

Does anyone know how to resolve this issue with the replicate column? Any insights would be greatly appreciated. Thank you!

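
A possible way to narrow this down, offered as a sketch and not as a confirmed diagnosis: the `divide by zero encountered in log2` warning on `data.shape[0]` suggests that the control matrix handed to `NNDescent` had zero rows for at least one split group. If control cells only occur in one value of the split column (for example, only in replicate `0`), the other groups would have no controls to search against. The helper name `control_cells_per_split` and the toy `obs` table below are hypothetical stand-ins for `adata.obs`; only standard pandas calls are used.

```python
import pandas as pd


def control_cells_per_split(obs, pert_key, control, split_by):
    """Count how many control cells fall into each split group.

    Hypothetical helper: `obs` stands in for `adata.obs`.
    """
    # Boolean mask of control cells; compare as strings so that
    # integer-like and categorical labels behave the same way.
    is_control = obs[pert_key].astype(str) == str(control)
    # Summing a boolean Series per group yields the control count per group.
    return is_control.groupby(obs[split_by].astype(str)).sum().astype(int)


# Toy data (assumption): controls exist only in replicate "0".
obs = pd.DataFrame(
    {
        "gene.compact": ["unassigned", "unassigned", "GENE_A", "GENE_A", "GENE_B"],
        "replicate": ["0", "0", "1", "2", "1"],
    }
)

counts = control_cells_per_split(obs, "gene.compact", "unassigned", "replicate")
empty_groups = counts[counts == 0].index.tolist()

print(counts)
print("Split groups without control cells:", empty_groups)
```

If `empty_groups` is non-empty on the real `adata.obs`, that split column leaves some groups without controls, which would be consistent with the error appearing for `replicate` and `guide.compact` but not for `batch`.
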
Zethson commented 1 month ago

@benayedi you can subscribe to this issue. I closed the discussion.

Zethson commented 1 month ago

@benayedi thank you very much for the detailed issue report. Would it be possible for you to share your object, please?