Originally posted by **benayedi** May 23, 2024
Hi!
I'm encountering an issue when running the `ms.perturbation_signature` method on my AnnData object. Specifically, I'm using the following command:
`ms.perturbation_signature(adata, "gene.compact", "0", "replicate")`
`gene.compact` holds the targeted gene.
`replicate` is a column I derived from `guide.compact` during preprocessing: values 1 to 4 identify the four guides, and 0 marks cells with an unassigned guide.
Here are the value counts for the `replicate` column:
`print(adata.obs["replicate"].value_counts())`
```
replicate
0 15903
3 5724
1 5606
4 4875
2 4163
Name: count, dtype: int64
```
The command results in the following error:
```
/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/pynndescent/pynndescent_.py:703: RuntimeWarning: divide by zero encountered in log2
n_iters = max(5, int(round(np.log2(data.shape[0]))))
---------------------------------------------------------------------------
OverflowError Traceback (most recent call last)
Cell In[15], line 1
----> 1 ms.perturbation_signature(adata, "gene.compact", "unassigned", "replicate")
File /Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/pertpy/tools/_mixscape.py:110, in Mixscape.perturbation_signature(self, adata, pert_key, control, split_by, n_neighbors, use_rep, n_pcs, batch_size, copy, **kwargs)
107 from pynndescent import NNDescent
109 eps = kwargs.pop("epsilon", 0.1)
--> 110 nn_index = NNDescent(R_control, **kwargs)
111 indices, _ = nn_index.query(R_split, k=n_neighbors, epsilon=eps)
113 X_control = np.expm1(adata.X[control_mask_split])
File /Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/pynndescent/pynndescent_.py:703, in NNDescent.__init__(self, data, metric, metric_kwds, n_neighbors, n_trees, leaf_size, pruning_degree_multiplier, diversify_prob, n_search_trees, tree_init, init_graph, init_dist, random_state, low_memory, max_candidates, max_rptree_depth, n_iters, delta, n_jobs, compressed, parallel_batch_queries, verbose)
701 n_trees = min(32, n_trees) # Only so many trees are useful
702 if n_iters is None:
--> 703 n_iters = max(5, int(round(np.log2(data.shape[0]))))
705 self.n_trees = n_trees
706 self.n_trees_after_update = max(1, int(np.round(self.n_trees / 3)))
OverflowError: cannot convert float infinity to integer
```
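The last frame suggests the failure is purely arithmetic: `np.log2(0)` evaluates to `-inf` (hence the "divide by zero" warning), and converting `-inf` to an integer raises `OverflowError`. The sketch below re-creates only that one expression; the function name `default_n_iters` is hypothetical and not part of pynndescent. It shows the expression fails exactly when the array passed to `NNDescent` has zero rows, which would mean `R_control` was empty for at least one split group.

```python
import numpy as np


def default_n_iters(n_rows: int) -> int:
    """Hypothetical mirror of the expression on pynndescent_.py line 703."""
    with np.errstate(divide="ignore"):  # silence the log2(0) RuntimeWarning
        return max(5, int(round(np.log2(n_rows))))


print(default_n_iters(5000))  # a non-empty control set works: 12

try:
    default_n_iters(0)  # an empty control set: log2(0) is -inf
except OverflowError as err:
    print("OverflowError:", err)
```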
I have ensured that the data does not include any NaN values. Interestingly, the following command works without any issues:
`ms.perturbation_signature(adata, "gene.compact", "unassigned", "batch")`
but this one raises the same error:
`ms.perturbation_signature(adata, "gene.compact", "unassigned", "guide.compact")`
Here are the value counts for the `batch` column:
```
batch
0 3495
1 2601
2 2529
4 2516
3 2474
7 2434
14 2432
13 2314
8 2306
5 2270
10 2239
9 2216
11 2215
12 2147
6 2083
Name: count, dtype: int64
```
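One possible explanation for `batch` working while `replicate` and `guide.compact` fail is that, judging from the names `R_control`, `R_split` and `control_mask_split` in the traceback, the control cells appear to be selected separately within each split group. If the unassigned-guide cells (replicate 0) are also the cells labelled as control in `gene.compact`, then replicates 1 to 4 would contain no control cells at all, while every batch would contain some. The helper below is a hypothetical sketch (not a pertpy function) that lists split groups with zero control cells; the toy `obs` table is made-up data that only mimics the encoding described above. On real data it would be called as `splits_without_control(adata.obs, "gene.compact", "unassigned", "replicate")`.

```python
import pandas as pd


def splits_without_control(obs: pd.DataFrame, pert_key: str, control: str, split_by: str) -> list:
    """Return the split_by values whose cells include no control cells (hypothetical helper)."""
    is_control = obs[pert_key] == control
    # Count control cells per split group; observed=True skips empty categories.
    n_control = is_control.groupby(obs[split_by], observed=True).sum()
    return n_control[n_control == 0].index.tolist()


# Made-up toy data: controls are exactly the replicate-0 cells,
# but they are spread across both batches.
obs = pd.DataFrame({
    "gene.compact": ["unassigned", "unassigned", "GENE_A", "GENE_A", "GENE_B", "GENE_B"],
    "replicate": ["0", "0", "1", "2", "1", "2"],
    "batch": ["0", "1", "0", "1", "0", "1"],
})

print(splits_without_control(obs, "gene.compact", "unassigned", "replicate"))  # ['1', '2']
print(splits_without_control(obs, "gene.compact", "unassigned", "batch"))  # []
```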
Does anyone know how to resolve this error when splitting by the `replicate` column? Any insights would be greatly appreciated.
Thank you!
Discussed in https://github.com/theislab/pertpy/discussions/604