simonwm / tacco

TACCO: Transfer of Annotations to Cells and their COmbinations
BSD 3-Clause "New" or "Revised" License

Met problems when using tc.tl.annotate() #3

Closed Smilenone closed 1 year ago

Smilenone commented 1 year ago

When I call tc.tl.annotate() on reference and spatial data that were both read with scanpy, I get the following error:

InvalidIndexError                         Traceback (most recent call last)
Input In [7], in <cell line: 1>()
----> 1 tc.tl.annotate(adata_spatial, reference,'celltype1',result_key='celltype1',)

File D:\Basic Tools\Anaconda\lib\site-packages\tacco\tools\_annotate.py:734, in annotate(adata, reference, annotation_key, result_key, counts_location, method, bisections, bisection_divisor, platform_iterations, normalize_to, annotation_prior, multi_center, multi_center_amplitudes, reconstruction_key, max_annotation, min_counts_per_gene, min_counts_per_cell, min_cells_per_gene, min_genes_per_cell, remove_constant_genes, remove_zero_cells, min_log2foldchange, min_expression, remove_mito, n_hvg, skip_checks, assume_valid_counts, return_reference, gene_keys, verbose, **kw_args)
    732 except ValueError as e: # as e syntax added in ~python2.5
    733     raise ValueError(f'{str(e)}\nYou can deactivate checking for invalid counts by specifying assume_valid_counts=True.')
--> 734 tdata,reference = preprocessing.filter(adata=(tdata,reference), min_counts_per_cell=min_counts_per_cell, min_counts_per_gene=min_counts_per_gene, min_cells_per_gene=min_cells_per_gene, min_genes_per_cell=min_genes_per_cell, remove_constant_genes=remove_constant_genes, remove_zero_cells=remove_zero_cells, assume_valid_counts=True) # ensure consistent gene selection
    735 if verbose > 0:
    736     print(f'Finished preprocessing in {np.round(time.time() - start, 2)} seconds.')

File D:\Basic Tools\Anaconda\lib\site-packages\tacco\preprocessing\_qc.py:163, in filter(adata, min_counts_per_gene, min_counts_per_cell, min_cells_per_gene, min_genes_per_cell, remove_constant_genes, remove_zero_cells, assume_valid_counts, return_view)
    161 for i in range(len(adatas)):
    162     if len(adatas[i].var.index) != len(good_genes): # filter happened
--> 163         adatas[i] = adatas[i][:,good_genes]
    164         changed = True
    165     elif (adatas[i].var.index != good_genes).any(): # reordering happened: no side effects on cell filtering

File D:\Basic Tools\Anaconda\lib\site-packages\anndata\_core\anndata.py:1113, in AnnData.__getitem__(self, index)
   1111 def __getitem__(self, index: Index) -> "AnnData":
   1112     """Returns a sliced view of the object."""
-> 1113     oidx, vidx = self._normalize_indices(index)
   1114     return AnnData(self, oidx=oidx, vidx=vidx, asview=True)

File D:\Basic Tools\Anaconda\lib\site-packages\anndata\_core\anndata.py:1094, in AnnData._normalize_indices(self, index)
   1093 def _normalize_indices(self, index: Optional[Index]) -> Tuple[slice, slice]:
-> 1094     return _normalize_indices(index, self.obs_names, self.var_names)

File D:\Basic Tools\Anaconda\lib\site-packages\anndata\_core\index.py:36, in _normalize_indices(index, names0, names1)
     34 ax0, ax1 = unpack_index(index)
     35 ax0 = _normalize_index(ax0, names0)
---> 36 ax1 = _normalize_index(ax1, names1)
     37 return ax0, ax1

File D:\Basic Tools\Anaconda\lib\site-packages\anndata\_core\index.py:98, in _normalize_index(indexer, index)
     96     return positions # np.ndarray[int]
     97 else: # indexer should be string array
---> 98     positions = index.get_indexer(indexer)
     99     if np.any(positions < 0):
    100         not_found = indexer[positions < 0]

File D:\Basic Tools\Anaconda\lib\site-packages\pandas\core\indexes\base.py:3905, in Index.get_indexer(self, target, method, limit, tolerance)
   3902 self._check_indexing_method(method, limit, tolerance)
   3904 if not self._index_as_unique:
-> 3905     raise InvalidIndexError(self._requires_unique_msg)
   3907 if len(target) == 0:
   3908     return np.array([], dtype=np.intp)

InvalidIndexError: Reindexing only valid with uniquely valued Index objects

Can you help?

simonwm commented 1 year ago

Hi @Smilenone and sorry for the delayed response.

The error InvalidIndexError: Reindexing only valid with uniquely valued Index objects usually comes from AnnData objects that have duplicated gene names in adata.var.index or, less frequently, duplicated cell names in adata.obs.index. In that case, a quick fix is to call var_names_make_unique() (or obs_names_make_unique()) on the affected AnnData object before running tc.tl.annotate().

The same error should also appear when using certain scanpy features, so it is not TACCO-specific.
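As the last frame of the traceback suggests, the exception is raised inside pandas, which is why it is not tied to one package. A small sketch with plain pandas and no TACCO, scanpy, or anndata involved (the index contents are made up for illustration):

```python
import pandas as pd
from pandas.errors import InvalidIndexError

# An index with a duplicated label, standing in for duplicated gene names.
genes = pd.Index(["GeneA", "GeneB", "GeneA"])

raised = False
message = ""
try:
    # Looking up positions by label requires a unique index.
    genes.get_indexer(["GeneB"])
except InvalidIndexError as err:
    raised = True
    message = str(err)

print(raised, message)

# With unique labels the same lookup succeeds.
unique_genes = pd.Index(["GeneA", "GeneB", "GeneC"])
positions = unique_genes.get_indexer(["GeneB"])
print(positions)
```

Any code path that selects genes or cells by name through such an index can therefore hit the same error.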

I hope this helps!