simonwm / tacco

TACCO: Transfer of Annotations to Cells and their COmbinations
BSD 3-Clause "New" or "Revised" License

type_prior contains na! #15

Closed pakiessling closed 6 months ago

pakiessling commented 6 months ago

Hi,

I am trying to annotate a dataset like this:

import tacco as tc

def tacco_annotation(adata, ref, ct_column="cell_type", **kwargs):
    assert (
        adata.X.max().is_integer() and ref.X.max().is_integer()
    ), "Data must be raw counts"
    adata = tc.tl.annotate(
        adata,
        ref,
        annotation_key=ct_column,
        **kwargs,
        result_key="tacco",
    )
    adata = tc.utils.get_maximum_annotation(adata, "tacco", "tacco")
    adata.obs["tacco_score"] = adata.obsm["tacco"].max(axis=1)
    return adata

adata = tacco_annotation(
    adata,
    ref,
    ct_column="cell_subtype",
)

I get the following error:

Starting preprocessing
Annotation profiles were not found in `reference.varm["cell_subtype"]`. Constructing reference profiles with `tacco.preprocessing.construct_reference_profiles` and default arguments...
Finished preprocessing in 0.5 seconds.
Starting annotation of data with shape (67344, 483) and a reference of shape (2736, 483) using the following wrapped method:
+- platform normalization: platform_iterations=0, gene_keys=cell_subtype, normalize_to=adata
   +- multi center: multi_center=None multi_center_amplitudes=True
      +- bisection boost: bisections=4, bisection_divisor=3
         +- core: method=OT annotation_prior=None
mean,std( rescaling(gene) )  51.64145217636585 121.51160341211514
bisection run on 1
tacco/utils/_utils.py:
    241 def _run_OT(type_cell_dist, type_prior=None, cell_prior=None, epsilon=5e-3, lamb=None, inplace=False):
    242 
    243     # check sanity of arguments
    244     if type_prior is not None and type_prior.isna().any():
--> 245         raise Exception('type_prior contains na!')
    246     if type_prior is not None and (type_prior<0).any():
    247         raise Exception('type_prior contains negative values!')

Exception: type_prior contains na!

I previously annotated other datasets successfully. I also ran the novosparc method on this reference and dataset, and it completed with a warning:

Trying with epsilon: 5.00e-04
ot/bregman/_sinkhorn.py:498: RuntimeWarning: divide by zero encountered in divide
  v = b / KtransposeU
ot/bregman/_sinkhorn.py:498: RuntimeWarning: overflow encountered in divide
  v = b / KtransposeU
ot/bregman/_sinkhorn.py:506: UserWarning: Warning: numerical errors at iteration 0
  warnings.warn('Warning: numerical errors at iteration %d' % ii)

I made sure that both datasets contain positive integers and no NaNs. Any ideas what is going wrong?

JWatter commented 6 months ago

Hi again! Thanks for posting.

I managed to trigger this error with NaNs or infs in .X, and with a NaN in the cell type annotation in .obs:

# import packages
import tacco as tc
import anndata as ad
import numpy as np
import pandas as pd

# setup test case
X = np.diag([1.0]*5)
#X[0,0] = np.nan # setting one element in .X to nan gives that error
#X[0,0] = np.inf # setting one element in .X to inf gives that error
adata = ad.AnnData(X)
adata.obs['ct'] = pd.Series(['A','A','B','B','C'],index=adata.obs.index).astype(pd.CategoricalDtype(['A','B','C']))
#adata.obs['ct'] = adata.obs['ct'].astype(pd.CategoricalDtype(['A','B'])) # provoking a NaN in the cell type annotation also gives that error

# trigger the error
tc.tl.annotate(adata,adata,annotation_key='ct',result_key="tacco")

Can you check whether one of them happens in your input?

If these issues are not present in your input, could you try creating a minimal non-working example, like the one above? That should help you find issues with the input and help me investigate possible bugs in the code.
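
The three triggers mentioned above (NaN in the matrix, inf in the matrix, missing cell type labels) could be checked with a small helper along these lines. This is only a sketch: `find_input_problems` is a hypothetical name, not part of tacco, and it assumes that sparse matrices expose their stored values through `.data`, as scipy CSR/CSC matrices do.

```python
import numpy as np
import pandas as pd


def find_input_problems(X, labels):
    """Hypothetical helper: list problems found in a count matrix and its labels."""
    problems = []
    # Assumption: sparse matrices (detected via `tocsr`) keep their stored
    # values in `.data`; dense arrays are checked as they are.
    values = np.asarray(X.data if hasattr(X, "tocsr") else X, dtype=float)
    if np.isnan(values).any():
        problems.append("matrix contains NaN")
    if np.isinf(values).any():
        problems.append("matrix contains inf")
    # Missing labels show up as NaN in a pandas Series, categorical or not.
    labels = pd.Series(labels)
    n_missing = int(labels.isna().sum())
    if n_missing > 0:
        problems.append(f"{n_missing} cells have a missing annotation")
    return problems


# Small demonstration with one NaN in the matrix and one missing label
X = np.diag([1.0] * 5)
X[0, 0] = np.nan
labels = pd.Series(["A", "A", "B", None, "C"])
print(find_input_problems(X, labels))
```

For the data in this issue, the call would presumably be `find_input_problems(ref.X, ref.obs["cell_subtype"])` for the reference and `find_input_problems(adata.X, adata.obs.iloc[:, 0])` or similar for any annotation column of interest in the dataset.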

pakiessling commented 6 months ago

Huh, it seems to work now. Not sure what the problem was. Probably an NA in the annotation or something. Sorry about that.
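
If a missing annotation was indeed the cause, one way to guard against it would be to drop unlabelled reference cells before annotating. The sketch below is an assumption about a possible workaround, not a fix confirmed in this thread; `labelled_mask` is a hypothetical helper, and the column name `cell_subtype` is taken from the original post.

```python
import pandas as pd


def labelled_mask(obs, key):
    """Hypothetical helper: boolean mask of rows whose annotation in `key` is present."""
    return obs[key].notna().to_numpy()


# Stand-in for `ref.obs` with one unlabelled cell
obs = pd.DataFrame({"cell_subtype": pd.Categorical(["A", None, "B"])})
mask = labelled_mask(obs, "cell_subtype")
print(mask)

# With AnnData objects the same mask could be applied before annotating, e.g.
# ref = ref[labelled_mask(ref.obs, "cell_subtype")].copy()
```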