yrsong001 commented 1 year ago

Hi! I tried the rank_aggregate() and cellphonedb() methods in LIANA, and both fail with the error below. I have included a summary of my data further down. Why does this key error keep occurring? Thank you!

Run rank_aggregate

li.mt.rank_aggregate(adata, groupby='celltype.lv1', expr_prop=0.1, verbose=True, key_added='cpdb_res')

Run cellphonedb

cellphonedb(adata, groupby='celltype.lv1', expr_prop=0.1, use_raw=False, return_all_lrs=True, verbose=True)

File ~/anaconda3/envs/scenicplus/lib/python3.8/site-packages/liana/method/_pipe_utils/_pre.py:57, in assert_covered(subset, superset, subset_name, superset_name, prop_missing_allowed, verbose)
     50 if prop_missing > prop_missing_allowed:
     51     msg = (
     52         f"Please check if appropriate organism/ID type was provided! "
     53         f"Allowed proportion ({prop_missing_allowed}) of missing "
     54         f"{subset_name} elements exceeded ({prop_missing:.2f}). "
     55         f"Too few features from the resource were found in the data."
     56     )
---> 57     raise ValueError(msg + f" [{x_missing}] missing from {superset_name}")
     59 if verbose & (prop_missing > 0):
     60     print(f"{prop_missing:.2f} of entities in the resource are missing from the data.")

ValueError: Please check if appropriate organism/ID type was provided! Allowed proportion (0.98) of missing resource elements exceeded (1.00). Too few features from the resource were found in the data. [A1BG, A2M, AANAT, ABCA1, ACE, ACKR1, ACKR2, ACKR3, ACKR4, ACTR2, ACVR1, ACVR1B, ACVR1C, ACVR2A, ACVR2B, ACVRL1, ADA, ADAM10, ADAM11, ADAM12, ADAM15, ADAM17, ADAM2, ADAM22, ADAM23, ADAM28, ADAM29, ADAM7, ADAM9, ADAMTS3, ADCY1, ADCY7, ADCY8, ADCY9 ....] missing from var_names

print(adata)
AnnData object with n_obs × n_vars = 24062 × 2000
    obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'percent.mt', 'time_point', 'RNA_snn_res.0.5', 'seurat_clusters', 'celltype.lv1', 'celltype.lv2', 'time_month', 'scDblFinder.class', 'scDblFinder.score', 'celltype.lv3'
    var: 'vst.mean', 'vst.variance', 'vst.variance.expected', 'vst.variance.standardized', 'vst.variable'
    obsm: 'X_pca', 'X_umap'
    varm: 'PCs'

adata.X, adata.raw.X
(array([[-0.32665763, -0.34369911, -0.19467216, ..., -0.48180683,  2.08207319, -0.05807534],
        [ 0.73692432, -0.34369911, -0.19467216, ...,  2.23740636,  1.67791563, -0.05807534],
        [-0.32665763, -0.34369911, -0.19467216, ..., -0.48180683, -0.45570303, -0.05807534],
        ...,
        [-0.32665763, -0.34369911, -0.19467216, ...,  2.89027397, -0.45570303, -0.05807534],
        [ 5.5336266 , -0.34369911, -0.19467216, ..., -0.48180683, -0.45570303, -0.05807534],
        [-0.32665763, -0.34369911, -0.19467216, ..., -0.48180683,  2.08376311, -0.05807534]]),
 <24062x32285 sparse matrix of type '<class 'numpy.float64'>'
        with 85893851 stored elements in Compressed Sparse Row format>)

dbdimitrov commented 1 year ago

Hi,

Please check your variable names. The default resource in LIANA is gene symbols. You can see by the error message that no variable matches the resource.

Let me know if this doesn't work :)
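
The check that fails in the traceback can be approximated outside LIANA. The sketch below is not LIANA's own code: `prop_missing` is a hypothetical helper, and the symbols are toy values chosen for illustration. It only shows how a case-sensitive comparison between resource symbols and variable names can leave every resource entry unmatched.

```python
def prop_missing(resource_symbols, var_names):
    """Return the fraction of resource symbols absent from var_names.

    Hypothetical helper for illustration; the comparison is exact and
    case-sensitive, so 'ACE' and 'Ace' do not match.
    """
    resource = set(resource_symbols)
    if not resource:
        return 0.0
    missing = resource - set(var_names)
    return len(missing) / len(resource)


# Toy values: upper-case symbols as listed in the error message versus
# differently capitalised names in the data (assumed for illustration).
resource_symbols = ["A2M", "ACE", "ADAM23", "FN1"]
var_names = ["A2m", "Ace", "Adam23", "Fn1"]

print(f"{prop_missing(resource_symbols, var_names):.2f}")  # prints 1.00
```

With real data, `var_names` would be `adata.var_names`; a value close to 1 suggests the identifiers in the data and in the resource follow different conventions.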

yrsong001 commented 1 year ago

Hi! Thanks for replying. I checked the names, and they all appear to be gene symbols already. These are mouse genes, though; could that be the cause? Thank you.

print(adata.var_names[:50])  # This will print the first 50 gene identifiers
Index(['Xkr4', 'Sox17', 'Rgs20', 'St18', 'Sntg1', 'Cpa6', 'Sulf1', 'Kcnb2',
       'Rdh10', 'Gdap1', 'Pi15', 'Crispld1', 'Il17a', 'Khdc1a', 'Kcnq5',
       'Ptpn18', 'Neurl3', 'Gm5099', 'Rnf149', 'Il1r2', 'Il1r1', 'Il18r1',
       'Il18rap', 'Fhl2', 'Ecrg4', 'Col3a1', 'Col5a2', 'Slc40a1', 'Tmeff2',
       'Cavin2', 'Stat4', 'Stk17b', 'Aox3', 'Cd28', 'Ctla4', 'Icos', 'Nrp2',
       'Zdbf2', 'Adam23', 'Myl1', 'Erbb4', 'Ikzf2', 'Fn1', 'Mreg', 'Igfbp5',
       'Cxcr2', 'Slc11a1', 'Des', 'Pax3', 'Gm29536'],
      dtype='object')

dbdimitrov commented 1 year ago

Hi,

Then if you select the MouseConsensus resource it should work (via resource_name). 🙂
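
As a rough sketch of that suggestion, the original call could be repeated with `resource_name` set. The exact string `'mouseconsensus'` is an assumption about how the MouseConsensus resource is named in the installed version (listing the available names with `li.resource.show_resources()` would confirm it), and `run_rank_aggregate` is a hypothetical wrapper introduced only for this example.

```python
# Keyword arguments from the original call, plus the suggested resource.
# 'mouseconsensus' is an assumed spelling of the MouseConsensus resource name.
MOUSE_KWARGS = dict(
    groupby="celltype.lv1",
    resource_name="mouseconsensus",
    expr_prop=0.1,
    verbose=True,
    key_added="cpdb_res",
)


def run_rank_aggregate(adata, **overrides):
    """Hypothetical wrapper: run rank_aggregate with the mouse resource."""
    import liana as li  # imported here so the sketch loads without liana

    kwargs = {**MOUSE_KWARGS, **overrides}
    return li.mt.rank_aggregate(adata, **kwargs)
```

Calling `run_rank_aggregate(adata)` would then be equivalent to the original `li.mt.rank_aggregate(...)` call with `resource_name` added; the same keyword can be passed to `cellphonedb()`.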

dbdimitrov commented 1 year ago

I will assume this worked out. Feel free to open new issues.

saezlab / liana-py

KeyError: missing elements #51
