scverse / scanpy

Single-cell analysis in Python. Scales to >1M cells.
https://scanpy.readthedocs.io
BSD 3-Clause "New" or "Revised" License

sc.tl.rank_genes_groups: reference argument is ignored #1485

Open · BrianLohman opened this issue 3 years ago

BrianLohman commented 3 years ago

Hello,

I am having problems with the sc.tl.rank_genes_groups function. I specify a reference group with the reference= argument, but it is ignored: the result table that this function stores in .uns lists the reference as 'rest' (the default), even though I set it to 'mel'.
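To make the difference concrete: as I understand it, with reference='rest' each group is tested against all other cells, while with reference='mel' each group is tested only against the 'mel' cells. The stdlib-only sketch below computes, for a single gene, a Wilcoxon rank-sum z-score per group under either choice. It is not scanpy's implementation. scores_for_gene, rank_sum_z and _average_ranks are hypothetical helpers, tie correction and p-values are omitted, and it assumes that the 'scores' field reported for method='wilcoxon' is a z-score of this form.

```python
import math
from typing import Dict, List, Sequence


def _average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_z(group: Sequence[float], background: Sequence[float]) -> float:
    """Rank-sum z-score of `group` vs `background` (no tie correction)."""
    n1, n2 = len(group), len(background)
    if n1 == 0 or n2 == 0:
        raise ValueError("both the group and the background need cells")
    ranks = _average_ranks(list(group) + list(background))
    rank_sum = sum(ranks[:n1])  # ranks of the group's own cells
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (rank_sum - mean) / sd


def scores_for_gene(
    values: Sequence[float],
    labels: Sequence[str],
    groups: Sequence[str],
    reference: str = "rest",
) -> Dict[str, float]:
    """One gene's z-score per group, against 'rest' or a single reference group."""
    if reference != "rest" and reference not in labels:
        raise ValueError(f"reference {reference!r} is not one of the labels")
    out: Dict[str, float] = {}
    for g in groups:
        if g == reference:
            continue  # a group is not tested against itself
        in_group = [v for v, lab in zip(values, labels) if lab == g]
        if reference == "rest":
            # background: every cell outside the group
            background = [v for v, lab in zip(values, labels) if lab != g]
        else:
            # background: only the cells of the reference group
            background = [v for v, lab in zip(values, labels) if lab == reference]
        out[g] = rank_sum_z(in_group, background)
    return out
```

For example, with values [3, 4, 1, 2, 10, 11] and labels ['krt', 'krt', 'mel', 'mel', 'T-cell', 'T-cell'], the 'krt' score is about 1.55 with reference='mel' but exactly 0.0 with reference='rest', so a result computed against the wrong reference differs in its scores, not only in the label stored in params.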

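import scanpy as sc

print(set(noncycling_adult.obs.class_1))
#{'krt', 'dendritic', 'eccrine', 'T-cell', 'mel'}

# Test four groups against the 'mel' cells instead of against all other cells ('rest')
sc.tl.rank_genes_groups(noncycling_adult, groupby='class_1', groups=['eccrine', 'krt', 'T-cell', 'dendritic'], reference='mel', method='wilcoxon')

# Note: this inspects full_adata, not noncycling_adult (the object passed above)
print(full_adata.uns['rank_genes_groups'])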

"""{'params': {'groupby': 'class_1', 'reference': 'rest', 'method': 'wilcoxon', 'use_raw': True, 'corr_method': 'benjamini-hochberg'}, 'scores': rec.array([(8.494621 ,), (8.326364 ,), (8.24139  ,), (7.382108 ,),
           (7.340947 ,), (7.25889  ,), (7.2148457,), (7.0626616,),
           (6.991276 ,), (6.952865 ,)],
          dtype=[('T-cell', '<f4')]), 'names': rec.array([('IL32',), ('CD52',), ('CORO1A',), ('CD3D',), ('IL2RG',),
           ('PTPRCAP',), ('RAC2',), ('CD2',), ('LTB',), ('S100A4',)],
          dtype=[('T-cell', '<U50')]), 'logfoldchanges': rec.array([(10.175177 ,), (12.354224 ,), (11.05518  ,), (14.337216 ,),
           (11.3317585,), ( 9.758805 ,), ( 8.825092 ,), (14.170704 ,),
           (10.144425 ,), ( 5.6517367,)],
          dtype=[('T-cell', '<f4')]), 'pvals': rec.array([(1.98579427e-17,), (8.33632215e-17,), (1.70221006e-16,),
           (1.55802204e-13,), (2.12087430e-13,), (3.90279912e-13,),
           (5.39952731e-13,), (1.63343167e-12,), (2.72397796e-12,),
           (3.57940624e-12,)],
          dtype=[('T-cell', '<f8')]), 'pvals_adj': rec.array([(4.86400449e-13,), (1.02094937e-12,), (1.38979777e-12,),
           (9.54054799e-10,), (1.03897390e-09,), (1.59325269e-09,),
           (1.88937174e-09,), (5.00115941e-09,), (7.41345735e-09,),
           (8.76739764e-09,)],
          dtype=[('T-cell', '<f8')])}
"""

Thanks for your help

Versions

scanpy==1.4.4.post1 anndata==0.6.22.post1 umap==0.3.10 numpy==1.17.4 scipy==1.3.2 pandas==1.1.3 scikit-learn==0.22 statsmodels==0.12.0 python-igraph==0.7.1 louvain==0.6.1

ivirshup commented 3 years ago

This issue has been mentioned on the Scanpy discourse forum. There might be relevant details there:

https://scanpy.discourse.group/t/sc-tl-rank-genes-groups-specify-groups-and-implementation-for-multiple-tests/328/3

BrianLohman commented 3 years ago

Hi @ivirshup,

Yes, both posts are mine. Feedback on the help forum suggested that this function has a bug, hence the bug report.

Koncopd commented 3 years ago

@BrianLohman I can't reproduce this. Could you please update scanpy and check?
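Before re-testing after an update, it can help to confirm which scanpy version the environment actually has installed (the report above used 1.4.4.post1). The stdlib-only sketch below reads the installed version and compares its numeric release part with a minimum. release_tuple, is_at_least and installed_version are hypothetical helpers, and MIN_VERSION is a placeholder, not a version in which this behavior is known to have changed.

```python
import re
from importlib import metadata
from typing import Optional, Tuple


def release_tuple(version: str) -> Tuple[int, ...]:
    """Leading numeric release components, e.g. '1.4.4.post1' -> (1, 4, 4)."""
    parts = []
    for piece in version.split("."):
        m = re.match(r"\d+", piece)
        if m is None:
            break  # 'post1', 'dev0', ...: stop at the first non-numeric piece
        parts.append(int(m.group()))
        if m.end() != len(piece):
            break  # e.g. '0rc1': keep the 0, ignore the pre-release suffix
    return tuple(parts)


def is_at_least(installed: str, minimum: str) -> bool:
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    # pad with zeros so that (1, 5) and (1, 5, 0) compare as equal
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_version(dist: str = "scanpy") -> Optional[str]:
    """Installed distribution version, or None if it is not installed."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    MIN_VERSION = "1.5.0"  # hypothetical threshold for illustration only
    v = installed_version("scanpy")
    if v is None:
        print("scanpy is not installed")
    else:
        print(v, "meets" if is_at_least(v, MIN_VERSION) else "is below", MIN_VERSION)
```

This comparison deliberately ignores pre- and post-release suffixes; for full PEP 440 ordering, packaging.version.Version is the usual tool.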