openproblems-bio / openproblems

Formalizing and benchmarking open problems in single-cell genomics

[Batch integration] Isolated labels broken for pancreas data #757

Closed by scottgigante-immunai 1 year ago

scottgigante-immunai commented 1 year ago

The only isolated label in the pancreas dataset is t_cell, which has only seven cells. A score computed on seven cells doesn't seem like a good metric.

>>> import openproblems
>>> adata = openproblems.tasks.batch_integration_embed.datasets.pancreas_batch()
>>> from scib.metrics.isolated_labels import get_isolated_labels
>>> isolated_labels = get_isolated_labels(
...     adata, label_key="labels", batch_key="batch", iso_threshold=None, verbose=True
... )
isolated labels: no more than 4 batches per label
>>> isolated_labels
['t_cell']
>>> adata.obs['labels'].value_counts()
alpha                 5493
beta                  4169
ductal                2142
acinar                1669
delta                 1055
gamma                  699
activated_stellate     464
endothelial            313
quiescent_stellate     193
macrophage              79
mast                    42
epsilon                 32
schwann                 25
t_cell                   7
Name: labels, dtype: int64
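
To make the concern concrete, here is a minimal sketch of one way to flag isolated labels that have too few cells to score reliably. It is not scib's implementation: the function name isolated_label_summary, the min_cells cutoff, and the toy data are all hypothetical. It assumes a label counts as "isolated" when it appears in no more batches than the threshold, and that the threshold defaults to the smallest number of batches seen for any label, which is consistent with the verbose message above but not confirmed by it.

```python
import pandas as pd


def isolated_label_summary(
    obs, label_key="labels", batch_key="batch", iso_threshold=None, min_cells=50
):
    """Summarise isolated labels and flag those with very few cells.

    Hypothetical helper for illustration only; `min_cells` is an assumed cutoff.
    """
    # Number of distinct batches in which each label occurs.
    batches_per_label = obs.groupby(label_key, observed=True)[batch_key].nunique()

    # Assumption: with no explicit threshold, use the smallest batch count.
    if iso_threshold is None:
        iso_threshold = batches_per_label.min()

    # Labels present in at most `iso_threshold` batches are treated as isolated.
    isolated = batches_per_label[batches_per_label <= iso_threshold].index

    cells_per_label = obs[label_key].value_counts()
    summary = pd.DataFrame(
        {
            "n_batches": batches_per_label.loc[isolated],
            "n_cells": cells_per_label.loc[isolated],
        }
    )
    # Flag isolated labels that fall below the assumed minimum cell count.
    summary["too_small"] = summary["n_cells"] < min_cells
    return summary


# Toy stand-in for adata.obs: t_cell occurs in a single batch with seven cells.
obs = pd.DataFrame(
    {
        "labels": ["alpha"] * 90 + ["beta"] * 60 + ["t_cell"] * 7,
        "batch": ["a", "b", "c"] * 30 + ["a", "b"] * 30 + ["a"] * 7,
    }
)
summary = isolated_label_summary(obs)
print(summary)
```

On the real data, the same function could be called as isolated_label_summary(adata.obs), assuming the labels and batch columns shown in the session above.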