openproblems-bio / openproblems

Formalizing and benchmarking open problems in single-cell genomics
MIT License

[batch integration] silhouette_batch not maximised by baselines #817

Closed by scottgigante-immunai 1 year ago

scottgigante-immunai commented 1 year ago
(Image not reproduced: plot of the scaled silhouette_batch scores.)

cc @danielStrobl @LuckyMD

Originally posted by @scottgigante-immunai in https://github.com/openproblems-bio/openproblems/issues/685#issuecomment-1419639829

LuckyMD commented 1 year ago

How can anything score better than a silhouette of 0, which is what celltype_random_embedding achieves?

LuckyMD commented 1 year ago

Okay, this makes no sense... I checked the scib code and it correctly computes mean_i(1 - abs(silhouette_i)) over all cells i of each cell type, then takes the mean across cell types. You can't get values higher than 1 by subtracting an absolute value from 1...
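A minimal sketch of the computation described above, assuming scikit-learn's `silhouette_samples` with Euclidean distance. It follows the description in this comment, not scib's actual source, and `silhouette_batch_sketch` is a hypothetical name. Because every per-cell silhouette s_i lies in [-1, 1], each 1 - |s_i| lies in [0, 1], so the unscaled score cannot exceed 1.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import silhouette_samples


def silhouette_batch_sketch(X, batch, labels):
    """Hypothetical re-implementation of the described metric.

    For each cell type, compute per-cell silhouettes using batch as the
    cluster label, map each to 1 - |s_i|, average within the cell type,
    then average across cell types.
    """
    batch = np.asarray(batch)
    labels = np.asarray(labels)
    per_group = {}
    for group in np.unique(labels):
        mask = labels == group
        n_batches = len(np.unique(batch[mask]))
        # Silhouette is undefined with a single batch or one batch per cell.
        if n_batches == 1 or n_batches == mask.sum():
            continue
        sil = silhouette_samples(X[mask], batch[mask])
        per_group[group] = np.mean(1 - np.abs(sil))
    per_group = pd.Series(per_group, name="silhouette_score")
    per_group.index.name = "group"
    return per_group.mean(), per_group


# Hypothetical random data: 300 cells, 3 cell types, 3 batches.
rng = np.random.default_rng(0)
labels = rng.choice(["B", "T", "NK"], size=300)
batch = rng.choice(["b1", "b2", "b3"], size=300)
X = rng.normal(size=(300, 10))

score, per_group = silhouette_batch_sketch(X, batch, labels)
print(per_group)
print(score)
```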

scottgigante-immunai commented 1 year ago

This is after scaling: the best raw score is 0.95, but celltype_random_embedding, the best-performing baseline, only reaches ~0.7.
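A worked example of why scaled scores can exceed 1, assuming (not confirmed in this thread) that each raw metric is min-max scaled so the worst baseline maps to 0 and the best baseline maps to 1. The 0.95 and 0.7 values come from the comment above; the worst-baseline value of 0.0 is a hypothetical placeholder.

```python
def scale_to_baselines(raw, worst_baseline, best_baseline):
    # Assumed min-max scaling: worst baseline -> 0, best baseline -> 1.
    return (raw - worst_baseline) / (best_baseline - worst_baseline)


# 0.0 for the worst baseline is a hypothetical placeholder.
scaled_best_method = scale_to_baselines(0.95, worst_baseline=0.0, best_baseline=0.7)
scaled_best_baseline = scale_to_baselines(0.7, worst_baseline=0.0, best_baseline=0.7)
print(round(scaled_best_method, 3))  # 1.357
print(scaled_best_baseline)          # 1.0
```

Under this assumption the exact worst-baseline value does not matter for the conclusion: for any value below 0.7, a raw score of 0.95 scales above 1, because the numerator exceeds the denominator.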

LuckyMD commented 1 year ago

Ah, okay. That makes more sense. Tbh, this should also be maximized by cell type one-hot encoding (without jitter).

scottgigante-immunai commented 1 year ago

So is this a bug in scIB? The jitter is very small; it shouldn't make the baseline perform worse than scanorama.

LuckyMD commented 1 year ago

No, I don't think this is a bug in scib. This will have to do with baselines and how they work with potentially unbalanced batches.

LuckyMD commented 1 year ago

Is celltype random embedding the one-hot-encoding?

scottgigante-immunai commented 1 year ago

Ah wait that's a one-hot encoding by celltype. This would need to be a one-hot encoding by batch?

LuckyMD commented 1 year ago

No, I want batches to be fully mixed within each cell type. One-hot encoding by batch would give the worst-case score for this metric.
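To illustrate the worst case described above, here is a minimal sketch for a single hypothetical cell type, calling scikit-learn's `silhouette_samples` directly on that cell type's cells (the per-cell-type step of the metric). If cells are embedded as a one-hot encoding of their batch, batches are perfectly separated, every cell gets silhouette 1, and 1 - |s| is 0.

```python
import numpy as np
from sklearn.metrics import silhouette_samples

# One hypothetical cell type with 6 cells spread over 3 batches.
batch = np.array([0, 0, 1, 1, 2, 2])

# Worst case: embed each cell as the one-hot vector of its batch, so
# batches are perfectly separated within the cell type.
X_batch_onehot = np.eye(3)[batch]
sil = silhouette_samples(X_batch_onehot, batch)
score = np.mean(1 - np.abs(sil))
print(sil)    # every cell has silhouette 1.0
print(score)  # 0.0, the worst possible value
```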

scottgigante-immunai commented 1 year ago

Got it. So celltype_random_embedding should score perfectly here

LuckyMD commented 1 year ago

Yes... unless there is some effect of batch balance that I'm not aware of.

danielStrobl commented 1 year ago

Hmm, I can't reproduce that. If I run the celltype_random_embedding baseline in a Jupyter notebook, I get the expected score, close to 1:

import openproblems as op
import scib
import scanpy as sc
import numpy as np
data = op.tasks.batch_integration_embed.datasets.immune_batch(True)
...
data_full = op.tasks.batch_integration_embed.datasets.immune_batch(False)
...
base = op.tasks.batch_integration_embed.methods.baseline.celltype_random_embedding(data_full)
op.tasks.batch_integration_embed.metrics.silhouette_batch(base)
/Users/daniel.strobl/SingleCellOpenProblems/openproblems/tasks/_batch_integration/batch_integration_embed/metrics/sil_batch.py:33: DeprecationWarning: Keyword argument 'group_key' has been deprecated in favour of 'label_key'. 'group_key' will be removed in a future version.
  sil = silhouette_batch(adata, batch_key="batch", group_key="labels", embed="X_emb")

mean silhouette per group:                                   silhouette_score
group                                             
CD10+ B cells                             0.979045
CD14+ Monocytes                           0.986062
CD16+ Monocytes                           0.972553
CD20+ B cells                             0.975165
CD4+ T cells                              0.989482
CD8+ T cells                              0.983753
Erythrocytes                              0.989952
Erythroid progenitors                     0.985813
HSPCs                                     0.978500
Megakaryocyte progenitors                 0.948789
Monocyte progenitors                      0.986552
Monocyte-derived dendritic cells          0.962396
NK cells                                  0.966470
NKT cells                                 0.984584
Plasma cells                              0.971796
Plasmacytoid dendritic cells              0.960826

0.976358670769633

LuckyMD commented 1 year ago

That's fine... but some of the integration method outputs seem to have a score >1... how is that possible? ^^

LuckyMD commented 1 year ago

combat full scaled, scanorama embed full unscaled...

danielStrobl commented 1 year ago

As far as I understand, the plot shows the scaled values, so they're scaled relative to the baseline, which scored 0.7 in the online run and 0.976 in my tests.

scottgigante-immunai commented 1 year ago

It's only happening in the lung dataset. I'd also recommend you run it with test=False if you want to reproduce the exact results.

danielStrobl commented 1 year ago

Ah, good to know, I thought it was the immune dataset. This was actually on the full dataset.

LuckyMD commented 1 year ago

oh, I see... okay, thx.

danielStrobl commented 1 year ago

Not as close to 1 for the lung dataset but also not 0.7:

mean silhouette per group:                       silhouette_score
group                                 
B cell                        0.864942
Basal 1                       0.959319
Basal 2                       0.918840
Ciliated                      0.952741
Dendritic cell                0.934674
Endothelium                   0.949904
Fibroblast                    0.928265
Lymphatic                     0.903939
Macrophage                    0.873468
Mast cell                     0.913045
Neutrophil_CD14_high          0.864333
Neutrophils_IL1R2             0.866628
Secretory                     0.913552
T/NK cell                     0.884365
Type 1                        0.981338
Type 2                        0.966990

0.91727137005069

LuckyMD commented 1 year ago

Why is it not 1 though... It should be 1. Can we use a version of this without jitter to check if it is 1?

danielStrobl commented 1 year ago

I tried a smaller jitter and it's still not getting closer to 1. I'm currently checking the distributions of scores to see if I can find something there.
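One possible explanation, offered as a hedged sketch rather than a confirmed diagnosis: within a single cell type the one-hot part of the embedding is identical for every cell, so pairwise distances come only from the jitter, and the silhouette (b - a) / max(a, b) does not change when all distances are multiplied by the same factor. If the baseline is a one-hot encoding plus uniform noise (an assumption here), shrinking the jitter should leave the score unchanged. The cell count, dimensionality and unbalanced batch proportions below are hypothetical.

```python
import numpy as np
from sklearn.metrics import silhouette_samples

rng = np.random.default_rng(42)
n_cells, n_celltypes = 200, 5

# One hypothetical cell type: every cell shares the same one-hot vector,
# and batches are assigned at random (i.e. mixed), with unbalanced sizes.
onehot = np.zeros(n_celltypes)
onehot[0] = 1.0
batch = rng.choice(["b1", "b2", "b3"], size=n_cells, p=[0.6, 0.3, 0.1])
noise = rng.uniform(-1.0, 1.0, size=(n_cells, n_celltypes))


def score_with_jitter(jitter):
    # Assumed baseline form: one-hot encoding plus uniform jitter.
    X = onehot + jitter * noise
    sil = silhouette_samples(X, batch)
    return np.mean(1 - np.abs(sil))


large, small = score_with_jitter(0.1), score_with_jitter(0.01)
print(large, small)  # equal up to floating-point error, and below 1
```

Reusing the same noise draws isolates the effect of the jitter magnitude; only the noise pattern, not its scale, determines how far the score falls below 1.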

danielStrobl commented 1 year ago

(Image not reproduced.) Here's the distribution of scores per cell; some cell types seem to be more affected than others.

danielStrobl commented 1 year ago

OK, so with no jitter I get a perfect score of 1.
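A minimal check of why removing the jitter yields exactly 1, for a single hypothetical cell type: with a jitter-free cell-type one-hot encoding, every cell of that type sits on the same point, so all within-type distances are 0. The silhouette (b - a) / max(a, b) is then 0/0, which scikit-learn's `silhouette_samples` converts from NaN to 0, so each cell contributes 1 - |0| = 1.

```python
import numpy as np
from sklearn.metrics import silhouette_samples

# One hypothetical cell type with 7 cells in 3 batches, embedded with the
# cell-type one-hot encoding and no jitter: all cells share one point.
batch = np.array([0, 0, 0, 1, 1, 2, 2])
X = np.tile([1.0, 0.0, 0.0], (7, 1))

sil = silhouette_samples(X, batch)
print(sil)  # all zeros: a = b = 0, and scikit-learn maps the 0/0 to 0
print(np.mean(1 - np.abs(sil)))  # 1.0
```

Note that the perfect score rests on how the undefined 0/0 case is handled; a silhouette implementation that treated it differently would not necessarily return 1.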

danielStrobl commented 1 year ago

@scottgigante-immunai can we remove the jitter for this baseline completely? or does this have negative effects on anything else?

scottgigante-immunai commented 1 year ago

Not sure if it will affect anything else but no harm in having both. Let's add an extra baseline without the jitter
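A minimal sketch of how the extra baseline could expose the jitter as an optional parameter. The function name, the `seed` parameter and the uniform noise distribution are hypothetical; this is not the openproblems implementation.

```python
import numpy as np


def celltype_onehot_embedding(labels, jitter=0.01, seed=None):
    """Hypothetical baseline: one-hot encode cell-type labels and optionally
    add uniform noise in [-jitter, jitter). jitter=None gives the exact
    one-hot variant proposed above."""
    categories, codes = np.unique(np.asarray(labels), return_inverse=True)
    embedding = np.eye(len(categories))[codes]
    if jitter is not None:
        rng = np.random.default_rng(seed)
        embedding = embedding + rng.uniform(-jitter, jitter, size=embedding.shape)
    return embedding


labels = ["B cell", "T/NK cell", "B cell", "Ciliated"]
exact = celltype_onehot_embedding(labels, jitter=None)
jittered = celltype_onehot_embedding(labels, jitter=0.01, seed=0)
print(exact)
```

Keeping `jitter` as a parameter lets both baselines share one code path, so the jittered and jitter-free variants can be reported side by side.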