Efficient subset of methods/metrics

rogershijin commented 3 years ago

Hi Luecken et al., thank you so much for creating this pipeline! I am wondering if there's a "lightweight" configuration of sc-ib you would recommend that's fast to run? Thanks a lot!

LuckyMD commented 3 years ago

Hi @rogershijin,

Thanks for posting on here. We have been thinking about a lightweight metrics function, and about a smarter way to reduce the number of methods you need to run to check that everything is working. In general, I would remove the following methods and metrics from a "fast run":

Metrics to remove:

kBET
cLISI
iLISI

Methods to remove to speed things up:

trVAE
Conos
DESC
SAUCIE
LIGER
maybe MNN

The above methods typically don't perform well, although not all are slow (e.g., SAUCIE runs very quickly but performs poorly).

The issue that you may face is that we rely on poorly performing integration outputs to accurately scale the metrics before aggregating. So it might be a good idea to add one of the methods that otherwise perform poorly to ensure that min-max scaling still makes sense (maybe SAUCIE and/or LIGER).
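
To illustrate the scaling concern, here is a minimal sketch, assuming plain per-metric min-max scaling across methods. The method names and raw scores are made up for illustration, and `min_max_scale` is a hypothetical helper, not a scib function.

```python
def min_max_scale(scores):
    """Rescale a {method: raw_score} mapping to [0, 1] across methods."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All methods scored the same, so there is no spread to scale.
        return {name: 0.0 for name in scores}
    return {name: (value - lo) / (hi - lo) for name, value in scores.items()}


# Hypothetical raw scores for one metric (not real benchmark results).
full_run = {"good_1": 0.80, "good_2": 0.75, "good_3": 0.70, "poor": 0.20}

# The same metric after dropping the poorly performing method.
fast_run = {name: value for name, value in full_run.items() if name != "poor"}

scaled_full = min_max_scale(full_run)
scaled_fast = min_max_scale(fast_run)

for name in fast_run:
    print(f"{name}: full={scaled_full[name]:.2f} fast={scaled_fast[name]:.2f}")
```

With the poor performer included, the three similar methods all stay near the top of the scale (roughly 0.83 to 1.0). Without it, the same small differences are stretched over the whole [0, 1] range, so the lowest of the three drops to 0 even though its raw score has not changed.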

@mumichae it might be a good idea to create a wrapper for a fast metrics function. Do you think we could also add a second Snakemake rule that runs this instead of the full thing? It might also require an all_fast rule I guess... just two possible endpoints for the pipeline. Any thoughts?

rogershijin commented 3 years ago

Thank you so much!

mumichae commented 3 years ago

We now have three different wrapper functions for metrics:

metrics_fast: all metrics that don't require a lot of preprocessing
metrics_slim: all metrics apart from kBET and LISI
metrics_all: all metrics

theislab/scib#244