theislab / scib

Benchmarking analysis of data integration tools
MIT License

Efficient subset of methods/metrics #244

Closed: rogershijin closed this issue 3 years ago

rogershijin commented 3 years ago

Hi Luecken et al., thank you so much for creating this pipeline! Is there a "lightweight" configuration of sc-ib that you would recommend for a fast run? Thanks a lot!

LuckyMD commented 3 years ago

Hi @rogershijin,

Thanks for posting here. We have been thinking about a lightweight metrics function, and about a smarter way to reduce the number of methods so that a quick run can still test that everything is working. In general, I would remove the following methods and metrics from a "fast run":

Metrics to remove:

Methods to remove to speed things up:

The above methods typically don't perform well, although not all are slow (e.g., SAUCIE runs very quickly but performs poorly).

The issue that you may face is that we rely on poorly performing integration outputs to accurately scale the metrics before aggregating. So it might be a good idea to add one of the methods that otherwise perform poorly to ensure that min-max scaling still makes sense (maybe SAUCIE and/or LIGER).
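To illustrate the scaling concern, here is a minimal sketch in plain Python. It is not the scIB implementation: the function name `min_max_scale`, the method names, and all scores are hypothetical and chosen only to show how the scaled values shift when a poorly performing method is dropped.

```python
from typing import Dict


def min_max_scale(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one metric's raw scores across methods to the range [0, 1].

    Hypothetical helper for illustration; not part of the scIB API.
    """
    lo = min(scores.values())
    hi = max(scores.values())
    if hi == lo:
        # Degenerate case: every method scored the same. Returning 0.0 for
        # all of them is an arbitrary choice made for this sketch.
        return {name: 0.0 for name in scores}
    return {name: (value - lo) / (hi - lo) for name, value in scores.items()}


# Made-up raw scores for a single metric (higher is better).
good_only = {"method_a": 0.75, "method_b": 0.8125, "method_c": 0.875}

# The same scores plus one poorly performing method as a lower anchor.
with_baseline = dict(good_only, poor_method=0.25)

scaled_good_only = min_max_scale(good_only)
scaled_with_baseline = min_max_scale(with_baseline)

if __name__ == "__main__":
    for name in good_only:
        print(name, scaled_good_only[name], scaled_with_baseline[name])
```

Without the poor performer, `method_a` is pushed down to 0 even though its raw score is close to the best one. With the poor performer included, it scales to about 0.8, which better reflects how near the three good methods are to each other.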

@mumichae it might be a good idea to create a wrapper for a fast metrics function. Do you think we could also add a second Snakemake rule that runs this instead of the full thing? It might also require an all_fast rule I guess... just two possible endpoints for the pipeline. Any thoughts?

rogershijin commented 3 years ago

Thank you so much!

mumichae commented 3 years ago

We now have three different wrapper functions for metrics