welch-lab / liger

R package for integrating and analyzing multiple single-cell datasets
GNU General Public License v3.0

Parameter optimization guidance for cross species integration #324

Open KaczorowskiLab opened 2 weeks ago

KaczorowskiLab commented 2 weeks ago

Hello,

Thank you for the newly developed UINMF approach and the well-documented tutorials. We are interested in applying this approach to integrate mouse and human snRNA-seq brain datasets that are split across cell types. According to the documentation for runIntegration(), two parameters, k and lambda, determine the overall integration. While the defaults probably work well for most datasets, I am wondering whether there is any guidance on optimizing these parameters for a specific dataset before running the pipeline. The typical approach would be to test different combinations of values for these parameters, but since our datasets are very large, are there any approaches for estimating them from the dataset structure (number of features, number of samples, known annotations, etc.)? Alternatively, would we expect comparable results when testing parameters on a subset of the data versus the entire dataset? Thank you for your help!
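
For concreteness, the subset-based search asked about above could be set up roughly as in the sketch below. It is language-agnostic bookkeeping written in Python and does not call rliger at all. The helper names `stratified_subsample` and `parameter_grid` are hypothetical, the cell-type labels are toy data, and the candidate values for k and lambda are placeholders rather than recommendations. Whether results on such a subset carry over to the full dataset is exactly the open question.

```python
import random
from collections import defaultdict
from itertools import product


def stratified_subsample(labels, fraction, seed=0, min_per_group=1):
    """Return sorted cell indices, drawing the same fraction from every label group.

    Sampling per group keeps rare cell types represented in the subset.
    (Hypothetical helper, not part of any library.)
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for idx, label in enumerate(labels):
        groups[label].append(idx)

    keep = []
    for members in groups.values():
        n = max(min_per_group, round(len(members) * fraction))
        n = min(n, len(members))  # never request more cells than the group has
        keep.extend(rng.sample(members, n))
    return sorted(keep)


def parameter_grid(k_values, lambda_values):
    """Enumerate every (k, lambda) combination to try on the subset."""
    return [{"k": k, "lambda": lam} for k, lam in product(k_values, lambda_values)]


# Toy annotations standing in for known cell-type labels (assumed data).
labels = ["astro"] * 100 + ["micro"] * 20 + ["neuron"] * 300
subset = stratified_subsample(labels, fraction=0.1)

# Placeholder candidate values, chosen only for illustration.
grid = parameter_grid(k_values=[20, 30, 40], lambda_values=[1, 5, 10])

print(len(subset), "cells kept;", len(grid), "parameter combinations to test")
```

Each entry of `grid` would then be passed to the integration step on the cells in `subset`, and the runs compared with whatever alignment or clustering metric is preferred.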