rongstat / SMAI

Alignability testing and integration of single-cell data
MIT License

imbalanced data sets and SMAI-align order #2

Status: Open. tbrunetti opened this issue 5 months ago.

tbrunetti commented 5 months ago

Hello! Very interesting paper! I am curious about trying this method out, but I wanted to ask a couple of questions first:

  1. Based on your Supplemental Table 1, the number of cells coming from each dataset is fairly balanced for the scRNA-seq (non-spatial) data. How does the method work if there is a larger discrepancy between the cell numbers in dataset 1 and dataset 2, e.g., 10,000 cells in one set versus 3,000 cells in the other? Does performance or accuracy suffer, and how would you recommend handling it?

  2. This is related to my first question: in the paper you mention that one limitation is that the method works on only two datasets at a time, although additional datasets could be added sequentially. How, then, does the order affect the result? Have you tested different integration orders on 4 or more datasets, and does the method reproduce roughly the same result regardless of the order and the differences in cell counts?
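The order experiment proposed in question 2 could be run with a small harness: integrate the datasets sequentially under every permutation of the order and measure how far each result drifts from a baseline order. This is a minimal sketch in Python with NumPy under stated assumptions: `align_pair` is a hypothetical placeholder (a simple mean shift), not SMAI-align or its API, and would need to be replaced by the actual pairwise method; `integrate_sequentially` and `order_sensitivity` are likewise illustrative names.

```python
import itertools

import numpy as np


def align_pair(reference, query):
    # HYPOTHETICAL stand-in for a pairwise aligner: shift `query` so its
    # column means match those of `reference`. This is NOT SMAI-align.
    return query - query.mean(axis=0) + reference.mean(axis=0)


def integrate_sequentially(datasets, order):
    # Add datasets one at a time in `order`; each new dataset is aligned to
    # everything merged so far. Blocks are returned in the ORIGINAL dataset
    # order so results from different integration orders line up cell by cell.
    first = order[0]
    aligned = {first: datasets[first]}
    merged = datasets[first]
    for idx in order[1:]:
        aligned[idx] = align_pair(merged, datasets[idx])
        merged = np.vstack([merged, aligned[idx]])
    return np.vstack([aligned[i] for i in range(len(datasets))])


def order_sensitivity(datasets):
    # Largest absolute deviation, over all orders, from the identity order.
    orders = list(itertools.permutations(range(len(datasets))))
    baseline = integrate_sequentially(datasets, orders[0])
    return max(
        np.abs(integrate_sequentially(datasets, order) - baseline).max()
        for order in orders
    )


# Four toy datasets with unequal sizes and shifted means.
rng = np.random.default_rng(0)
datasets = [rng.normal(loc=k, size=(n, 5)) for k, n in enumerate([1000, 300, 500, 200])]
print(order_sensitivity(datasets))
```

With this toy aligner, every dataset ends up at the mean of whichever dataset came first, so the harness reports a large order effect when the datasets have different means and essentially none when they are all centered. A real comparison might prefer a metric that ignores a global translation.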

Thanks!

rongstat commented 3 months ago

So sorry for missing this!! Thank you so much for your question!

  1. For alignable datasets (i.e., under the null model) with unbalanced sample sizes, performance depends on the complexity of the underlying structure (e.g., the number of cell types). As long as the smaller dataset still adequately represents all of the underlying cell types, the method should work reasonably well (i.e., it should not reject the null).
  2. This is a great question! We are currently developing a version of the algorithm that allows for integrating multiple datasets simultaneously without specifying an order.
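The criterion in answer 1, that the smaller dataset should still represent all of the underlying cell types, can be checked before running the method if cell-type labels are available. This is a minimal sketch, assuming such labels exist for both datasets (e.g., from a prior annotation); `underrepresented_types` and the `min_cells` threshold of 20 are hypothetical choices for illustration, not part of SMAI.

```python
from collections import Counter


def underrepresented_types(labels_a, labels_b, min_cells=20):
    # Count cells per (user-supplied, hypothetical) cell-type label in each dataset.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    flagged = {}
    for cell_type in sorted(set(counts_a) | set(counts_b)):
        # Counter returns 0 for a type missing from one dataset.
        n_a, n_b = counts_a[cell_type], counts_b[cell_type]
        # Flag the type if either dataset has too few cells to represent it.
        if min(n_a, n_b) < min_cells:
            flagged[cell_type] = (n_a, n_b)
    return flagged


# 10,000 vs 3,000 cells, as in question 1; NK cells are sparse in the smaller set.
big = ["T"] * 6000 + ["B"] * 3000 + ["NK"] * 1000
small = ["T"] * 2000 + ["B"] * 990 + ["NK"] * 10
print(underrepresented_types(big, small))  # {'NK': (1000, 10)}
```

A flagged type is the situation the answer warns about: the overall imbalance (10,000 vs 3,000) matters less than whether any single cell type is nearly absent from the smaller dataset.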

Hope this helps. Thanks!!