theislab / scarches

Reference mapping for single-cell genomics
https://docs.scarches.org/en/latest/
BSD 3-Clause "New" or "Revised" License

Choice of reference and query data sets #225

Open hl-xue opened 5 months ago

hl-xue commented 5 months ago

Hello,

Thanks for the nice software.

I have two data sets, A and B, that were annotated separately, and I am using scArches to transfer labels from one data set to the other in order to identify their similarities and differences. I am following the tutorial here, except that I skip step 4a so that I can compare the cell types predicted from the other data set with the actual annotations.
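To make the comparison concrete, here is a minimal sketch of how predicted and actual labels can be tabulated for one direction of the transfer. It uses only the Python standard library; the function names (`contingency`, `per_label_agreement`) and the toy labels are hypothetical and are not part of scArches. Matching labels by name assumes both data sets use the same label vocabulary; if they do not, only the raw table is meaningful.

```python
from collections import Counter, defaultdict


def contingency(actual, predicted):
    """Count how often each actual label receives each predicted label."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    table = defaultdict(Counter)
    for a, p in zip(actual, predicted):
        table[a][p] += 1
    return {a: dict(row) for a, row in table.items()}


def per_label_agreement(actual, predicted):
    """Fraction of cells per actual label whose predicted label has the same name."""
    table = contingency(actual, predicted)
    return {a: row.get(a, 0) / sum(row.values()) for a, row in table.items()}


# Toy labels for illustration only (not real results).
actual = ["T cell", "T cell", "B cell", "B cell", "NK cell"]
predicted = ["T cell", "T cell", "B cell", "T cell", "T cell"]

table = contingency(actual, predicted)
agreement = per_label_agreement(actual, predicted)
```

Building one such table per direction (A as reference with B as query, then B as reference with A as query) shows which cell types change between the two runs.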

However, I found that the choice of which data set serves as reference and which as query makes a remarkable difference. For example:

Given this, I would like to check two things with you:

  1. Is this difference in label transfer results, caused by swapping the reference and query data sets, expected? If so, why does it happen?
  2. If the difference is expected, which result should I trust?

Thanks in advance!

hl-xue commented 5 months ago

Hello,

I have a further question: when I run the same label transfer workflow from a data set to itself (data set A as the reference and the same data set A as the target to predict), I get an imperfect match (normalised MI score of 0.89). Is this also expected?
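For context on the score, here is a minimal sketch of normalised mutual information (NMI) between annotated and predicted labels, written with the standard library only. It assumes normalisation by the arithmetic mean of the two entropies, which is also the default of scikit-learn's `normalized_mutual_info_score`; the value of 0.89 above may have been computed with a different normalisation. The function names and toy labels are hypothetical.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two equally long label sequences."""
    n = len(a)
    count_a, count_b, count_ab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (n_xy / n) * log(n * n_xy / (count_a[x] * count_b[y]))
        for (x, y), n_xy in count_ab.items()
    )


def normalized_mutual_information(a, b):
    """NMI with arithmetic-mean normalisation (an assumption, see above)."""
    if len(a) != len(b):
        raise ValueError("label sequences must have the same length")
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0 and h_b == 0:
        return 1.0  # both labelings are constant, treated as identical
    return mutual_information(a, b) / ((h_a + h_b) / 2)


# Toy labels for illustration only (not real results).
annotated = ["T", "T", "B", "B", "NK", "NK"]
renamed = ["c0", "c0", "c1", "c1", "c2", "c2"]  # same partition, other names
imperfect = ["T", "T", "B", "T", "NK", "NK"]    # one cell mislabelled

nmi_same = normalized_mutual_information(annotated, annotated)
nmi_renamed = normalized_mutual_information(annotated, renamed)
nmi_imperfect = normalized_mutual_information(annotated, imperfect)
```

NMI depends only on how cells are grouped, not on the label names, so a score below 1 when mapping a data set onto itself means that some cells really received a different predicted label.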

Thanks again!