immunogenomics / harmony

Fast, sensitive and accurate integration of single-cell data with Harmony
https://portals.broadinstitute.org/harmony/

Appropriate batch variable for specific experimental design #172

Open lucygarner opened 2 years ago

lucygarner commented 2 years ago

Hi,

I would like some advice on the appropriate batch variable for my experimental design.

I have samples from 8 different individuals - for 3 of these, I have cells from blood and liver, and for the other samples, I just have cells from the blood. After correction, I would expect liver cells from different donors to align and the same for blood. Since some of the donors lack liver cells, I would still expect there to be liver-specific clusters that contain 3/8 donors.

My current plan would be to use "donor" as the batch variable with a θ value of 0. Does this seem reasonable or what would you recommend?

Additionally, what is the best way to decide on the number of PCs to use as input to Harmony? Do you just decide based on a scree plot or is there a more statistically rigorous approach?

Best wishes, Lucy

pati-ni commented 1 year ago

Hi @lucygarner, and sorry for the delay. Are you still having issues with this analysis?

I would say that Harmony is quite flexible and won't complain about unbalanced designs. I would be interested to hear how you ended up doing the analysis. As you suggest, using donor as the batch variable is the most sensible choice.
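To make the unbalanced design concrete, here is a minimal sketch that tabulates how many donors contribute to each tissue. The donor labels (`D1` to `D8`) and the `samples` list are hypothetical stand-ins for the real sample sheet; only the 8-donor, 3-with-liver layout comes from the question above.

```python
from collections import defaultdict

# Hypothetical sample sheet: all 8 donors have blood,
# and donors D1-D3 additionally have liver.
samples = [(f"D{i}", "blood") for i in range(1, 9)]
samples += [(f"D{i}", "liver") for i in range(1, 4)]

# Collect the set of donors observed in each tissue.
donors_by_tissue = defaultdict(set)
for donor, tissue in samples:
    donors_by_tissue[tissue].add(donor)

n_donors = len({donor for donor, _ in samples})
for tissue in sorted(donors_by_tissue):
    print(f"{tissue}: {len(donors_by_tissue[tissue])}/{n_donors} donors")
```

With donor as the batch variable, a liver-specific cluster can contain at most the donors listed for liver, so seeing only 3 of 8 donors there would be expected rather than a sign of failed integration.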

Regarding theta, setting it to zero is a good starting point; you can start increasing it if the downstream analysis does not look good.

Finally, regarding PCs, there is no golden rule for how many to choose, and we are actively working towards a best-practices approach. That said, I can say that it is best to start with fewer PCs (20 or fewer).
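As one possible way to turn a scree plot into a number, here is a small sketch of a cumulative-variance heuristic. It is not a Harmony feature or an official recommendation: the function name `n_pcs_for_variance`, the 0.9 target, and the example variances are all assumptions chosen for illustration, and the cap of 20 simply mirrors the starting point suggested above.

```python
def n_pcs_for_variance(variances, target=0.9, max_pcs=20):
    """Return the smallest number of leading PCs whose cumulative share
    of the supplied variance reaches `target`, capped at `max_pcs`.

    `variances` are per-PC variances (eigenvalues) in decreasing order.
    The share is relative to the PCs supplied, not to the total variance
    of the original data.
    """
    total = sum(variances)
    if total <= 0:
        raise ValueError("variances must sum to a positive value")
    running = 0.0
    for k, v in enumerate(variances, start=1):
        running += v
        if running / total >= target:
            return min(k, max_pcs)
    return min(len(variances), max_pcs)


# Hypothetical per-PC variances read off a scree plot.
example_variances = [30, 20, 10, 5, 3, 2, 1, 1, 1, 1]
chosen = n_pcs_for_variance(example_variances, target=0.9)
print(chosen)
```

The threshold is a judgment call, so it is worth checking whether clustering results are stable across a few nearby values of the chosen number of PCs.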