fanc-WU closed this issue 3 years ago
You could try RPCA instead of CCA, since it is a more conservative integration method.
Even then, the integration methods assume that there is sufficient similarity between your samples. In your case, a majority of the axes of variation in one of your samples (all cells) will not be present in your other sample (stem cells only). This can result in somewhat poorer integration though I have yet to see a case where the difference is as drastic as this.
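For reference, here is a minimal sketch of what the RPCA suggestion above could look like, assuming a Seurat v4-style anchor workflow. The helper name `integrate_rpca` and its default values are hypothetical and are not taken from this thread; `object.list` stands for a list of Seurat objects such as `ifnb.list` or the two samples described in the question.

```r
library(Seurat)

# Hypothetical helper (not from the thread): integrate a list of Seurat
# objects using reciprocal PCA (RPCA) anchors instead of CCA anchors.
integrate_rpca <- function(object.list, nfeatures = 2000, k.anchor = 5, dims = 1:30) {
  # Normalize each object and find its variable features independently.
  object.list <- lapply(object.list, function(x) {
    x <- NormalizeData(x, verbose = FALSE)
    FindVariableFeatures(x, selection.method = "vst",
                         nfeatures = nfeatures, verbose = FALSE)
  })

  # Pick features that are repeatedly variable across the objects.
  features <- SelectIntegrationFeatures(object.list = object.list,
                                        nfeatures = nfeatures)

  # RPCA requires a PCA on every object, computed on the shared features.
  object.list <- lapply(object.list, function(x) {
    x <- ScaleData(x, features = features, verbose = FALSE)
    RunPCA(x, features = features, verbose = FALSE)
  })

  # reduction = "rpca" is the switch away from the default CCA.
  anchors <- FindIntegrationAnchors(object.list = object.list,
                                    anchor.features = features,
                                    reduction = "rpca",
                                    k.anchor = k.anchor,
                                    dims = dims)

  IntegrateData(anchorset = anchors, dims = dims)
}
```

Something like `combined <- integrate_rpca(ifnb.list)` would then return the integrated object. `k.anchor` controls how many neighbors are considered when picking anchors; raising it strengthens the alignment, so keeping it at the default of 5 (or lower) is the more conservative choice when one sample contains only a subset of the populations. Whether this is enough for the stem-cell-only case is something you would have to check on your own data.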
Feel free to send your dataset to seuratpackage@gmail.com for further guidance and assistance.
I've been having trouble integrating datasets in which one sample contains only a subset of the cell populations present in the other:
Scenario 1: the ifnb dataset from Seurat. First, I integrated the full datasets.
The resulting plot:
Next, I subset the second Seurat object in ifnb.list (the stimulated one) while keeping the entire first object.
The resulting plot:
As you can see, Seurat did "mis-assign" some of the cells, but the majority of the cells are correctly matched.
Scenario 2: my own dataset. In this developmental system, the stem cells differentiate along 3 different trajectories, as annotated in the figure below (for sample 1). The stem cell populations in these 2 samples:
I integrated my 2 samples:
The stem cell populations matched very well through integration:
Now, I subset sample.2 to contain only stem cells and repeat the integration:
As you can see, the stem cells are now mostly scattered across all populations. This is also shown in the plot below: sample.2 doesn't even concentrate in cluster 3 (where the stem cells are supposed to be).
I would highly appreciate it if you could offer help with your expertise. What are the parameters I need to tune to make the integration better? We have more datasets where only the stem cells were sorted and sequenced. I hope to integrate these samples into this full developmental landscape.
Best regards.