theislab / kBET

An R package to test for batch effects in high-dimensional single-cell RNA sequencing data.
Apache License 2.0

High rejection rates in integrated data of lung and liver than two lung tissues #50

Quan-L closed this issue 3 years ago

Quan-L commented 5 years ago

Hello there, I used the package to detect batch effects in my data, for example: [image]

First, I used Seurat to integrate the raw matrices, then filtered cells and calculated the top 2000 VariableFeatures. I used the matrix of these 2000 genes as input to kBET to test for batch effects. Here are my results:

lung1-liver1:

[image: lung1-liver_raw_factor big]

lung1-lung3:

[image: lung1-3raw big_factor]

It seems reasonable to expect that the batch effect between different tissues is higher than between samples of the same tissue. So I would like to know why the rejection rates between two different tissues (lung1 vs. liver1) are lower than those between samples of the same tissue (lung1 vs. lung3). I am looking forward to your reply. Thank you.

mbuttner commented 4 years ago

Hi Quan-L,

that's an interesting question and I agree with your reasoning. I have a couple of suggestions and questions:

  1. Did you check the batch effects before correction?
  2. Lung-Liver: Can you rule out overcorrection? I am wondering if lung and liver have enough cell types in common to ensure a reasonable data integration. If not, Seurat's data integration method would integrate completely distinct cell types.
  3. Lung1-Lung3: Do you observe several cell types and, if yes, shifts in the cell type composition? In that case I advise running kBET per cluster (or cell type) and averaging over all kBET results to account for frequency shifts. I wrote an example in the README, section 'Subsampling'.
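
The per-cluster averaging from point 3 could look roughly like the sketch below. It is only an illustration, not the README example itself: `average_score_per_cluster` and `kbet_rejection_rate` are hypothetical helper names, and the sketch assumes the expression matrix has cells in rows and that `batch` and `clusters` hold one entry per cell. The mean observed rejection rate is assumed to be the first entry of `summary$kBET.observed` in the kBET result.

```r
# Hypothetical helper: compute a batch-effect score separately for each
# cluster and average the per-cluster scores (unweighted mean).
# `score_fn(data, batch)` must return a single number.
average_score_per_cluster <- function(data, batch, clusters, score_fn) {
  stopifnot(nrow(data) == length(batch), length(batch) == length(clusters))
  cluster_ids <- as.character(clusters)
  scores <- vapply(unique(cluster_ids), function(cl) {
    keep <- cluster_ids == cl
    # Subset cells (rows) and their batch labels for this cluster only
    score_fn(data[keep, , drop = FALSE], batch[keep])
  }, numeric(1))
  list(per_cluster = scores, mean = mean(scores))
}

# Hypothetical scoring function based on kBET: the mean observed rejection
# rate for one subset of cells. Requires the kBET package to be installed.
kbet_rejection_rate <- function(data, batch) {
  res <- kBET::kBET(df = data, batch = batch, plot = FALSE)
  res$summary$kBET.observed[1]
}
```

With a matrix `expr` and per-cell vectors `batch` and `cell_type` (placeholder names), `average_score_per_cluster(expr, batch, cell_type, kbet_rejection_rate)$mean` would give the averaged rejection rate. Very small clusters may need to be excluded or handled separately before averaging.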