omnideconv / SimBu

Simulate pseudo-bulk RNAseq samples from scRNAseq expression data
http://omnideconv.org/SimBu/
GNU General Public License v3.0

Manuscript Figure 4 #25

Closed alex-d13 closed 2 years ago

alex-d13 commented 2 years ago

This will be about the effect of the bias on deconvolution.

Panels (images not preserved):
main text a: Hao: extreme bias on 4 cell types; deconvolution with CIBERSORT, EPIC, quanTIseq scaled
main text b: Hao: 3 real world bias, 1 no bias; quanTIseq scaled & unscaled
supplementary 4a: Travaglini: extreme bias
supplementary 4b: Travaglini: real world bias
supplementary 4c: Maynard: extreme bias
supplementary 4d: Maynard: real world bias

For a, I would go into the differences in how cell types are affected by a scaling factor and how the deconvolution tools are not equally sensitive to such increased changes.

For b, the main story would be that some cell types (Monocytes in Hao, DCs in Maynard, Neutrophils in Travaglini) seem to benefit slightly from a bias based on genes/census. The read_number scaling factor, in contrast, tends to decrease estimation performance (NK in Hao, Macrophages & Neutrophils in Travaglini). All of this refers to quanTIseq scaled. In the unscaled quanTIseq estimations, we can see that deconvolution tools need to account for this bias internally as well; otherwise they produce strong overestimations, such as for Macrophages in Hao. But I don't know whether that should be part of this paper or rather of the benchmarking later on. What do you think?
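To make the overestimation argument concrete, here is a minimal sketch of how a cell-type-specific mRNA scaling factor inflates the apparent fraction of a cell type, and how dividing by that factor and renormalising undoes it. It assumes an idealised deconvolution method that recovers mRNA fractions perfectly; the function names, cell-type fractions, and scaling factors are all hypothetical and are not taken from SimBu, quanTIseq, or EPIC.

```python
def mrna_fractions(cell_fractions, scaling):
    """Turn cell fractions into mRNA fractions, given per-cell-type mRNA content."""
    weighted = {ct: cell_fractions[ct] * scaling[ct] for ct in cell_fractions}
    total = sum(weighted.values())
    return {ct: w / total for ct, w in weighted.items()}


def correct_for_mrna(estimates, scaling):
    """Divide mRNA fractions by the scaling factors and renormalise to sum to 1."""
    corrected = {ct: estimates[ct] / scaling[ct] for ct in estimates}
    total = sum(corrected.values())
    return {ct: c / total for ct, c in corrected.items()}


# Hypothetical values, chosen only for illustration.
true_fractions = {"Macrophages": 0.2, "T cells CD4": 0.5, "B cells": 0.3}
scaling_factors = {"Macrophages": 3.0, "T cells CD4": 1.0, "B cells": 1.0}

# What an idealised tool without mRNA correction would report.
uncorrected = mrna_fractions(true_fractions, scaling_factors)
# What the same tool would report after correcting with the same factors.
corrected = correct_for_mrna(uncorrected, scaling_factors)

for ct in true_fractions:
    print(f"{ct}: true={true_fractions[ct]:.3f} "
          f"uncorrected={uncorrected[ct]:.3f} corrected={corrected[ct]:.3f}")
```

In this toy setting the cell type with the larger scaling factor is overestimated without correction, and the correction only recovers the true fractions because it uses exactly the factors that generated the bias. With mismatched factors the estimates would stay biased.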

FFinotello commented 2 years ago

In a, we should use both EPIC and quanTIseq with mRNA correction (which is the default). We do not need to indicate this in the labels here; we can simply write "EPIC" and "quanTIseq".

Conversely, when we remove the mRNA correction, we can write "Some algorithm name (no mRNA scaling)".

FFinotello commented 2 years ago

@alex-d13 could you please remind me of the exact parameter settings you used for the 3 deconvolution methods and 3 datasets?

alex-d13 commented 2 years ago

quanTIseq & EPIC: tumor = FALSE
CIBERSORT: absolute = TRUE

Simulations: 1000 cells, 100 samples, bias removed in counts (this does not matter, because we only use TPMs for deconvolution)
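As a rough illustration of what one simulated sample looks like, the sketch below draws a fixed number of cells according to given cell-type fractions, sums their counts into a pseudo-bulk profile, and normalises it per million. This is not SimBu's implementation: the function names and toy data are hypothetical, and a counts-per-million normalisation stands in for TPM because no gene lengths are available here.

```python
import random


def simulate_pseudobulk(cells, fractions, n_cells=1000, seed=0):
    """Sum the counts of cells sampled (with replacement) per cell type.

    cells:     dict of cell type -> list of per-cell count vectors
    fractions: dict of cell type -> desired fraction of the sample
    """
    rng = random.Random(seed)
    n_genes = len(next(iter(cells.values()))[0])
    bulk = [0.0] * n_genes
    for cell_type, fraction in fractions.items():
        n_draw = round(fraction * n_cells)
        for _ in range(n_draw):
            profile = rng.choice(cells[cell_type])
            for g, count in enumerate(profile):
                bulk[g] += count
    return bulk


def to_cpm(counts):
    """Scale a count vector so that it sums to one million."""
    total = sum(counts)
    return [c / total * 1e6 for c in counts]


# Toy data: two cell types, three genes, two cells each (hypothetical).
cells = {
    "B cells": [[5, 0, 1], [4, 1, 0]],
    "Monocytes": [[0, 8, 2], [1, 9, 3]],
}
fractions = {"B cells": 0.25, "Monocytes": 0.75}

bulk = simulate_pseudobulk(cells, fractions, n_cells=1000)
cpm = to_cpm(bulk)
print([round(x, 1) for x in cpm])
```

Repeating this with different fractions and seeds would give a set of samples such as the 100 used here.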

Cell types included in datasets:
Hao: "B cells", "Dendritic cells", "Monocytes", "NK cells", "T cells CD4", "T cells CD8", "T regulatory cells"
Travaglini: "B cells", "Dendritic cells", "Macrophages", "Monocytes", "Neutrophils", "NK cells", "T cells CD4", "T cells CD8"
Maynard: "B cells", "Dendritic cells", "Macrophages", "Monocytes", "NK cells", "T cells CD4", "T cells CD8", "T regulatory cells"

In all datasets, the genes with total expression of 0 were removed along with genes with a variance of 0.1.
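A minimal sketch of such a gene filter is shown below. It assumes that "a variance of 0.1" means a threshold, i.e. genes whose variance across cells falls below 0.1 are dropped; that reading, the function name, and the toy gene names are assumptions and not confirmed by the thread.

```python
from statistics import pvariance


def filter_genes(expression, min_variance=0.1):
    """Keep genes with non-zero total expression and variance >= min_variance.

    expression: dict of gene name -> list of expression values across cells
    min_variance: assumed threshold (the thread only says "a variance of 0.1")
    """
    kept = {}
    for gene, values in expression.items():
        if sum(values) == 0:
            continue  # gene is never expressed
        if pvariance(values) < min_variance:
            continue  # gene is (almost) constant across cells
        kept[gene] = values
    return kept


# Hypothetical toy matrix: one unexpressed, one near-constant, one variable gene.
expression = {
    "GENE_A": [0, 0, 0, 0],
    "GENE_B": [1, 1, 1, 1.1],
    "GENE_C": [0, 5, 2, 9],
}

filtered = filter_genes(expression)
print(sorted(filtered))
```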

Also: I just saw that I missed the spike_in based bias for the Travaglini simulations. Will update the plot soon :)

FFinotello commented 2 years ago

Thanks Alex! We should use tumor = TRUE for Travaglini and Maynard as they are not blood-derived cells.

alex-d13 commented 2 years ago

Updated set of figures, now using the correct parameters.

Panels (images not preserved):
main text a: Hao: extreme bias on 4 cell types; deconvolution with CIBERSORT, EPIC, quanTIseq scaled
main text b: Hao: 3 real world bias, 1 no bias; quanTIseq scaled & unscaled
supplementary 4a: Travaglini: extreme bias
supplementary 4b: Travaglini: real world bias (including spike_in)
supplementary 4c: Maynard: extreme bias
supplementary 4d: Maynard: real world bias

FFinotello commented 2 years ago

Hi @alex-d13, thanks for the nice figures!

Two minor comments:

I would probably put Travaglini in the main text as it has full-transcript data (more similar to real bulk) and you also have the spike-in factors.

What I see in the extreme scenario (Travaglini):

What I see in the real-world scenario (Travaglini):