Manuscript Figure 4 - GitHub Issues

alex-d13 commented 2 years ago

This figure will show the effect of the bias on deconvolution.

Main text a: Hao, extreme bias on 4 cell types; deconvolution with CIBERSORT, EPIC, and quanTIseq (scaled)
Main text b: Hao, 3 real-world biases and 1 without bias; quanTIseq scaled and unscaled
Supplementary 4a: Travaglini, extreme bias
Supplementary 4b: Travaglini, real-world bias
Supplementary 4c: Maynard, extreme bias
Supplementary 4d: Maynard, real-world bias

For a, I would discuss how differently the cell types are affected by a scaling factor and how the deconvolution tools are not equally sensitive to such increasing changes.
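A minimal Python sketch of why a scaling factor shifts the estimates, assuming a simple linear mixing model in which each cell type contributes to the bulk in proportion to its cell fraction times a per-cell-type scaling factor. The helper `mrna_fractions` is hypothetical and not part of SimBu. A method that reads these mRNA shares directly as cell fractions would overestimate the scaled cell type.

```python
import numpy as np

def mrna_fractions(cell_fractions, scaling_factors):
    """Share of total bulk mRNA contributed by each cell type (hypothetical helper).

    Assumes linear mixing: contribution = cell fraction x scaling factor.
    """
    weighted = np.asarray(cell_fractions, dtype=float) * np.asarray(scaling_factors, dtype=float)
    return weighted / weighted.sum()

# Four equally abundant cell types; only the first one gets an increasing factor.
true_fractions = np.full(4, 0.25)
for factor in (1, 2, 5, 10):
    shares = mrna_fractions(true_fractions, [factor, 1, 1, 1])
    print(f"factor {factor}: {np.round(shares, 3)}")
```

With a factor of 2 on the first of four equally abundant cell types, its share of the bulk mRNA rises from 0.25 to 0.4; how closely each deconvolution tool follows such a shift is what panel a compares.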

For b, the main story would be that some cell types (Monocytes in Hao, DCs in Maynard, Neutrophils in Travaglini) seem to benefit slightly from a bias based on the genes/census scaling factors. The read_number scaling factor rather decreases estimation performance (NK in Hao, Macrophages and Neutrophils in Travaglini). All of this refers to the scaled version of quanTIseq. In the unscaled quanTIseq estimates, we can see that deconvolution tools also need to account for this bias internally; otherwise they produce strong overestimations, as with Macrophages in Hao. But I don't know whether that should be part of this paper or rather of the benchmarking later on. What do you think?

FFinotello commented 2 years ago

In a, we should use both EPIC and quanTIseq with mRNA correction (which is the default). We do not need to indicate this in the labels here, but can simply write "EPIC" and "quanTIseq".

Conversely, when we remove the mRNA correction, we can write "Algorithm name (no mRNA scaling)".

FFinotello commented 2 years ago

@alex-d13 could you please remind me of the exact parameter settings you used for the 3 deconvolution methods and the 3 datasets?

alex-d13 commented 2 years ago

quanTIseq & EPIC: tumor = FALSE
CIBERSORT: absolute = TRUE

Simulations: 1000 cells, 100 samples; bias removed in counts (this does not matter, because we only use TPMs for deconvolution)
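The simulation settings above can be sketched roughly as follows, assuming pseudo-bulk samples are built by sampling cells per cell type, summing their counts, and normalizing to TPM. This is a hypothetical Python illustration, not SimBu's implementation: `simulate_pseudobulk`, `simulate_dataset`, the Dirichlet-distributed fractions, and the per-cell-type `scaling` dictionary are all assumptions.

```python
import numpy as np

def simulate_pseudobulk(counts, cell_types, fractions, n_cells=1000,
                        scaling=None, gene_lengths=None, seed=0):
    """Sum sampled single cells into one pseudo-bulk sample and return TPMs.

    counts: genes x cells count matrix; cell_types: label per cell;
    fractions: {label: fraction}; scaling: optional {label: factor}
    (hypothetical per-cell-type mRNA scaling factors);
    gene_lengths: per-gene lengths (all 1 if None, i.e. CPM-like output).
    """
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=float)
    cell_types = np.asarray(cell_types)
    scaling = scaling or {}
    bulk = np.zeros(counts.shape[0])
    for label, frac in fractions.items():
        pool = np.flatnonzero(cell_types == label)   # cells of this type
        n = int(round(frac * n_cells))               # cells to draw
        picked = rng.choice(pool, size=n, replace=True)
        # Scale the summed counts of this cell type by its (assumed) factor.
        bulk += counts[:, picked].sum(axis=1) * scaling.get(label, 1.0)
    lengths = np.ones_like(bulk) if gene_lengths is None else np.asarray(gene_lengths, dtype=float)
    rate = bulk / lengths                            # length-normalized counts
    return rate / rate.sum() * 1e6                   # per-million scaling (TPM)

def simulate_dataset(counts, cell_types, n_samples=100, n_cells=1000, seed=0, **kwargs):
    """Draw random cell-type fractions per sample and simulate each pseudo-bulk."""
    rng = np.random.default_rng(seed)
    labels = sorted(set(cell_types))
    samples, truth = [], []
    for i in range(n_samples):
        frac = dict(zip(labels, rng.dirichlet(np.ones(len(labels)))))
        samples.append(simulate_pseudobulk(counts, cell_types, frac, n_cells,
                                           seed=seed + i + 1, **kwargs))
        truth.append(frac)
    return np.column_stack(samples), truth           # genes x samples, true fractions
```

Because fractions are rounded to whole cell counts, the realized composition can differ slightly from the drawn fractions; if `gene_lengths` is omitted, the output is a CPM-like normalization rather than true TPMs.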

Cell types included in the datasets:
Hao: "B cells" "Dendritic cells" "Monocytes" "NK cells" "T cells CD4" "T cells CD8" "T regulatory cells"
Travaglini: "B cells" "Dendritic cells" "Macrophages" "Monocytes" "Neutrophils" "NK cells" "T cells CD4" "T cells CD8"
Maynard: "B cells" "Dendritic cells" "Macrophages" "Monocytes" "NK cells" "T cells CD4" "T cells CD8" "T regulatory cells"

In all datasets, the genes with total expression of 0 were removed along with genes with a variance of 0.1.
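A sketch of this filtering step in Python, assuming a genes x cells matrix and interpreting "a variance of 0.1" as "a variance of at most 0.1". The threshold direction, and whether population or sample variance was used, are assumptions; `filter_genes` is a hypothetical helper.

```python
import numpy as np

def filter_genes(expr, gene_names, var_threshold=0.1):
    """Drop genes with zero total expression and genes with low variance.

    expr: genes x cells matrix. Assumption: genes with variance <= threshold
    are removed; np.var uses the population variance (ddof=0).
    """
    expr = np.asarray(expr, dtype=float)
    total = expr.sum(axis=1)          # total expression per gene
    variance = expr.var(axis=1)       # variance per gene across cells
    keep = (total > 0) & (variance > var_threshold)
    return expr[keep], [g for g, k in zip(gene_names, keep) if k]
```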

Also: I just noticed that I missed the spike_in-based bias for the Travaglini simulations. I will update the plot soon :)

FFinotello commented 2 years ago

Thanks Alex! We should use tumor = TRUE for Travaglini and Maynard as they are not blood-derived cells.

alex-d13 commented 2 years ago

Updated set, using the correct parameters.

Main text a: Hao, extreme bias on 4 cell types; deconvolution with CIBERSORT, EPIC, and quanTIseq (scaled)
Main text b: Hao, 3 real-world biases and 1 without bias; quanTIseq scaled and unscaled
Supplementary 4a: Travaglini, extreme bias
Supplementary 4b: Travaglini, real-world bias (including spike_in)
Supplementary 4c: Maynard, extreme bias
Supplementary 4d: Maynard, real-world bias

FFinotello commented 2 years ago

Hi @alex-d13 thanks for the nice figures!

Two minor comments:

We could add the Pearson correlation.
For the real-world scenario, we could have the "no-scaling" version of quanTIseq on the first row.
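For the Pearson correlation, a minimal sketch using NumPy's `np.corrcoef`, computing both an overall and a per-cell-type coefficient. The helper names are hypothetical, and truth and estimates are assumed to be samples x cell types matrices.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation between two arrays, flattened."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    return float(np.corrcoef(x, y)[0, 1])

def correlation_summary(truth, estimate, cell_types):
    """Overall and per-cell-type Pearson r (samples x cell types inputs)."""
    truth = np.asarray(truth, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    overall = pearson_r(truth, estimate)
    per_type = {ct: pearson_r(truth[:, j], estimate[:, j])
                for j, ct in enumerate(cell_types)}
    return overall, per_type

# Toy data: the estimate is exactly twice the truth.
truth = np.array([[0.1, 0.9], [0.3, 0.7], [0.5, 0.5]])
overall, per_type = correlation_summary(truth, 2 * truth, ["A", "B"])
print(overall, per_type)  # all approximately 1.0 despite the doubling
```

Pearson correlation is invariant to scaling, so an estimate that is twice the truth still gives r = 1; on its own it cannot reveal the systematic over- or underestimation discussed in this thread and is best shown next to the scatter plots. A cell type with constant estimates yields `nan`.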

I would probably put Travaglini in the main text, as it has full-transcript data (more similar to real bulk RNA-seq) and you also have the spike-in factors.

What I see in the extreme scenario (Travaglini):

An increase in mRNA abundance bias has an impact on all (absolute) methods, seen as a systematic deviation from the identity line, which should be interpreted as over- or underestimation;
Exceptions could be attributed to intrinsic limits of deconvolution methods in terms of spillover effects (e.g., EPIC overestimates macro instead of mono; for quanTIseq I see some increased signal for DC and mono). Are we missing some macro markers here?
Macro spillover effects are not seen in the PBMC data (Hao)
No problems of spillover or collinearity for B cells -> we see the effect clearly
For CD8, EPIC and CIBERSORT also overestimate NK cells (which are similar to CD8 T cells)
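The systematic deviation from the identity line described above can be quantified per cell type, for example as the mean signed error (positive means overestimation) and the RMSE. A minimal sketch under those assumptions; `identity_deviation` is a hypothetical helper, not something used to produce the figures.

```python
import numpy as np

def identity_deviation(truth, estimate, cell_types):
    """Per-cell-type signed bias and RMSE relative to the identity line.

    truth, estimate: samples x cell types matrices of fractions.
    Positive bias = overestimation, negative bias = underestimation.
    """
    truth = np.asarray(truth, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    diff = estimate - truth
    bias = diff.mean(axis=0)                   # mean signed error per cell type
    rmse = np.sqrt((diff ** 2).mean(axis=0))   # root mean squared error per cell type
    return {ct: (float(b), float(r)) for ct, b, r in zip(cell_types, bias, rmse)}

# Toy data: A is overestimated by 0.1, B underestimated by 0.1.
truth = np.array([[0.2, 0.8], [0.4, 0.6]])
estimate = np.array([[0.3, 0.7], [0.5, 0.5]])
print(identity_deviation(truth, estimate, ["A", "B"]))
```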

What I see in the real-world scenario (Travaglini):

No bias, qTs without correction -> easy problem and nice solution: points along the identity line
Add bias, qTs without correction -> overestimation of macro and underestimation of neutrophils.
You could refer to the mRNA bias of these cell types shown in the violin plots: do they have higher and lower mRNA content, respectively?
Add bias, qTs with correction -> systematic bias (especially for macro and neutro) is ameliorated or corrected
Also interesting: since qTs corrects for mRNA bias (like EPIC and ABIS), when the simulation contains no bias, some cell types seem to be over- or underestimated (see Neutrophils)
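The mRNA correction can be sketched as dividing each mRNA-based estimate by the cell type's relative mRNA content per cell and renormalizing. This is roughly the idea behind the scaling in EPIC and quanTIseq, but their actual implementations differ in details; `correct_mrna_bias` and the content values are hypothetical. The second call illustrates the last observation above: applying the correction to data without mRNA bias underestimates the high-content cell type and overestimates the others.

```python
import numpy as np

def correct_mrna_bias(mrna_fractions, mrna_content):
    """Turn mRNA-based fractions into cell fractions (hypothetical helper).

    Each estimate is divided by the cell type's relative mRNA content per
    cell, then the result is renormalized to sum to 1.
    """
    adjusted = np.asarray(mrna_fractions, dtype=float) / np.asarray(mrna_content, dtype=float)
    return adjusted / adjusted.sum()

# Hypothetical relative mRNA content: the first cell type has twice as much mRNA.
content = [2.0, 1.0, 1.0, 1.0]

# Biased input (mRNA shares of an equal mixture): the correction recovers ~0.25 each.
print(correct_mrna_bias([0.4, 0.2, 0.2, 0.2], content))

# Unbiased input corrected anyway: first type drops to ~0.14, the others rise to ~0.29.
print(correct_mrna_bias([0.25, 0.25, 0.25, 0.25], content))
```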

omnideconv/SimBu, issue #25: Manuscript Figure 4