omnideconv / SimBu

Simulate pseudo-bulk RNAseq samples from scRNAseq expression data
http://omnideconv.org/SimBu/
GNU General Public License v3.0

Simulation results #10

Closed alex-d13 closed 2 years ago

alex-d13 commented 2 years ago

Hi all,

I spent the last few days generating some results for the simulation. I used only quanTIseq for deconvolution of the simulated datasets and, for now, focused mainly on the effect of scaling factors. One example can be seen here:

image

I used the Travaglini dataset (the one with spike-in data) and compared the correlations with and without scaling factors. Each bar is the correlation from one of the scatter plots shown here (the empty scatter plots are cell types that are not in the dataset; also, I compared Macrophages.M1 from quanTIseq with the annotated Macrophages from Travaglini):

image
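For readers following along, the per-cell-type comparison described above can be sketched roughly as follows. This is a minimal illustration, not the code used for the plots: the function name `per_cell_type_correlation`, the cell-type labels, and the toy data are all assumptions, and the noisy "estimates" merely stand in for real deconvolution output with and without scaling factors.

```python
import numpy as np

def per_cell_type_correlation(true_frac, est_frac, cell_types):
    """Pearson correlation between true and estimated fractions, one value per cell type.

    true_frac, est_frac: arrays of shape (n_samples, n_cell_types).
    """
    result = {}
    for j, name in enumerate(cell_types):
        x, y = true_frac[:, j], est_frac[:, j]
        # The correlation is undefined when either vector is constant
        # (e.g. a cell type that is absent from the dataset).
        if np.std(x) == 0 or np.std(y) == 0:
            result[name] = float("nan")
        else:
            result[name] = float(np.corrcoef(x, y)[0, 1])
    return result

# Toy data (hypothetical): 50 simulated samples, 3 cell types.
rng = np.random.default_rng(0)
cell_types = ["B.cells", "Macrophages", "NK.cells"]
true_frac = rng.dirichlet(np.ones(3), size=50)

# Stand-ins for deconvolution results without and with scaling factors.
est_plain = true_frac + rng.normal(0, 0.05, true_frac.shape)
est_scaled = true_frac + rng.normal(0, 0.05, true_frac.shape)

cor_plain = per_cell_type_correlation(true_frac, est_plain, cell_types)
cor_scaled = per_cell_type_correlation(true_frac, est_scaled, cell_types)
for ct in cell_types:
    print(ct, round(cor_plain[ct], 3), round(cor_scaled[ct], 3))
```

Each printed pair corresponds to two bars in a bar plot of this kind, one per condition.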

I feel like it's pretty hard to analyze these results, since there is no obvious trend when using scaling factors. I am planning to run the same analysis on the 3 other datasets I was already using for the scaling-factor analysis (Maynard, Hao & Vento-Tormo).

I was wondering if you feel like I should also use different deconvolution tools like EPIC, or if that would be more part of the full benchmark?

FFinotello commented 2 years ago

Now I am wondering whether we are oversimplifying things. Cell types can have the same slope but different intercept when mRNA bias is not corrected for...

alex-d13 commented 2 years ago

> Now I am wondering whether we are oversimplifying things. Cell types can have the same slope but different intercept when mRNA bias is not corrected for...

True. These are the linear models per cell type; most have the same intercept, but not always.

image
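As a rough sketch of what such per-cell-type linear models look like in code (the helper `fit_line_per_cell_type`, the labels `type_A`/`type_B`, and the toy numbers are hypothetical, not taken from the analysis above): fitting estimated against true fractions gives a slope and an intercept per cell type. Pearson correlation is unchanged by a constant offset, so two cell types can correlate equally well with the truth and still differ in intercept.

```python
import numpy as np

def fit_line_per_cell_type(true_frac, est_frac, cell_types):
    """Least-squares fit est = slope * true + intercept, one line per cell type."""
    fits = {}
    for j, name in enumerate(cell_types):
        slope, intercept = np.polyfit(true_frac[:, j], est_frac[:, j], deg=1)
        fits[name] = (float(slope), float(intercept))
    return fits

# Toy data (hypothetical): 4 samples, 2 cell types.
true_frac = np.array([[0.1, 0.9], [0.3, 0.7], [0.5, 0.5], [0.8, 0.2]])

# Both cell types share the slope 0.8; type_B is shifted by a constant 0.1.
est_frac = np.column_stack([
    0.8 * true_frac[:, 0],
    0.8 * true_frac[:, 1] + 0.1,
])

fits = fit_line_per_cell_type(true_frac, est_frac, ["type_A", "type_B"])
for name, (slope, intercept) in fits.items():
    print(f"{name}: slope={slope:.2f}, intercept={intercept:.2f}")
```

In this toy case both cell types have a correlation of 1 with the truth, so the difference only shows up in the fitted intercepts.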

alex-d13 commented 2 years ago

Update on my email from earlier: Using the sum as aggregation method did not really change anything.

FFinotello commented 2 years ago

One thing we should check carefully is which mRNA bias is present in TPM vs. count data. I have the feeling that count data are more biased than TPM and that the bias we had during the count-based pseudo-bulk simulation might be too large. Let's keep this in mind for some later checks ;)
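A toy illustration of why counts and per-million-scaled values can carry a different amount of mRNA bias. This is a sketch under strong assumptions, not a description of how SimBu simulates samples: the two expression profiles are invented, gene length is ignored (so `per_million` is a CPM-like stand-in for TPM), and the pseudo-bulk is assumed to be a plain sum over cells.

```python
import numpy as np

# Two hypothetical cell types measured on 3 genes; "big" carries 4x more mRNA per cell.
small = np.array([100.0, 50.0, 50.0])    # total of 200 counts per cell
big = np.array([200.0, 200.0, 400.0])    # total of 800 counts per cell

def per_million(counts):
    """Scale a profile so that it sums to one million (gene length ignored)."""
    return counts / counts.sum() * 1e6

def contribution(profile_a, profile_b, n_a, n_b):
    """Share of the summed pseudo-bulk signal that comes from cell type a."""
    total_a = n_a * profile_a.sum()
    total_b = n_b * profile_b.sum()
    return total_a / (total_a + total_b)

# Equal cell numbers (5 and 5), so the true cell fraction of "big" is 0.5.
share_counts = contribution(big, small, 5, 5)
share_pm = contribution(per_million(big), per_million(small), 5, 5)

print(f"share of 'big' in count-based pseudo-bulk:       {share_counts:.2f}")
print(f"share of 'big' in per-million-based pseudo-bulk: {share_pm:.2f}")
```

With raw counts, the cell type with more mRNA contributes 80% of the signal despite making up half of the cells; after per-cell scaling, each cell contributes equally. How large the effect is in real count and TPM data is exactly the open question raised above.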