omnideconv / SimBu

Simulate pseudo-bulk RNAseq samples from scRNAseq expression data
http://omnideconv.org/SimBu/
GNU General Public License v3.0

Manuscript Figure 2 #23

Closed: alex-d13 closed this issue 2 years ago

alex-d13 commented 2 years ago

Figure 2 will focus on the count diagnostics that we performed.

main text: [image] gene-wise mean vs. variance; bias was removed from counts using reads
supplementary 2a: [image] gene-wise mean vs. variance; bias was removed from counts using reads and then added again using genes
supplementary 2b: [image] counts of simulations based on SS2 vs. 10X, based on three different bulk samples, with and without added bias
supplementary 2c: [image] same as 2b, but with TPM

Just for your info: I renamed the old Wuaiping labels to Chen, as this is the correct name of the first author; Wuaiping is the lab name, I believe.

FFinotello commented 2 years ago

Hi Alex, it looks very nice.

In the third figure, you could also compare TPM and CPM. And for all plots, you could report the correlation.
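For reference, the difference between the two normalizations can be sketched as follows. This is a minimal Python illustration and not SimBu code: the gene names, counts, and gene lengths are made up, and `cpm` and `tpm` are hypothetical helper names. CPM scales counts by library size only, while TPM divides by gene length first and then scales.

```python
def cpm(counts):
    """Counts per million: scale each count by the library size."""
    total = sum(counts.values())
    return {gene: c / total * 1e6 for gene, c in counts.items()}


def tpm(counts, lengths_kb):
    """Transcripts per million: length-normalize first, then scale to 1e6."""
    rates = {gene: c / lengths_kb[gene] for gene, c in counts.items()}
    total = sum(rates.values())
    return {gene: r / total * 1e6 for gene, r in rates.items()}


# Made-up example: three genes with different lengths (in kilobases).
counts = {"geneA": 10, "geneB": 30, "geneC": 60}
lengths_kb = {"geneA": 1.0, "geneB": 3.0, "geneC": 2.0}

cpm_values = cpm(counts)
tpm_values = tpm(counts, lengths_kb)
```

In this toy example geneA and geneB end up with the same TPM because their counts per kilobase are equal, while their CPM values differ by a factor of three.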

alex-d13 commented 2 years ago

> And for all plots, you could report the correlation.

Will do that. As for the correlation for the 10x vs. SS2 plots: should I calculate it using only genes that are present in both assays? If I use all of them, we get correlation values of 0.1-0.3, because some genes are present in the single-cell dataset of only one assay (see updated plots above).
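The effect described here can be reproduced with a small sketch. It assumes that a gene missing from one assay is filled in with a count of 0 when all genes are used; the thread does not say how missing genes were actually handled, and the gene names and values below are made up. `pearson` and `correlation_all_vs_shared` are hypothetical helpers, written in plain Python to keep the example self-contained.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (non-constant input)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_all_vs_shared(expr_a, expr_b):
    """Return (r over the union of genes, r over the shared genes only)."""
    shared = sorted(set(expr_a) & set(expr_b))
    union = sorted(set(expr_a) | set(expr_b))
    # Assumption: a gene absent from one assay is treated as a zero count.
    r_all = pearson([expr_a.get(g, 0.0) for g in union],
                    [expr_b.get(g, 0.0) for g in union])
    r_shared = pearson([expr_a[g] for g in shared],
                       [expr_b[g] for g in shared])
    return r_all, r_shared


# Made-up counts: four shared genes plus one assay-specific gene each.
ss2 = {"g1": 5, "g2": 20, "g3": 80, "g4": 150, "only_ss2": 200}
tenx = {"g1": 4, "g2": 25, "g3": 70, "g4": 160, "only_10x": 180}

r_all, r_shared = correlation_all_vs_shared(ss2, tenx)
```

With these toy values the shared-gene correlation is high, while the assay-specific genes pull the all-gene correlation down sharply.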

FFinotello commented 2 years ago

Very nice, Alex!

Sorry that maybe my comment was unclear. I would not report the correlation for the first two plots, where the idea is to show the NB distribution.
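The mean vs. variance diagnostic can be sketched in a few lines. Under a negative binomial (NB) model the variance of a gene follows var = mean + alpha * mean^2, so the variance lies above the mean once the dispersion alpha is positive. The snippet below is an illustration with made-up counts, not the code used for the figures; `gene_mean_variance` and `nb_dispersion` are hypothetical helper names, and the dispersion is a simple method-of-moments estimate.

```python
import statistics


def gene_mean_variance(counts_by_gene):
    """Map each gene to (mean, sample variance) across samples."""
    return {
        gene: (statistics.mean(counts), statistics.variance(counts))
        for gene, counts in counts_by_gene.items()
    }


def nb_dispersion(mean, var):
    """Method-of-moments alpha in var = mean + alpha * mean**2, clipped at 0."""
    if mean <= 0:
        return 0.0
    return max(0.0, (var - mean) / mean ** 2)


# Made-up counts for two genes across five simulated samples.
counts_by_gene = {
    "low_gene": [0, 1, 2, 1, 1],
    "high_gene": [80, 120, 100, 150, 50],
}

stats = gene_mean_variance(counts_by_gene)
dispersions = {gene: nb_dispersion(m, v) for gene, (m, v) in stats.items()}
```

Plotting the mean against the variance on log-log axes for all genes then shows whether the points bend away from the var = mean line in the way an NB model predicts.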

For the last two plots, I guess you considered log(counts) (or TPM). I would instead use log(counts+1). And you considered the same cell types for 10x and SS2 data, correct?
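The practical difference between the two transformations is how zeros are handled: log(counts) is undefined at 0, so zero-count genes drop out of the plot, whereas log(counts + 1) maps them to 0 and keeps them. A minimal sketch with made-up values (the function names are hypothetical):

```python
import math


def log_with_pseudocount(values, pseudocount=1.0):
    """log(value + pseudocount); zero counts are kept and map to 0 for pseudocount=1."""
    return [math.log(v + pseudocount) for v in values]


def log_drop_zeros(values):
    """Plain log(value); zero counts have to be dropped because log(0) is undefined."""
    return [math.log(v) for v in values if v > 0]


# Made-up counts for four genes, two of them not detected.
counts = [0, 0, 3, 99]

kept_all = log_with_pseudocount(counts)
kept_nonzero = log_drop_zeros(counts)
```

Here `kept_all` still has four entries, while `kept_nonzero` has only two, which matters when lowly-expressed genes are the point of the comparison.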

I think we can clearly see that lowly-expressed genes are underrepresented in 10x data. Is the R computed for counts correct? It looks a bit low for having so many points around the diagonal (although the noise is high).

federicomarini commented 2 years ago

Instead of log(counts + 1), would the rlog or the vst transformation make a big difference in this case? These would potentially better stabilize the variance across the whole range of values.

FFinotello commented 2 years ago

> Instead of log(counts + 1), would the rlog or the vst transformation make a big difference in this case? These would potentially better stabilize the variance across the whole range of values.

I would minimize the impact of normalization here, as the point is to show the raw counts.

alex-d13 commented 2 years ago

> And you considered the same cell types for 10x and SS2 data, correct?

Yes, in b and c these are B, TCD4 and TCD8.

Here is the new set of plots, using Spearman correlation in b and c and log(counts + 1) or log(TPM + 1) axes:
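One property worth keeping in mind for these plots: Spearman correlation is computed on ranks, so it gives the same value for raw counts and for log(counts + 1), while Pearson correlation changes with the transformation. The sketch below illustrates this with made-up values; it is a plain-Python illustration with hypothetical helper names, not the code behind the figures.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (non-constant input)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up per-gene counts for the two assays.
ss2_counts = [0, 3, 10, 250, 4000]
tenx_counts = [0, 0, 12, 180, 5200]
ss2_log = [math.log1p(v) for v in ss2_counts]
tenx_log = [math.log1p(v) for v in tenx_counts]

rho_raw = spearman(ss2_counts, tenx_counts)
rho_log = spearman(ss2_log, tenx_log)
r_raw = pearson(ss2_counts, tenx_counts)
r_log = pearson(ss2_log, tenx_log)
```

In this example `rho_raw` and `rho_log` are identical, whereas `r_raw` and `r_log` differ, so the log axes change how the scatterplot looks but not the reported Spearman value.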

main text: [image] gene-wise mean vs. variance; bias was removed from counts using reads
supplementary 2a: [image] gene-wise mean vs. variance; bias was removed from counts using reads and then added again using genes
supplementary 2b: [image] counts of simulations based on SS2 vs. 10X, based on three different bulk samples, with and without added bias
supplementary 2c: [image] same as 2b, but with TPM

FFinotello commented 2 years ago

Now they look as expected. Great job, Alex!

Please make sure you call the different data sources in a consistent way (10x, 10X, CITE-seq, etc.)

FFinotello commented 2 years ago

We could report the last figure (now suppl. 2c) in the main text (like Fig. 2b). How does that sound to you @alex-d13 and @mlist ?

Alex, check carefully that the figures, also in the supplementary, are numbered according to the order in which you refer to them in the main text.

alex-d13 commented 2 years ago

> We could report the last figure (now suppl. 2c) in the main text (like Fig. 2b). How does that sound to you @alex-d13 and @mlist ?

I think the only thing speaking against this would be the page limit for the manuscript:

> The manuscript with embedded figures and tables must not exceed 7 pages of length or 5000 words and must contain an abstract whose length does not exceed 250 words.

I think with Figure 1 and the large scatterplots of Figure 4 we already use quite a lot of space for figures, so I would rather not put more in the main text until the writing is done and we can see how many pages we have. Right now, without the abstract and discussion, I am at 6 pages already.

mlist commented 2 years ago

I agree with Alex, we are dangerously close to the limit.