Closed alex-d13 closed 2 years ago
Hi Alex, it looks very nice.
In the third figure, you could also compare the TPM-CPM. And for all plots, you could report the correlation.
> And for all plots, you could report the correlation.

Will do that. As for the correlation in the 10x vs SS2 plots: should I calculate it using only genes that are present in both assays? If I use all of them, we get correlation values of 0.1-0.3, because some genes are present in the single-cell dataset of only one assay (see updated plots above).
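
For reference, a minimal sketch of why the gene set matters here. The gene names and counts are invented toy values, and `pearson` and `compare_gene_sets` are hypothetical helpers, not functions from the actual analysis. Genes missing from one assay are filled with 0, which is an assumption about how "using all genes" was done.

```python
import math

def pearson(x, y):
    """Plain Pearson correlation coefficient of two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def compare_gene_sets(counts_a, counts_b):
    """Correlate log(counts + 1) on shared genes and on the union of genes."""
    shared = sorted(set(counts_a) & set(counts_b))
    union = sorted(set(counts_a) | set(counts_b))

    def logged(counts, genes):
        # Assumption: a gene absent from an assay is treated as a count of 0.
        return [math.log1p(counts.get(g, 0)) for g in genes]

    r_shared = pearson(logged(counts_a, shared), logged(counts_b, shared))
    r_union = pearson(logged(counts_a, union), logged(counts_b, union))
    return r_shared, r_union

# Toy data (invented): four shared genes plus two assay-specific genes each.
tenx = {"g1": 10, "g2": 50, "g3": 200, "g4": 1000, "x1": 300, "x2": 800}
ss2 = {"g1": 12, "g2": 45, "g3": 260, "g4": 900, "s1": 500, "s2": 700}

r_shared, r_union = compare_gene_sets(tenx, ss2)
print(f"shared genes: r = {r_shared:.2f}, all genes: r = {r_union:.2f}")
```

The assay-specific genes sit on the axes of the scatterplot (high in one assay, zero in the other), so they pull the correlation down even when the shared genes agree well.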
Very nice, Alex!
Sorry, maybe my comment was unclear. I would not report the correlation for the first two plots, where the idea is to show the NB distribution.
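
As a rough illustration of what the mean vs variance plots check: under a negative binomial (NB) model the variance is `mu + alpha * mu^2`, so it exceeds the mean whenever the dispersion `alpha` is positive, whereas a Poisson model would give variance equal to the mean. The sketch below uses invented toy counts and a simple method-of-moments estimate; `moment_dispersion` is a hypothetical helper, not part of the actual pipeline.

```python
from statistics import mean, variance

def mean_variance(counts_per_gene):
    """Gene-wise (mean, sample variance) across cells."""
    return {g: (mean(v), variance(v)) for g, v in counts_per_gene.items()}

def moment_dispersion(mu, var):
    """Method-of-moments alpha from var = mu + alpha * mu**2.

    Returns 0.0 when the gene is not overdispersed (var <= mu) or mu is 0.
    """
    if mu == 0:
        return 0.0
    return max(0.0, (var - mu) / mu ** 2)

# Toy counts (invented): one lowly and one highly expressed gene, 8 cells each.
counts = {
    "geneA": [0, 1, 0, 2, 1, 0, 3, 1],
    "geneB": [10, 40, 5, 80, 22, 3, 60, 20],
}

stats = mean_variance(counts)
alphas = {g: moment_dispersion(mu, var) for g, (mu, var) in stats.items()}
for g, (mu, var) in stats.items():
    print(g, "mean:", mu, "variance:", var, "alpha:", round(alphas[g], 3))
```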
For the last two plots, I guess you considered log(counts) (or TPM). I would instead use log(counts+1). And you considered the same cell types for 10x and SS2 data, correct?
I think we can clearly see that lowly-expressed genes are underrepresented in 10x data. Is the R computed for counts correct? It looks a bit low given how many points lie around the diagonal (although the noise is high).
Instead of log(counts + 1), does the rlog or the vst transformation make a big difference in this case? These would potentially stabilize the variance better across the whole value range.
> Instead of log(counts + 1), is the rlog or the vst transformation making a big difference in this case? These would potentially better stabilize the variance across all the value range

I would minimize the impact of normalization here, as the point is to show the raw counts.
> And you considered the same cell types for 10x and SS2 data, correct?

Yes, in b and c these are B, TCD4 and TCD8.
Here is the new set of plots, using Spearman correlation in b and c and log(counts + 1) or log(TPM + 1) axes:
| main text | supplementary 2a | supplementary 2b | supplementary 2c |
|---|---|---|---|
| gene-wise mean vs variance, bias was removed from counts using reads | gene-wise mean vs variance, bias was removed from counts using reads and then added again using genes | counts of simulations based on SS2 vs 10X based on three different bulk samples: with and without added bias | same as 2b, but with TPM |

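
A small sketch of one consequence of switching to Spearman correlation: it is rank-based, so the log(counts + 1) transform only changes the axes, not the reported coefficient, while Pearson's r does change. The counts below are invented toy values, and `average_ranks`, `pearson` and `spearman` are hypothetical helpers written for this illustration.

```python
import math

def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Toy counts (invented) for the same six genes in two assays.
ss2_counts = [0, 0, 3, 10, 250, 4000]
tenx_counts = [1, 0, 5, 8, 400, 2500]
ss2_log = [math.log1p(c) for c in ss2_counts]
tenx_log = [math.log1p(c) for c in tenx_counts]

rho_raw = spearman(ss2_counts, tenx_counts)
rho_log = spearman(ss2_log, tenx_log)
r_raw = pearson(ss2_counts, tenx_counts)
r_log = pearson(ss2_log, tenx_log)
print("Spearman raw vs log:", rho_raw, rho_log)
print("Pearson  raw vs log:", r_raw, r_log)
```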
Now they look as expected. Great job, Alex!
Please make sure you call the different data sources in a consistent way (10x, 10X, CITE-seq, etc.)
We could report the last figure (now suppl. 2c) in the main text (like Fig. 2b). How does that sound to you @alex-d13 and @mlist ?
Alex, check carefully that the figures, also in the supplementary, are numbered according to the order in which you refer to them in the main text.
> We could report the last figure (now suppl. 2c) in the main text (like Fig. 2b). How does that sound to you @alex-d13 and @mlist ?

I think the only thing speaking against this would be the page limit for the manuscript:
> The manuscript with embedded figures and tables must not exceed 7 pages of length or 5000 words and must contain an abstract whose length does not exceed 250 words.

I think with Figure 1 and the large scatterplots of Figure 4 we already use quite a lot of space for figures, so I would rather not put more in the main text until the writing is done and we can see how many pages we have. Right now, without the abstract and discussion, I am already at 6 pages.
I agree with Alex, we are dangerously close to the limit.
Figure 2 will focus on the count diagnostics that we performed.
Just for your info, I renamed the old `Wuaiping` labels to `Chen`, as this is the correct name of the first author; `Wuaiping` is the lab name, I believe.