omnideconv / SimBu

Simulate pseudo-bulk RNAseq samples from scRNAseq expression data
http://omnideconv.org/SimBu/
GNU General Public License v3.0

Manuscript Figure 3 #24

Closed alex-d13 closed 2 years ago

alex-d13 commented 2 years ago

Figure 3 is the comparison of scaling factors.

This heatmap took forever, but now we finally have everything in one place :) Would you rather use the version with circles for number of cell types or the version with just printed values?

circles: (image)
text: (image)
supplementary 3a: (image) comparison of median spike-in counts with other data-driven approaches
supplementary 3b: (image) boxplot for cell types

As always, I am open to color/layout suggestions :)

FFinotello commented 2 years ago

I like the new heatmap a lot. A few suggestions:

federicomarini commented 2 years ago

+1 on square for me as well, it "gives the same weight to both axes"

The cell type legend on SF3a is taking a lot of space, can that be compacted with some gg-fu?

3b: instead of boxplots, what about violin or ridge plots? They would capture the distribution (modality, skewness) much better, and would likely be a simple drop-in replacement in terms of the geom used.

alex-d13 commented 2 years ago

Thanks for the suggestions :) Here is an updated set of figures (all correlation values are now Spearman correlations):

heatmap: (image) Squared heatmap with boxes instead of circles; makes it easier to spot differences imo.
supplementary 3a: (image) We decided to only look at the Travaglini dataset in this figure; this also allows a comparison at the single-cell level, not only at the cell-type level. The cluster below the main distribution of cells (spike_in vs. genes/census) contains multiple different cell types, mainly T cell subtypes and monocytes. Should I go into detail here?
supplementary 3b: (image) (image) I tried again with violin plots. The issue is with the Travaglini dataset: it has far fewer cells per cell type, so the violins become really small. I can overcome this by scaling the violins to the same width, but then we lose the information on how many cells per cell type are present. The lower plot is without scaling. (Ridge plots did not really work well with facet_grid and a free y axis.)
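For readers wondering what the switch to Spearman correlation means in practice, here is a minimal sketch in plain Python. It is not the code used for the figures (the thread does not show that); the function names and the numbers are made up for illustration. The idea is that Spearman correlation is Pearson correlation computed on ranks, so it rewards any monotone relationship between two sets of scaling factors, not only a linear one.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-cell-type scaling factors from two approaches (made-up values).
factors_a = [1.0, 2.0, 4.0, 8.0, 16.0]
factors_b = [10.0, 11.0, 13.0, 20.0, 90.0]

print("pearson: ", pearson(factors_a, factors_b))
print("spearman:", spearman(factors_a, factors_b))
```

Because the two made-up vectors order the cell types identically, the Spearman value is (up to floating-point rounding) 1, while the Pearson value is lower because the relationship is not linear.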
FFinotello commented 2 years ago

Very nice! I like the squares-heatmap! Just try to be consistent with the first letters (all lowercase?)

3a, the message is very clear. Minor upgrades: bigger text, smoothscatter or hexbin plots?

I also like the scaled violin plots. They are really clear

federicomarini commented 2 years ago

Yes, violins are very nice - and do convey the distro shapes pretty nicely! Having them small is IMHO a fair price to pay?

FFinotello commented 2 years ago

Oh, one small thing: did we lose information on 10x/CITE-seq vs. SS2?

grst commented 2 years ago

The heatmap is nice! In terms of readability of the color annotations:

I think it would improve the readability if you rotated the heatmap clockwise by 90 degrees, so that the colorbars are next to the legend. Ideally you could then condense the legend a bit (by using multiple columns), then the legends would be very close to the respective color bars.

mlist commented 2 years ago

I'm not so sure I like the double encoding of information here. The columns source and scaling_factor are redundant with the row labels. What do others think? Also, I would remove the underscore in scaling factor.

mlist commented 2 years ago
alex-d13 commented 2 years ago

Hi, I wanted to show the updated plots yesterday, but somehow GitHub had issues with commenting and uploading pictures. So here is the updated set of plots:

heatmap: (image)
supplementary 3a: (image)
supplementary 3b: (image)
mlist commented 2 years ago

Even better now, turning the heatmap was a great idea. I think we should still discuss removing the row labels. If necessary, we can probably make the legend a bit more compact by having the top 2 elements (boxes and color gradient) break across two lines.

alex-d13 commented 2 years ago

I could maybe shorten the row and column labels to only the name, without the used scaling factor in brackets? That way they get shorter and we still have the information on the exact name of each dataset (currently not indicated anywhere else). I also agree on the legend changes; let's see if I can do this with ggplot directly or with Photoshop.

grst commented 2 years ago

I would keep at least the dataset labels. But I'm not opposed to keeping the scaling labels as well. Redundant coding is a good thing.

Putting the color bars to the right did work well, it's now a lot easier to match the colors between legend and annotation. Now these are complaints on a high level, but to make it even better you could