Manuscript Figure 3

alex-d13 commented 2 years ago

Figure 3 is the comparison of scaling factors.

This heatmap took forever, but now we finally have everything in one place :) Would you rather use the version with circles for number of cell types or the version with just printed values?

[Images: circles | text | supplementary 3a | supplementary 3b]

[Captions: comparison of median spike-in counts with other data-driven approaches | boxplot for cell types]

As always, I am open to color/layout suggestions :)

FFinotello commented 2 years ago

I like the new heatmap a lot. A few suggestions:

I like the squared version better, where each cell is a square.
Definitely dots rather than numbers. Could you maybe make them a bit bigger?
The labels on the bottom clash with each other. Can this be fixed? You could put the legend ones above the plot, or use the same angle for all labels. Other ideas?
Source: I would not repeat quanTIseq, EPIC, and Monaco again. You could call it "Single-cell data", specify "none" for the deconvolution scaling factors, and leave "Travaglini", "Hao", and "Maynard".

federicomarini commented 2 years ago

+1 on square for me as well, it "gives the same weight to both axes"

The cell type legend on SF3a is taking a lot of space, can that be compacted with some gg-fu?

3b: instead of boxplots, what about violin or ridge plots? They would capture the distribution (modality, skewness) much better, and would likely be a simple drop-in replacement in terms of the geom used.

alex-d13 commented 2 years ago

Thanks for the suggestions :) Here is an updated set of figures (all correlation values are now Spearman correlations):

[Images: heatmap | supplementary 3a | supplementary 3b]

heatmap: Squared heatmap with boxes instead of circles, which makes it easier to spot differences imo.

supplementary 3a: We decided to only look at the Travaglini dataset in this figure. This also allows a comparison on the single-cell level, not only on the cell-type level. The cluster below the main distribution of cells (spike_in vs. genes/census) contains multiple different cell types, mainly T cell subtypes and monocytes. Should I go into detail here?

supplementary 3b: I tried again with violin plots. The issue is with the Travaglini dataset: it has far fewer cells per cell type, so the violins become really small. I can overcome this by scaling the violins to the same width, but then we lose the information on how many cells per cell type are present. The lower plot is without scaling. (Ridge plots did not really work well with facet_grid and a free y axis.)
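
For reference, a minimal sketch of what switching to Spearman correlation means: it is the Pearson correlation computed on ranks, so it only measures whether two scaling factors order the cells the same way. The function names and the example values below are hypothetical and are not taken from the SimBu code or the actual datasets.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend the block [start, end] over all values tied with values[order[start]].
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = avg
        start = end + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-cell scaling factors: monotonically related, but not linearly.
spike_in = [1.0, 2.0, 4.0, 8.0, 16.0]
census = [10.0, 11.0, 13.0, 50.0, 400.0]
print(spearman(spike_in, census))
```

Because only the ordering matters, two scaling factors that differ by a non-linear but monotonic transformation still get a perfect Spearman correlation, which is not the case for Pearson.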

FFinotello commented 2 years ago

Very nice! I like the squares-heatmap! Just try to be consistent with the first letters (all lowercase?)

3a: the message is very clear. Minor upgrades: bigger text, and maybe smoothScatter or hexbin plots?

I also like the scaled violin plots. They are really clear

federicomarini commented 2 years ago

Yes, violins are very nice - and do convey the distro shapes pretty nicely! Having them small is IMHO a fair price to pay?

FFinotello commented 2 years ago

Oh, one small thing: did we lose information on 10x/CITE-seq vs. SS2?

grst commented 2 years ago

The heatmap is nice! In terms of readability of the color annotations:

I think it would improve the readability if you rotated the heatmap clockwise by 90 degrees, so that the colorbars are next to the legend. Ideally you could then condense the legend a bit (by using multiple columns), so that the legends would be very close to the respective color bars.

mlist commented 2 years ago

I'm not so sure I like the double-encoding of information here. The columns source and scaling_factor are redundant with the row labels. What do others think? Also, I would remove the underscore in scaling_factor.

mlist commented 2 years ago

the new violin plots look great.

alex-d13 commented 2 years ago

Hi, I wanted to show the updated plots yesterday, but somehow GitHub had issues with commenting and uploading pictures. So here is the updated set of plots:

[Images: heatmap | supplementary 3a | supplementary 3b]

mlist commented 2 years ago

Even better now, turning the heatmap was a great idea. I think we should still discuss removing the row labels. If necessary, we can probably make the legend a bit more compact by having the top 2 elements (boxes and color gradient) break across two lines.

alex-d13 commented 2 years ago

I could maybe shorten the row and column labels to only the name, without the used scaling factor in brackets? That way they get shorter and we still have the exact name of each dataset (currently not indicated anywhere else). I also agree on the legend changes; let's see if I can do this with ggplot directly or with Photoshop.

grst commented 2 years ago

I would keep at least the dataset labels. But I'm not opposed to keeping the scaling labels as well. Redundant coding is a good thing.

Putting the color bars to the right worked well; it's now a lot easier to match the colors between legend and annotation. Now these are complaints on a high level, but to make it even better you could

reorder the legend labels so that they appear in the same order as in the color bar. This is, btw, also explained in the "redundant coding" chapter.
consider giving quanTIseq, EPIC and Monaco the same color, maybe labelled "provided by method" or something like that.

omnideconv / SimBu

Manuscript Figure 3 #24