thackl / gggenomes

A grammar of graphics for comparative genomics
https://thackl.github.io/gggenomes/

Enhancements: seq breaks symbols and scale #85

Closed iferres closed 11 months ago

iferres commented 2 years ago

Hi again!

Not reporting a bug, just to suggest a couple of enhancements for future releases.

1) To be able to add seq break symbols as in this comment (two parallel lines at ~45 degrees at the beginning/end of each break).

2) To be able to draw a scale bar instead of the whole x axis. This is especially useful when using focus(), since it doesn't make sense to draw an axis for truncated contigs. Instead, a small scale bar for comparing relative sizes, as is usually done in phylogeny figures, would be nice. For example, see ggtree::geom_treescale. It is probably possible using ggplot2 magic, but it would be nice to have an example in the documentation.
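To illustrate the "ggplot2 magic" route, here is a rough sketch using only ggplot2. It is not a gggenomes feature: the toy data frame `segs`, the bar length `bar_len`, and the bar position are all made up for illustration.

```r
library(ggplot2)

# Toy coordinates standing in for (truncated) contigs; purely hypothetical data
segs <- data.frame(y = c(1, 2), x = c(0, 0), xend = c(12000, 4000))

bar_len <- 2000  # scale bar length in bp, an arbitrary choice

p <- ggplot(segs) +
  # the contigs themselves
  geom_segment(aes(x = x, xend = xend, y = y, yend = y)) +
  # the scale bar: a short horizontal segment below the contigs
  annotate("segment", x = 0, xend = bar_len, y = 0.5, yend = 0.5) +
  # its label, centered just above the bar
  annotate("text", x = bar_len / 2, y = 0.5,
           label = paste0(bar_len / 1000, " kb"), vjust = -0.5) +
  # drop axes and background so only the scale bar conveys size
  theme_void()
```

Printing `p` draws the two segments with a 2 kb bar underneath. The bar position is hard-coded here; a real geom would presumably compute it from the plot range.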

Sorry for the spam :P Bests!

thackl commented 2 years ago

Definitely good ideas! Challenge accepted ;)

I'm still playing around with some ideas. Would be curious about your thoughts on the following:

gggenomes(emale_genes, emale_seqs) %>% focus(name=="MCP") +
  geom_seq() + geom_gene() + geom_gene_tag(aes(label=name)) +
  geom_break() + # add // at the ends of truncated seqs (alternatively: geom_seq(breaks=TRUE))
  geom_scale_bar() + no_x_axis() # add a scale bar

image

iferres commented 2 years ago

Looks good!

I'm not sure about geom_break(), since it only makes sense when focus()ing, don't you think? How about focus(add_breaks=TRUE), or something like that? I'm not an expert on ggplot2's grammar, but I can't see what its behaviour would be if it isn't wrapped in a focus() call.

Regarding the scale bar, it also looks very good! Here I link the ggtree approach, which makes use of a custom theme if you want to remove the x axis. Maybe it can serve as inspiration. Using a similar approach would probably help users find what they have seen in other packages. I don't remember gggenes's approach to this, but theme_genes() is probably doing the trick.

Thank you for your interest!

thackl commented 2 years ago

You don't necessarily need focus() to truncate sequences. A truncated sequence is defined by having - in addition to a length - a start >1 and/or an end < length. You can also manually set that to illustrate some more complex situations, see the example below.

s0 <- tribble(
  # start/end define regions, i.e. truncated contigs
  ~bin_id, ~seq_id, ~length, ~start, ~end,
  "complete_genome", "chromosome_1_long_trunc_2side", 1e5, 1e4, 2.1e4,
  "fragmented_assembly", "contig_1_trunc_1side", 1.3e4, .9e4, 1.3e4,
  "fragmented_assembly", "contig_2_short_complete", 0.3e4, 1, 0.3e4,
  "fragmented_assembly", "contig_3_trunc_2sides", 2e4, 1e4, 1.4e4
)

l0 <- tribble(
  ~seq_id, ~start, ~end, ~seq_id2, ~start2, ~end2,
  "chromosome_1_long_trunc_2side", 1.1e4, 1.4e4, 
    "contig_1_trunc_1side", 1e4, 1.3e4,
  "chromosome_1_long_trunc_2side", 1.4e4, 1.7e4,
    "contig_2_short_complete", 1, 0.3e4,
  "chromosome_1_long_trunc_2side", 1.7e4, 2e4,
    "contig_3_trunc_2sides", 1e4, 1.3e4
)

gggenomes(seqs=s0, links=l0) +
  geom_seq() + geom_break() + geom_seq_label(nudge_y=-.05) + geom_link()

image

focus() computes start/end for sequences based on some criteria. It does not plot anything by itself. It is like mutate() for a tibble: it just adds/modifies the start/end columns for sequences in a gggenomes object (and filters out unused sequences). That's why focus(add_breaks) would not make sense: it always computes breaks, but has nothing to do with plotting them.
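To make the mutate() analogy concrete, here is a minimal sketch with plain dplyr. It is not how focus() is implemented; the table `seqs`, the chosen region, and the `truncated` column are hypothetical and only mirror the start/end/length logic described above.

```r
library(dplyr)
library(tibble)

# Hypothetical sequence table with full lengths only
seqs <- tribble(
  ~seq_id,    ~length,
  "contig_1", 13000,
  "contig_2", 3000
)

# "Focusing" amounts to adding start/end columns; nothing is plotted here.
seqs_focused <- seqs %>%
  mutate(
    start = if_else(seq_id == "contig_1", 9000, 1),  # made-up region start
    end = length,
    # a sequence counts as truncated if start > 1 and/or end < length
    truncated = start > 1 | end < length
  )
```

Only contig_1 ends up flagged as truncated, so only it would be a candidate for break symbols when a geom later draws the sequences.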

geom_break() adds breaks at the ends of truncated sequences. On a plot without any truncated sequences, it would plot nothing. It could, however, be shortened to geom_seq(add_breaks=TRUE) to automatically add breaks to every truncated sequence that is drawn. The drawback of that approach is that it would not be possible to further customize the breaks (change the icon, size, color, ...). But it would be faster. So it might make sense to have both options: geom_seq(add_breaks=TRUE) for default breaks and geom_break() for customized breaks.

gggenomes also uses a custom theme (theme_gggenomes). no_x_axis() is just a wrapper around functions to manipulate the theme. It would also be possible to create a theme_gggenomes_no_x_axis(). Alternatively, one can also just use theme_void() to remove everything.

no_x_axis <- function (){
  theme(axis.line.x = element_blank(), axis.title.x = element_blank(), axis.text.x = element_blank(), axis.ticks.x = element_blank())
}

Suppressing the axis could be made part of geom_scale_bar(remove_x_axis=TRUE) or so, to automatically suppress the axis if the scale bar is used. However, I feel like removing the axis explicitly makes it more transparent.

iferres commented 2 years ago

Ah, I see. Now it makes sense to me to have geom_break(). Thanks again for taking the time to explain it.

Regarding the scale bar, my two cents:

... + 
   theme_gggenomes_scalebar()

?

thackl commented 2 years ago

Thank you for taking the time to give feedback! Really appreciated. The theme option sounds good! I'll try to add this to the next release.

iferres commented 2 years ago

I assume the following feature request is not trivial at all, but have you considered ... + coord_polar() to draw circularized contigs? Playing with the package (and diving into the source code) I found that the following kinda works:

library(gggenomes) 

s0 <- tibble(
  gene_id = letters[1:6],
  bin_id = c("A", "A", "B", "B", "B", "B"),
  seq_id = factor(c("A1", "A1", "B1", "B1", "B2", "B2"), levels = c("A1", "B2", "B1")), # set factor to order contigs
  feat_id = c("a1","a2","b3", "b4", "b1", "b2"),
  start = c(1, 20, 1, 50, 1, 20),
  end = c(10, 30, 40, 70, 10, 30),
  strand = c(1, 1, 1, 1, 1, 1),
  length = c(1000, 1000, 1000, 1000, 1000, 1000)
)

gggenomes(s0) + 
  geom_seq() + 
  gggenomes:::geom_gene2() + 
  coord_polar() # + 
  # facet_wrap(~bin_id)

but I guess it is experimental and there's a lot of work left to make it stable and user-friendly, isn't there?

thackl commented 2 years ago

I've opened this as a separate issue so I can keep track of it more easily.