thackl / gggenomes

A grammar of graphics for comparative genomics
https://thackl.github.io/gggenomes/

Successive ribbon and track colors #73

Closed ptranvan closed 11 months ago

ptranvan commented 3 years ago

Hi, thanks for your amazing package. I am trying to depict an inversion scenario between 2 genomes. Here is my code:

library(gggenomes)

# a minimal seq track
s0 <- tibble(
  seq_id = c("M", "S1", "P"),
  length = c(14011714, 16938810, 16938810)
)

# a minimal gene track
g0 <- tibble(
  seq_id = c("M", "M", "M", "M", "M", "M",
             "S1", "S1", "S1", "S1", "S1", "S1",
             "P", "P", "P", "P", "P", "P"),
  start = c(1, 1957564, 7345956, 7944221, 10852458, 12757211,
            1, 1957564, 10253430, 10851695, 10852458, 12757211,
            1, 7612881, 12493219, 15053107, 15072904, 15671252),
  end = c(1830774, 6805033, 7898617, 10851694, 12679667, 14011714,
          1830774, 6805033, 7345956, 10299033, 12679667, 14011714,
          1832933, 1879467, 7830998, 12534142, 15670067, 16938810)
)

# a simple link track
l0 <- tibble(
  seq_id = c("M", "M", "M", "M", "M"),
  start = c(1, 1957564, 7345956, 10852458, 12757211),
  end = c(1830774, 6805033, 10851694, 12679667, 14011714),
  seq_id2 = c("S1", "S1", "S1", "S1", "S1"),
  start2 = c(1, 1957564, 10851695, 10852458, 12757211),
  end2 = c(1830774, 6805033, 7345956, 12679667, 14011714),  
)

p <- gggenomes(genes=g0, seqs=s0, links=l0)
p + 
  geom_seq() +         # draw contig/chromosome lines
  geom_seq_label() +   # label each sequence 
  geom_gene() +        # draw genes as arrow
  geom_link()          # draw some connections between syntenic regions

Sorry for the trivial questions, but I am trying to understand how to:

1) Add a distinct color for each track and the corresponding ribbon.

2) Move the "M" label, which overlaps with a ribbon, to the side instead of under the first track.

3) Put ribbons between S1 and P.

4) Add a legend for each track (and the corresponding colors).

Thanks for your help!

ptranvan commented 3 years ago

inversion_example

thackl commented 3 years ago

The "M" is overlapping with a ribbon, how can I put it on the side instead of under the 1st track?

Either use geom_bin_label() instead of geom_seq_label():

p + 
  geom_seq() +         # draw contig/chromosome lines
  geom_bin_label() +   # label each sequence 
  geom_gene() +        # draw genes as arrow
  geom_link()          # draw some connections between syntenic regions

or keep geom_seq_label() and use geom_link(offset=c(0.3,0.15)) to make space:

p + 
  geom_seq() +         # draw contig/chromosome lines
  geom_seq_label() +   # label each sequence 
  geom_gene() +        # draw genes as arrow
  geom_link(offset=c(0.3,0.15))          # draw some connections between syntenic regions

How can I put ribbons between S1 and P?

Just add rows with start/end for P and S1 as seq_id and seq_id2 (I did not adjust the numbers, so the coordinates below are simply copied from the M/S1 links):

l1 <- bind_rows(l0, tibble(
  seq_id = c("P", "P", "P", "P", "P"),
  start = c(1, 1957564, 7345956, 10852458, 12757211),
  end = c(1830774, 6805033, 10851694, 12679667, 14011714),
  seq_id2 = c("S1", "S1", "S1", "S1", "S1"),
  start2 = c(1, 1957564, 10851695, 10852458, 12757211),
  end2 = c(1830774, 6805033, 7345956, 12679667, 14011714),  
))

gggenomes(genes=g0, seqs=s0, links=l1) + 
  geom_seq() +         # draw contig/chromosome lines
  geom_bin_label() +   # label each sequence 
  geom_gene() +        # draw genes as arrow
  geom_link()          # draw some connections between syntenic regions

Or, if you just want to connect genes, you could put them into clusters:

g1 <- g0 %>% group_by(seq_id) %>%
  mutate(feat_id = paste(seq_id, row_number(), sep = "_")) %>%
  ungroup()
c0 <- g1 %>% transmute(cluster_id = str_replace(feat_id, ".*_", "cls"), feat_id)

p2 <- gggenomes(genes=g1, seqs=s0) %>% add_clusters(c0) 
p2 + 
  geom_seq() +         # draw contig/chromosome lines
  geom_bin_label() +   # label each sequence 
  geom_gene() +        # draw genes as arrow
  geom_link()          # draw some connections between syntenic regions 
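
To see what the cluster table looks like before it is handed to add_clusters(), the same derivation can be run on a small stand-in gene table. This is only a sketch: genes_demo is a hypothetical table made up for illustration, and it assumes dplyr, stringr and tibble are installed.

```r
library(dplyr)
library(stringr)
library(tibble)

# hypothetical stand-in for g0: two genes on each of three genomes
genes_demo <- tibble(
  seq_id = c("M", "M", "S1", "S1", "P", "P"),
  start  = c(1, 200, 1, 200, 1, 200),
  end    = c(100, 300, 100, 300, 100, 300)
)

# number the genes within each genome: M_1, M_2, S1_1, ...
genes_demo <- genes_demo %>%
  group_by(seq_id) %>%
  mutate(feat_id = paste(seq_id, row_number(), sep = "_")) %>%
  ungroup()

# strip everything up to the last "_" so that the n-th gene of every
# genome ends up in the same cluster (cls1, cls2, ...)
clusters_demo <- genes_demo %>%
  transmute(cluster_id = str_replace(feat_id, ".*_", "cls"), feat_id)

print(clusters_demo)
```

Note that this pairs genes purely by their row position within each genome, so the resulting links are only meaningful when the n-th gene of each genome is the corresponding one; otherwise an explicit cluster table is the safer route.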

Add a distinct color for each track and the corresponding ribbon. Put a legend for each track (and the corresponding colors)

Not 100% sure what you mean. In gggenomes, a "track" refers to a type of data: all "genes" form one track, and all "links" form another. Coloring per track could be done as below.

p2 + 
  geom_seq() +         # draw contig/chromosome lines
  geom_bin_label() +   # label each sequence 
  geom_gene(aes(fill="gene")) + # draw genes as arrow
  geom_link(aes(fill="link")) # draw some connections between syntenic regions

But I have the feeling you want something else. Maybe color by genome? If not, could you elaborate?

p2 + 
  geom_seq() +         # draw contig/chromosome lines
  geom_bin_label() +   # label each sequence 
  geom_gene(aes(fill=seq_id)) + # draw genes as arrow
  geom_link() # draw some connections between syntenic regions

ptranvan commented 3 years ago

Thanks @thackl it's very helpful.

1) By color, I mean one color for each "gene" and its ribbon, so in my example there would be 6 different colors with an associated legend. I already have pre-defined colors, so it would be great if HTML color codes could be set.

2) I was also wondering whether the genome-size axis at the bottom could be finer: instead of 0M 5M 10M 15M, could I have a tick every 2M?

thackl commented 3 years ago

# the easiest is probably to put genes in clusters
c1 <- tribble(
  ~cluster_id, ~feat_id,
  "cls1", "M_1",
  "cls1", "S1_1",
  "cls1", "P_1", 
  "cls2", "M_2",
  "cls2", "S1_4",
  "cls2", "P_6"
  # and so on
)

# keep p3 as the bare plot so the layers can be re-added with other aesthetics below
p3 <- gggenomes(genes=g1, seqs=s0) %>%
  add_clusters(c1)
p3 +
  geom_seq() +         # draw contig/chromosome lines
  geom_bin_label() +   # label each sequence 
  geom_gene(aes(fill=cluster_id)) + # draw genes as arrow
  geom_link() # draw some connections between syntenic regions

# cluster_id is also appended to links
# use scale_*_manual for custom coloring
p3 + 
  geom_seq() +         # draw contig/chromosome lines
  geom_bin_label() +   # label each sequence 
  geom_gene(aes(fill=cluster_id)) + # draw genes as arrow
  geom_link(aes(fill=cluster_id, color=cluster_id)) + # draw some connections between syntenic regions
  scale_fill_manual(values=c(cls1="#aaaaff", cls2="#ffaaaa"), na.value="grey70") +
  scale_color_manual(values=c(cls1="#aaaaff", cls2="#ffaaaa"), na.value="grey70")
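
For the six-color case asked about above, the values can be collected in one named vector with a hex code per cluster. A minimal sketch, assuming the clusters are named cls1 to cls6; apart from the first two, the hex codes are arbitrary placeholders and not colors from this thread.

```r
# placeholder hex codes; swap in the pre-defined colors
cluster_ids <- paste0("cls", 1:6)
cluster_cols <- setNames(
  c("#aaaaff", "#ffaaaa", "#aaffaa", "#ffddaa", "#ddaaff", "#aadddd"),
  cluster_ids
)

# sanity checks: unique cluster names, well-formed 6-digit hex codes
stopifnot(!anyDuplicated(names(cluster_cols)))
stopifnot(all(grepl("^#[0-9a-fA-F]{6}$", cluster_cols)))

print(cluster_cols)
```

The same vector can then be passed as values= to both scale_fill_manual() and scale_color_manual(), so that genes and ribbons use identical colors.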

# the axis labels can be controlled via scale_x_bp
# Have a look at scale_x_continuous() for more options.
p3 + 
  geom_seq() +         # draw contig/chromosome lines
  geom_bin_label() +   # label each sequence 
  geom_gene(aes(fill=cluster_id)) + # draw genes as arrow
  geom_link(aes(fill=cluster_id, color=cluster_id)) + # draw some connections between syntenic regions
  scale_fill_manual(values=c(cls1="#aaaaff", cls2="#ffaaaa"), na.value="grey70") +
  scale_color_manual(values=c(cls1="#aaaaff", cls2="#ffaaaa"), na.value="grey70") +
  scale_x_bp(breaks=seq(0, 18, 2)*1e6)
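
The break positions do not have to be hard-coded; they can be derived from the sequence lengths. A small sketch in base R, using the lengths from the s0 table in the question (the variable names are only illustrative):

```r
# sequence lengths from the s0 table in the question
seq_lengths <- c(M = 14011714, S1 = 16938810, P = 16938810)

step <- 2e6  # one tick every 2 Mb
# round the longest sequence up to the next multiple of step
upper <- ceiling(max(seq_lengths) / step) * step
breaks <- seq(0, upper, by = step)

print(breaks / 1e6)  # 0 2 4 6 8 10 12 14 16 18
```

The resulting vector gives the same positions as seq(0, 18, 2)*1e6 and can be passed as scale_x_bp(breaks = breaks).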
