thackl / gggenomes

A grammar of graphics for comparative genomics
https://thackl.github.io/gggenomes/

Different genomes scales #119

Closed (tgodoy closed this issue 1 year ago)

tgodoy commented 2 years ago

Hi, I'm not reporting a bug, just asking a question. I have two genomes, one of which is very large (Ambystoma mexicanum). Is it possible to use a different scale for each genome when plotting synteny?

gggenomes(genes=gAll, seqs=sAll, links=tblastx_tibble) + geom_seq() + geom_seq_label() + geom_gene() + geom_link()

[Image: screenshot, 2022-03-29 11:30:43]

thackl commented 2 years ago

You mean zoom in on the second genome so it becomes larger on the plot? Hm, I can't think of anything that would be very easy.

Off the top of my head, I can think of two possible options:

  1. Break up the long contig, i.e. show only the regions that actually align, plus some flanking space. focus() on links can do that automatically (let me know if you want to try something like that but don't know how).
  2. Manually scale the coordinates of one of the genomes, e.g. by multiplying every length, start and end value for Xtrop_Chr1 in gAll, sAll and tblastx_tibble by 10 or so.
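
A minimal sketch of option 2, scaling one sequence consistently across all three tracks. The toy tables, the names `target` and `scale_factor`, and the link column layout (`seq_id`/`start`/`end` plus `seq_id2`/`start2`/`end2`) are assumptions for illustration; adapt them to the real gAll, sAll and tblastx_tibble.

```r
library(dplyr)
library(tibble)

# Hypothetical toy tracks standing in for sAll, gAll and tblastx_tibble
seqs <- tribble(
  ~seq_id, ~length,
  "big",   100000,
  "small", 2000
)
genes <- tribble(
  ~seq_id, ~start, ~end,
  "big",   40000,  45000,
  "small", 500,    900
)
links <- tribble(
  ~seq_id, ~start, ~end,  ~seq_id2, ~start2, ~end2,
  "big",   40000,  45000, "small",  500,     900
)

target <- "small"    # sequence to stretch (assumed name)
scale_factor <- 10   # stretch factor (assumed value)

# Sequence track: only the length column needs scaling
seqs_scaled <- seqs %>%
  mutate(length = if_else(seq_id == target, length * scale_factor, length))

# Gene track: scale start and end of genes on the target sequence
genes_scaled <- genes %>%
  mutate(across(c(start, end),
                ~ if_else(seq_id == target, .x * scale_factor, .x)))

# Link track: either side of a link may sit on the target sequence
links_scaled <- links %>%
  mutate(
    across(c(start, end),
           ~ if_else(seq_id == target, .x * scale_factor, .x)),
    across(c(start2, end2),
           ~ if_else(seq_id2 == target, .x * scale_factor, .x))
  )
```

The scaled tables can then be passed to gggenomes(genes=..., seqs=..., links=...) as in the original call. Note that positions on the scaled sequence no longer correspond to real base-pair coordinates, so any axis or coordinate labels for it will be off by the scale factor.
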
tgodoy commented 2 years ago

Thanks, I think I will try the second solution. But in case I need to zoom in on some regions, how can I do it?

thackl commented 2 years ago

Here's a minimal example that you might be able to adapt for your case.

library(gggenomes)
library(tidyverse)  # tribble(), mutate(), across(), str_detect(), str_glue(), %>%

g0 <- tribble(
  ~seq_id, ~start, ~end,
  "A", 100, 200,
  "A", 10000, 10100,
  "B", 100, 200,
  "B", 300, 500
)

# clusters are just a faster way for manually creating links
c0 <- tribble(
  ~cluster_id, ~feat_id,
  "cls1", "f1",
  "cls1", "f3",
  "cls2", "f2",
  "cls2", "f4")

p1 <- gggenomes(g0) %>%
  add_clusters(links=c0) +
  geom_seq() +
  geom_gene() +
  geom_link() +
  geom_gene_tag(aes(label=feat_id)) 
p1
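
For comparison, here is a sketch of what an explicit links table for the same toy data might look like if built by hand instead of going through clusters. It assumes the genes get the ids f1 to f4 in row order (which is what the cluster table above relies on) and that links use the columns `seq_id`/`start`/`end` and `seq_id2`/`start2`/`end2`. The names `members`, `a_side`, `b_side` and `links0` are made up for this example, and this is plain dplyr wrangling, not how add_clusters() works internally.

```r
library(dplyr)
library(tibble)

# Same genes as above, with feat_id spelled out explicitly (assumed f1..f4 in row order)
g0 <- tribble(
  ~seq_id, ~start, ~end,
  "A", 100, 200,
  "A", 10000, 10100,
  "B", 100, 200,
  "B", 300, 500
) %>%
  mutate(feat_id = paste0("f", row_number()))

c0 <- tribble(
  ~cluster_id, ~feat_id,
  "cls1", "f1",
  "cls1", "f3",
  "cls2", "f2",
  "cls2", "f4"
)

# Attach coordinates to every cluster member
members <- inner_join(c0, g0, by = "feat_id")

# One member per genome and cluster in this toy case, so a simple join pairs them up
a_side <- members %>%
  filter(seq_id == "A") %>%
  select(cluster_id, seq_id, start, end)
b_side <- members %>%
  filter(seq_id == "B") %>%
  select(cluster_id, seq_id2 = seq_id, start2 = start, end2 = end)

links0 <- inner_join(a_side, b_side, by = "cluster_id") %>%
  select(-cluster_id)
links0
```

A table like links0 could then be supplied directly, e.g. gggenomes(genes = g0, links = links0), in the same way the original question passes links=tblastx_tibble.
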

# zoom in on loci with links, group links less than 200 apart, show 200 flank
# you may have to play with those values
p1 %>% 
  focus(.track_id=links, .max_dist = 200, .expand=200) + 
  geom_seq_label()

# and add nice locus labels that indicate start/end of regions
p1 %>% 
  focus(.track_id=links, .max_dist = 200, .expand=200,
    .locus_id=str_glue("{seq_id}:{start}-{end}")) + 
  geom_seq_label()
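
To get a feel for what the two numbers mean, here is a rough sketch of the idea behind "group regions less than 200 apart, then add 200 flank", written in plain dplyr on the toy coordinates. It is only an illustration of the concept, not the implementation of focus(); the names `regions`, `max_dist`, `expand` and `loci` are made up here, and clipping at position 1 is an assumption.

```r
library(dplyr)
library(tibble)

# Linked regions from the toy example
regions <- tribble(
  ~seq_id, ~start, ~end,
  "A", 100, 200,
  "A", 10000, 10100,
  "B", 100, 200,
  "B", 300, 500
)

max_dist <- 200  # regions closer than this end up in the same locus
expand   <- 200  # flank added on both sides of each locus

loci <- regions %>%
  arrange(seq_id, start) %>%
  group_by(seq_id) %>%
  mutate(
    # distance to the furthest end seen so far on this sequence
    gap = start - lag(cummax(end), default = first(start)),
    # start a new locus whenever the gap exceeds max_dist
    locus = cumsum(gap > max_dist)
  ) %>%
  group_by(seq_id, locus) %>%
  summarise(
    start = max(1, min(start) - expand),  # assumed: do not go below position 1
    end   = max(end) + expand,
    .groups = "drop"
  )
loci
```

With these values the two regions on A stay separate (they are about 9800 apart), while the two regions on B are merged into one locus (they are only 100 apart). Increasing max_dist merges more regions; increasing expand shows more context around each locus.
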

# scaling up B 25 times - not sure how much sense this representation makes, though...
g_scaled <- g0 %>%
  mutate(across(c(start, end), ~ifelse(str_detect(seq_id, "B"), .x * 25, .x)))

gggenomes(g_scaled) %>% add_clusters(c0) + geom_seq() + geom_gene() + geom_link()

tgodoy commented 2 years ago

Thanks a lot!!!

waltercostamb commented 7 months ago

This was useful to me! Thank you