thackl / gggenomes

A grammar of graphics for comparative genomics
https://thackl.github.io/gggenomes/

Seq order (`as.factor`) gets lost when `focus()`ing #83

Closed iferres closed 2 years ago

iferres commented 2 years ago

Hi, I noticed that the order of sequences (i.e. contigs) gets lost when applying focus(). For instance:

library(gggenomes)
library(magrittr)
library(tibble) # provides tibble(), used below

s0 <- tibble(
  bin_id = c("A", "A", "B", "B", "B", "B"),
  seq_id = factor(c("A1", "A1", "B1", "B1", "B2", "B2"), levels = c("A1", "B2", "B1")), # set factor to order contigs
  feat_id = c("a1","a2","b3", "b4", "b1", "b2"),
  start = c(1, 20, 1, 50, 1, 20),
  end = c(10, 30, 40, 70, 10, 30),
  strand = c(1, 1, 1, 1, 1, 1),
  length = c(1000, 1000, 1000, 1000, 1000, 1000)
)

# works ok
p <- gggenomes(s0) + 
  geom_seq() + 
  geom_gene() + 
  geom_gene_label(aes(label=feat_id)) + 
  geom_seq_label()

# order of contigs (factor levels) gets lost
p %>% focus()

Any way of keeping that order? Thanks!

thackl commented 2 years ago

Ha, I'm somewhat surprised the first one works ;). Internally, all IDs are converted to characters because otherwise, I get issues when joining across tables (seqs, genes, ...).
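To illustrate why a character conversion can drop a user-defined order, here is a minimal base-R sketch. It is a stand-alone illustration only, not the gggenomes internals, and the variable names are made up for the example:

```r
# Stand-alone illustration; this is NOT how gggenomes is implemented.
# A factor whose level order differs from alphabetical order.
seq_id <- factor(c("A1", "B1", "B2"), levels = c("A1", "B2", "B1"))

# Sorting the factor follows the level order chosen by the user.
factor_order <- as.character(sort(seq_id))

# Once converted to character, the levels are gone, so sorting
# falls back to plain string order.
char_order <- sort(as.character(seq_id))

print(factor_order)
print(char_order)
```

The first vector keeps B2 before B1, the second does not, which is the kind of information that disappears when IDs are handled as plain strings.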

That said, I agree, it would be more intuitive if the order of loci would follow the order of the input sequences. I will modify the code to get that behavior as soon as I have time.

In the meantime, you could reorder your sequences after focus():

p %>% focus() %>% pick_seqs_within(B2_lc1, B1_lc1)

That should do the trick.
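If there are many loci, the order for this workaround could be computed rather than typed by hand. A rough base-R sketch, assuming the locus IDs have the form `<seq_id>_lc<n>` as in the call above; the `locus_ids` vector is hypothetical example data, and how best to hand the result to pick_seqs_within() is left open here:

```r
# Desired contig order, e.g. taken from levels(s0$seq_id).
desired <- c("A1", "B2", "B1")

# Hypothetical locus IDs as they might appear after focus().
locus_ids <- c("A1_lc1", "B1_lc1", "B1_lc2", "B2_lc1")

# Strip the "_lc<n>" suffix to recover the parent sequence of each locus.
parent <- sub("_lc[0-9]+$", "", locus_ids)

# Order loci by the position of their parent in `desired`;
# ties (several loci on one sequence) keep their original order.
ordered_ids <- locus_ids[order(match(parent, desired))]

print(ordered_ids)
```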

iferres commented 2 years ago

Thanks! Another question: what do "_lc1", "_lc2", etc. mean, and how are they set? I'd like to automate the pick_seqs_within step, but I'm not sure how the 1 or 2 suffixes are chosen.

EDIT: In this example there is no _lc2, but there are some in my real data.

thackl commented 2 years ago

Ah, I see, makes sense. Yes, might not be trivial to automate pick_seqs_within...

"B2_lc1" is short for "locus #1 on sequence B2". Each locus needs a unique ID, so I just count them per sequence. You can change that, the default pattern is focus(..., .locus_id = str_glue("{seq_id}_lc{row_number()}". You can also play with focus(..., .locus_id_group=1) to count loci not per sequence.

But anyway, I have a few minutes. Let me see if I can push a fix for the order right away!

p %>% focus(.locus_id = str_glue("{seq_id}#{row_number()}"))
p %>% focus(.locus_id_group=1, .locus_id = str_glue("locus{row_number()}"))
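To make the two numbering schemes concrete, the sketch below mimics them with plain dplyr and stringr on a toy table. It does not call focus(); the `loci` table is hypothetical, and the grouped versus ungrouped `mutate()` is only meant to resemble per-sequence versus global counting:

```r
library(dplyr)
library(stringr)

# Toy table: one row per locus, two loci on B1 (hypothetical data).
loci <- tibble(seq_id = c("A1", "B1", "B1", "B2"))

# Count loci per sequence, as in the default "{seq_id}_lc{row_number()}".
per_seq <- loci %>%
  group_by(seq_id) %>%
  mutate(locus_id = as.character(str_glue("{seq_id}_lc{row_number()}"))) %>%
  ungroup()

# Count loci across the whole table, ignoring which sequence they are on.
global <- loci %>%
  mutate(locus_id = as.character(str_glue("locus{row_number()}")))

print(per_seq$locus_id)
print(global$locus_id)
```

With per-sequence counting, the second locus on B1 is the one that gets the `_lc2` suffix; with global counting, the numbers simply run through all rows.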

thackl commented 2 years ago

With the latest version:

p %>% focus()

iferres commented 2 years ago

Genius! Worked like a charm. Thanks again. Best!