Closed trnpl closed 9 months ago
Interesting case. Do you have some (dummy) data for a reproducible example that I can play with?
Here is a file with info on the sequences and genes. Don't think links are really necessary here for this particular case.
How about something like this.
library(gggenomes)
s0 <- read_csv("seqs.csv")
g0 <- read_csv("genes.csv")
g0
# full loci
p0 <- gggenomes(g0, s0) +
geom_seq() +
geom_gene(aes(fill=str_sub(feature_id, 1,3)))
# magic zoom in on "default" which mean all genes
p1 <- p0 |> focus(.expand = 1e3)
p1
# decorate sequence breaks
geom_break <- function(mapping_start = NULL, mapping_end = NULL,
data_start = seqs(start > 1), data_end=seqs(end < length),
label="//", size=4, hjust=.75, family="bold", stat="identity",
na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, ...){
aes_start <- aes(x=x, y=y)
aes_start <- gggenomes:::aes_intersect(mapping_start, aes_start)
aes_end <- aes(x=xend, y=y)
aes_end <- gggenomes:::aes_intersect(mapping_end, aes_end)
list(
geom_text(aes_start, data=data_start, label=label, size=size, hjust=hjust,
family=family, ...),
geom_text(aes_end, data=data_end, label=label, size=size, hjust=1-hjust,
family=family, ...)
)
}
p2 <- p1 + geom_break(label="/")
p2
library(patchwork)
p0 + p1 + p2 + plot_layout(guides="collect") & theme(legend.position = "bottom")
ggsave("break-at-long-non-coding.png", width=10, height=5)
oh, and in case you want to flip things :)
p3 <- p0 |>
focus(.expand=1e3) |>
flip(where(~any(str_sub(.x$feature_id,1,3) == "hea" & .x$strand == "-")), .bin_track = genes)
p3
Thanks! This I think is workable for my purposes :)
Wait -- one more question actually. Is there a way to print the amount of nucleotides excluded in the breaks?
Two options:
1) locate()
does the same as focus, but instead of cutting seqs in the end, it adds a track with the coordinates of the things that focus would cut out. From that table, you can get exact coordinates an compute what was excluded (see code below)
p4 <- p0 |>
locate(.expand = 1e3) +
geom_feat(position = position_nudge(y=-.3))
p4$data$feats$loci
# A tibble: 30 × 15
y x xend bin_id seq_id start end i locus_score locus_id locus_length
<int> <dbl> <dbl> <chr> <chr> <dbl> <dbl> <int> <int> <glue> <dbl>
1 13 0 21286 Dano_sm… Dano_… 1 21286 1 7 Dano_sm… 22101
2 13 37943 53399 Dano_sm… Dano_… 37944 53399 2 3 Dano_sm… 15456
3 14 0 20276 Dpal_sm… Dpal_… 1 20276 1 7 Dpal_sm… 21108
4 14 35778 51267 Dpal_sm… Dpal_… 35779 51267 2 3 Dpal_sm… 15489
5 12 323 17521 Dpan_sm… Dpan_… 324 17521 1 3 Dpan_sm… 17198
6 12 33051 54735 Dpan_sm… Dpan_… 33052 54735 2 6 Dpan_sm… 22462
7 15 0 24170 dana_sm… dana_… 1 24170 1 8 dana_sm… 24985
8 15 38128 53604 dana_sm… dana_… 38129 53604 2 3 dana_sm… 15476
9 11 0 16792 datr_sm… datr_… 1 16792 1 5 datr_sm… 17663
10 11 29803 45264 datr_sm… datr_… 29804 45264 2 3 datr_sm… 15461
2) You can manually set the coordinates to focus()
on via focus(.loci=<tibble(seq_id, start, end)>)
. With that you would know the sizes of gaps ahead of time.
# first 0-23k and 35-55k for first two seqs
loci <- tribble(
~seq_id, ~start, ~end,
"dana_smallsynteny", 1, 23e3,
"dana_smallsynteny", 35e3,55e3,
"Dano_smallsynteny", 1, 23e3,
"Dano_smallsynteny", 35e3,55e3
)
p5 <- p0 |> focus(.loci=loci) + geom_seq_label()
p5
Extremely helpful, thank you so much!
Hi! First, thanks for this excellent package -- it really generates beautiful plots and I appreciate that it works with eukaryotic datasets as well.
I am using this to generate some microsynteny plots for a locus I am working on. One of the odd features of this locus is that there is about ~10-15kb of non-coding region upstream of my GOI, which I would like to 'trim' so that I can make the useful information bigger. I am imagining a sequence break of at least 5kb or so in the middle of the 'contigs' that I show (see boxed parts of the screenshot below). I browsed the issues page and it seems like you were working on adding some tools to do this but I'm not sure if this particular use case has been implemented. I can likely hack something once I export the image as an svg but looking to hardcode as much of this as possible.
Thanks for your help!!