Open tahsinkhann opened 1 month ago
The easiest way is probably to use https://yulab-smu.top/pkgdocs/aplot.html. Check out section 4.2.2 Aligning plots with a tree. Does that help?
Thanks for the response. The link was useful, learned new concepts. However, is it possible to generate something like the draft figure I attached.
in the figure a) tree b) gene_presence_absence.csv from roary/panaroo c) all CDS in a pangenome generated by gggenomes (???)
basically I am thinking of an alternative of the attached circular pangenome figure (I don't know how the authors made this figure, nothing was mentioned about the methods in theri paper)
Thanks
OK, here's some code that I think goes into the direction that you are looking for, and obviously with room to make it prettier. That said, I think there some conceptual limitations here. All you are showing is relative to the picked reference genome, i.e. it does not show any genes that are not present in the reference and it does not account for rearrangements...
library(tidyverse)
library(ggtree)
library(gggenomes)
library(aplot)
# get a random tree
library(ape)
set.seed(2017)
tree <- rtree(4, tip.label = LETTERS[1:4])
p_tree <- ggtree(tree) + geom_tiplab()
p_tree
# get some random gene coords
genes <- filter(emale_genes, seq_id == "Cflag_017B") |>
transmute(seq_id = "A", start, end, feat_id, note=Note)
# reference genes - NOTE: any gene that is not present in the reference will not be shown in the plot!
p_genes <- gggenomes(genes, infer_start = 0) +
geom_gene(aes(fill=note)) + geom_bin_label()
p_genes
# get random presence/absence for all reference genes and all genomes
presence <- map(LETTERS[1:4], function(id){
tibble(
seq_id = id,
feat_id = genes$feat_id,
presence = sample(0:1, length(genes$feat_id), replace = T, prob=c(0.2, 0.8)))
}) |> bind_rows()
ggplot(presence) + geom_raster(aes(feat_id, seq_id, fill=as.logical(presence)))
# we need to merge p/a with gene start/end and establish the same sorting order as in tree
tree_label_order <- p_tree$data |> filter(isTip) |> arrange(, y) |> pull(label)
genes_presence <- presence |>
left_join(select(genes, -seq_id)) |> # ignore reference seq_id ("A") to create records for "A-D"
mutate(
width = width(start, end),
seq_id = factor(seq_id, levels=tree_label_order)) # reorder based on tree
# use geom_tile over geom_raster as it allows boxes of different sizes
p_presence <- ggplot(genes_presence) +
geom_tile(aes(width = width, height = .9, x = (start+end)/2, y = seq_id, fill=as.logical(presence)))
p_presence
# combine plots
p_presence |> insert_left(p_tree, width=.2) |> insert_top(p_genes, height=.2)
Hi, I want to a create pangenome plot for bacteriophages with phylogenetic tree (left side) + gene presence/absence (right side) + gggenome plot that has all ORFs/CDS (in top of presence/absence plot)
I have a tree in nwk format, gene_presence_absence.csv from roary/panaroo. All I need is to combine these with a gggenome like pangenome representation in top of gene_presence_absence matrix.
Lastly, thanks for this amaizing gggenomes tool, it benifited me highly.