Open tania-k opened 4 months ago
Dear Tania,
thanks for reaching out.
The problem with your first error (genes + seqs) is that gggenomes reads only one thing from the FASTA file with the sequence: its length, in your case 13000 bp. The coordinates of your genes, however, are around 6107000. So when gggenomes tries to plot them, it cannot find any genes that fit on a 13000 bp contig. If you want to zoom in on part of a sequence, you can either omit the sequence entirely, in which case gggenomes will just zoom in on the range of the genes you provide, or explicitly set the length together with start and end for the sequence (which I did in the example below).
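The coordinate mismatch described above can be checked before plotting. Here is a minimal sketch in base R, assuming the gene coordinates sit in a data frame with `start` and `end` columns. The gene positions and the helper name `in_range` are made up for illustration and are not part of gggenomes:

```r
# Hypothetical gene table in original chromosome coordinates
genes <- data.frame(
  seq_id = "JAEVHH010000002.1",
  start  = c(6094500, 6099000),
  end    = c(6096000, 6101500)
)

# Hypothetical helper: TRUE for features lying entirely within [lo, hi]
in_range <- function(feats, lo, hi) {
  feats$start >= lo & feats$end <= hi
}

# Against the length read from the extracted FASTA (13000 bp): nothing fits
fits_fasta <- in_range(genes, 1, 13000)

# Against an explicitly specified range on the chromosome: everything fits
fits_explicit <- in_range(genes, 6094000, 6107000)

print(fits_fasta)
print(fits_explicit)
```

If the first check returns only `FALSE`, the sequence length and the feature coordinates are in different coordinate systems, which is the situation in this issue.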
In the second case, you are reading your TE gff as `seqs=` (the second argument of gggenomes). What should work is `z <- gggenomes(gff, feats=repeat_edta) + ...`.
This works for me (for the parts of the gffs that you provided):
library(gggenomes)
library(tibble)

# explicitly specify range of chromosome to plot
s0 <- tibble(
  seq_id = "JAEVHH010000002.1",
  length = 6107000,
  start = 6094000,
  end = 6107000
)
# read genes
g0 <- read_feats("genes.gff")
# read TE
t0 <- read_feats("te.gff")
# positional arguments are genes, seqs, feats
gggenomes(g0, s0, t0) +
  geom_gene() +
  geom_seq() +
  geom_feat(aes(color=type), data=feats(genes)) +
  geom_bin_label() +
  geom_seq_label() +
  geom_gene() +
  geom_feat()
Hope that helps!
Hi Tania,

I'd be happy to take another look tomorrow. Would you mind sharing your entire gene and TE gff files? That would make it easier to see what the issue is. I will of course treat the data confidentially.

Best,
Thomas

On 3 Jul 2024 21:06, Tania Kurbessoian wrote:

> Hi thackl,
>
> Thank you for taking the time to explain that for me. I did get the visuals to work after some time... kinda. I realized my two tracks are on opposite strands: the genes in my gene GFF file are on the negative strand, while the TEs in my TE GFF file are on the positive strand. I am not sure if that is why my TEs are not appearing properly? I'm not receiving any errors. I've also set the position option to identity and to pile (the default), but the TEs just stack in one corner on the right.
>
> p <- gggenomes(gff, s0, repeat_edta) +
>   geom_gene(aes(fill = name), data=feats(genes)) +
>   geom_feat(linewidth = 5, position = "jitter", aes(color= type), data=feats(feats)) +
>   geom_seq() +
>   geom_bin_label() +
>   guides(color = guide_legend(title = "Repetitive elements"), fill = guide_legend(title = "GeneID")) +
>   scale_color_manual(values = c("#CC79A7", "#D55E00"))
> p
>
> [Screenshot.2024-07-03.at.12.03.53.PM.png]
>
> Any suggestions?
Hi Thomas, I figured it out. It required a bit of adjusting, but here is my code in its entirety (in case other folks make the same misstep I did):
s0 <- tibble(
  seq_id = "H_ohiense_G217B_JAEVHH010000002.1",
  length = 16000,
  start = 6094000,
  end = 6110000
)
p <- gggenomes(gff, s0, repeat_edta) +
  geom_gene(aes(fill = name), data=feats(genes)) +
  geom_feat(linewidth = 5, position = "identity", aes(color= type), data=feats(feats)) +
  scale_color_manual(values = c("#D55E00", "#CC79A7", "#56B4E9")) +
  geom_seq() +
  geom_bin_label() +
  guides(color = guide_legend(title = "Repetitive elements"), fill = guide_legend(title = "GeneID"))
p
I realized I needed to broaden the plotted range in my s0 variable to capture the TEs, as they lie past the gene coordinates I had originally set there. Attached is my result.
Thanks again for all the help!! Tania
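For anyone hitting the same issue, the range needed to cover both tracks can be derived from the data instead of being set by hand. This is a rough sketch in base R, assuming both tracks are data frames with `start` and `end` columns. The coordinates, the `pad` value and the helper name `span_range` are hypothetical and not part of gggenomes:

```r
# Hypothetical gene and TE tables on the same sequence
genes <- data.frame(
  start = c(6094500, 6099000),
  end   = c(6096000, 6101500)
)
tes <- data.frame(
  start = c(6105000, 6108200),
  end   = c(6106500, 6109700)
)

# Hypothetical helper: smallest range covering all tracks, plus padding
span_range <- function(..., pad = 0) {
  tracks <- list(...)
  lo <- min(vapply(tracks, function(t) min(t$start), numeric(1)))
  hi <- max(vapply(tracks, function(t) max(t$end), numeric(1)))
  c(start = max(1, lo - pad), end = hi + pad)
}

r <- span_range(genes, tes, pad = 300)
print(r)
```

The resulting `start` and `end` values could then be used in the `s0` tibble, so that features lying beyond the genes are not cut off.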
Hello gggenomes folks.
I am currently attempting to run your program on my dataset, whose files are subsections of:
Looks like:
FASTA (head)
GFF3 (head)
TE GFF3 (head)
Called as:
When running:
I receive a plot that has all my sequences on separate lines, but adding either of the other two inputs, seq or repeat_edta, throws errors.
z <- gggenomes(genes=gff, seqs=seq) +
  geom_gene() +
  geom_seq() +
  geom_feat(aes(color=type), data=feats(genes)) +
  geom_bin_label() +
  geom_seq_label()
z
Only saw `type=NA` in genes and will treat everything as `type="CDS"`.
Error in `require_vars()`:
! Required column(s) missing:
• length
Run `rlang::last_trace()` to see where the error occurred.
My first error adding all three files gives me: `> p <- gggenomes(seqs = seq, genes = gff, feats = repeat_edta) + geom_seq() +
I am unsure how to progress.
I am running gggenomes in RStudio on a Linux HPCC. I've updated ggplot2 to 3.5.0, along with other dependencies, and restarted my R session.
Thank you for your time.