thackl / gggenomes

A grammar of graphics for comparative genomics
https://thackl.github.io/gggenomes/

Gff not recognized #177

Closed m-bogaerts closed 7 months ago

m-bogaerts commented 7 months ago

Hello!

I have some issues using the read_gff3 function. I am getting a gff from exonerate and I get the following error:

Harmonizing attribute names
• ID -> feat_id
• Parent -> parent_ids
• Align -> align
• Query -> query
Error in bind_rows(): ! Can't combine ..1$feat_id and ..2$feat_id .
Run rlang::last_trace() to see where the error occurred.
Warning message: This looks like a gff2/gtf file. This is usually fine, but given the ambigious definition of this format, it is not guaranteed that gene models are always captured correctly. exons/CDS might not be recognized as belonging to the same gene, etc. Also note: types and attributes are as far as possible converted to match gff3 standards (transcript -> mRNA, 5'/3'UTR -> five/three_prime_UTR, ...)

I tried to convert my gff into gtf using AGAT but I have the same problem. The format of my gff/gtf is the following one:

gtf-version X

GFF-like GTF i.e. not checked against any GTF specification. Conversion based on GFF input, standardised by AGAT.

source-version exonerate:protein2genome:local 2.4.0

BnS exonerate:protein2genome:local gene 214068 243275 1956 + . gene_id "1"; ID "agat-gene-8"; gene_orientation "+"; identity "100.00"; sequence "URC25299.1"; similarity "100.00";
BnS AGAT mRNA 214068 243275 . + . gene_id "1"; transcript_id "agat-rna-10"; ID "agat-rna-10"; Parent "agat-gene-8";

Is there something I am doing wrong?

Thank you in advance!

thackl commented 7 months ago

Could you attach your gtf/gff as a file (paperclip icon in the toolbar above the text box)? Thanks!

m-bogaerts commented 7 months ago

Thank you very much for your answer. Please find the gtf file attached.

dtol_habn_hab4_hab2_alpha_protein.gtf.txt

thackl commented 7 months ago

Try it now, and let me know if there are still problems.

m-bogaerts commented 7 months ago

Thank you very much, it now works and recognizes all the annotation types:

Harmonizing attribute names
• ID -> feat_id
• Parent -> parent_ids
• Align -> align
• Query -> query
Features read

A tibble: 8 × 3

  source                         type           n
1 AGAT                           mRNA          27
2 exonerate:protein2genome:local CDS           27
3 exonerate:protein2genome:local exon          27
4 exonerate:protein2genome:local gene          11
5 exonerate:protein2genome:local intron        16
6 exonerate:protein2genome:local similarity    11
7 exonerate:protein2genome:local splice3       16
8 exonerate:protein2genome:local splice5       16

However, when plotting them, it only plots a few mRNA and CDS features (see the attached picture). Since I am particularly interested in plotting exons, is there something I can do for that?

![gggenomes_type](https://github.com/thackl/gggenomes/assets/25815798/97903424-075f-4fba-b669-3b9c5e064deb)

Thank you very much in advance.

thackl commented 7 months ago

Something like this?

library(gggenomes)

# show everything first
g0 <- read_feats("issue_177/dtol_habn_hab4_hab2_alpha_protein.gtf")

gggenomes(g0) +
  geom_gene() +  # geom_gene specifically parses mRNA, CDS and introns from the gene track
  geom_seq() +
  geom_feat(aes(color=type), data=feats(genes)) # geom_feat generically plots any feature from any track

image

# focus on features of interest
gggenomes(g0) |> 
  focus() + # zoom in on regions with actual features (ignore intergenic space)
  geom_gene() +
  geom_seq() +
  geom_feat(aes(color=type), 
            data=feats(genes, type %in% c("exon", "intron")), # filter for features of interest
            position = position_nudge(y=.2)) # move them up in the plot

image

m-bogaerts commented 7 months ago

That's exactly what I was looking for. Thank you very much for your answer!
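
For readers who hit a similar warning: the records quoted in the first comment use GTF-style attributes (`key "value";`) rather than the GFF3 form (`key=value;`), which is why the file is reported as gff2/gtf. A minimal base-R sketch of how such an attribute column splits into key/value pairs, in case it helps when inspecting a file by hand. `parse_gtf_attrs` is a hypothetical helper written for illustration, not part of gggenomes, and it assumes no semicolons occur inside quoted values.

```r
# Attribute column copied from the mRNA record quoted earlier in this thread
attrs <- 'gene_id "1"; transcript_id "agat-rna-10"; ID "agat-rna-10"; Parent "agat-gene-8";'

# Hypothetical helper (not a gggenomes function): split GTF-style
# `key "value";` pairs into a named character vector.
# Assumes values never contain a semicolon.
parse_gtf_attrs <- function(x) {
  pairs <- trimws(strsplit(x, ";", fixed = TRUE)[[1]])
  pairs <- pairs[nzchar(pairs)]                            # drop empty pieces
  keys  <- sub("[[:space:]].*$", "", pairs)                # text before first space
  vals  <- sub("^[^[:space:]]+[[:space:]]+", "", pairs)    # text after the key
  vals  <- gsub('"', "", vals, fixed = TRUE)               # strip the quotes
  setNames(vals, keys)
}

parsed <- parse_gtf_attrs(attrs)
print(parsed[["Parent"]])  # "agat-gene-8"
print(names(parsed))       # gene_id, transcript_id, ID, Parent
```

The `ID` and `Parent` keys are the ones that read_gff3 reports renaming to `feat_id` and `parent_ids` in the "Harmonizing attribute names" message.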