Closed josieparis closed 1 year ago
So it seems to be dying on the g1.t1 gene. What do those annotations look like?
I know parsing GFF3 output from a variety of tools is a nightmare. The code that does this is nested in funannotate and was intended to parse funannotate's own GFF3 output, so it's not really a general solution. In preparation for funannotate2 I rewrote all the GFF3 parsing and conversions into a new tool, which will be a dependency, so you can try gfftk here: https://github.com/nextgenusfs/gfftk. You should be able to install it with pip, i.e. python -m pip install gfftk
So you might try gfftk sanitize to see if that will clean up the format into something funannotate can understand.
Also, of course, if you just let funannotate run Augustus and the other ab initio gene callers in funannotate predict, you won't have any problems.
Jon, you star. Thank you so much for such a quick response. I'll try as you suggested!
In the meantime, not sure if this will help, but g1 looks like:
contig_100 AUGUSTUS gene 36348 38107 0.22 + . ID=g1
contig_100 AUGUSTUS transcript 36348 38107 0.22 + . ID=g1.t1;Parent=g1
contig_100 AUGUSTUS exon 36348 36521 . + . ID=exon-1;Parent=g1.t1;gene_id=g1;transcript_id=g1.t1
contig_100 AUGUSTUS exon 36603 37070 . + . ID=exon-2;Parent=g1.t1;gene_id=g1;transcript_id=g1.t1
contig_100 AUGUSTUS exon 37356 37577 . + . ID=exon-3;Parent=g1.t1;gene_id=g1;transcript_id=g1.t1
contig_100 AUGUSTUS exon 38081 38107 . + . ID=exon-4;Parent=g1.t1;gene_id=g1;transcript_id=g1.t1
contig_100 AUGUSTUS CDS 36348 36521 0.98 + 0 ID=cds-1;Parent=g1.t1;gene_id=g1;transcript_id=g1.t1
contig_100 AUGUSTUS CDS 36603 37070 0.99 + 0 ID=cds-2;Parent=g1.t1;gene_id=g1;transcript_id=g1.t1
contig_100 AUGUSTUS CDS 37356 37577 0.59 + 0 ID=cds-3;Parent=g1.t1;gene_id=g1;transcript_id=g1.t1
contig_100 AUGUSTUS CDS 38081 38107 0.23 + 0 ID=cds-4;Parent=g1.t1;gene_id=g1;transcript_id=g1.t1
contig_100 AUGUSTUS intron 36522 36602 1 + . ID=intron-1;Parent=g1.t1;gene_id=g1;transcript_id=g1.t1
contig_100 AUGUSTUS intron 37071 37355 1 + . ID=intron-2;Parent=g1.t1;gene_id=g1;transcript_id=g1.t1
contig_100 AUGUSTUS intron 37578 38080 0.23 + . ID=intron-3;Parent=g1.t1;gene_id=g1;transcript_id=g1.t1
contig_100 AUGUSTUS start_codon 36348 36350 . + 0 ID=start_codon-1;Parent=g1.t1;gene_id=g1;transcript_id=g1.t1
contig_100 AUGUSTUS stop_codon 38105 38107 . + 0 ID=stop_codon-1;Parent=g1.t1;gene_id=g1;transcript_id=g1.t1
Also agree to maybe pop back a step and just run augustus within funannotate. Thanks for all your hard work on such a great tool!
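For anyone who wants to inspect the ID/Parent structure of records like the g1 block above before handing them to another tool, here is a minimal sketch in Python. It is not how funannotate or gfftk parse GFF3; the function names (parse_attributes, summarize) are made up for illustration, and it assumes the columns are separated by whitespace as pasted here, whereas real GFF3 files are tab-delimited.

```python
from collections import defaultdict

# A few of the g1 records from the post above, whitespace-separated as pasted.
GFF3_TEXT = """\
contig_100 AUGUSTUS gene 36348 38107 0.22 + . ID=g1
contig_100 AUGUSTUS transcript 36348 38107 0.22 + . ID=g1.t1;Parent=g1
contig_100 AUGUSTUS exon 36348 36521 . + . ID=exon-1;Parent=g1.t1;gene_id=g1;transcript_id=g1.t1
contig_100 AUGUSTUS CDS 36348 36521 0.98 + 0 ID=cds-1;Parent=g1.t1;gene_id=g1;transcript_id=g1.t1
"""


def parse_attributes(column9):
    """Split a 'key=value;key=value' attribute column into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def summarize(gff3_text):
    """Return (ids, children, orphans) for a block of GFF3 text.

    ids:      feature ID -> feature type
    children: parent ID  -> list of child feature types, in file order
    orphans:  parent IDs that are referenced but never defined
    """
    ids = {}
    children = defaultdict(list)
    for line in gff3_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments/directives
        # Assumption: no whitespace inside the first eight columns.
        cols = line.split(None, 8)
        if len(cols) != 9:
            continue  # not a complete feature line
        feature_type = cols[2]
        attrs = parse_attributes(cols[8])
        if "ID" in attrs:
            ids[attrs["ID"]] = feature_type
        # GFF3 allows several comma-separated parents per feature.
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                children[parent].append(feature_type)
    orphans = sorted(p for p in children if p not in ids)
    return ids, dict(children), orphans


if __name__ == "__main__":
    ids, children, orphans = summarize(GFF3_TEXT)
    for parent, kinds in children.items():
        print(parent, "->", ", ".join(kinds))
    print("parents without a record:", orphans)
```

A non-empty orphans list would point at child features whose Parent never appears as an ID, which is one of the more common ways a GFF3 file trips up downstream parsers. It says nothing about what specifically funannotate annotate objected to in this file.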
gfftk sanitize worked wonders! Funannotate annotate ran successfully! Thank you, will close the issue :)
Hi guys! Sorry to trouble you with this, as it's probably a formatting error, but we can't seem to get past an issue with funannotate annotate.
We are using funannotate v1.8.15.
As input, we have the standard gff3 output from augustus (augustus.hints.gff3) and we are trying to integrate information from emapper (standard .annotations output) and interproscan (standard xml) results.
The error is:
The gff3 file is in standard format, here's the head:
We tested funannotate annotate with the test data (using the .gbk file) and it works fine. Presumably the issue is non-standard formatting in the gff3 file from augustus, but we can't figure out what. We've tried reordering the gff3 file, but no luck.
Any help greatly appreciated! Thanks!