nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

UnboundLocalError: local variable 'Transcript' referenced before assignment in funannotate annotate #909

josieparis closed this issue 1 year ago

josieparis commented 1 year ago

Hi guys! Sorry to trouble you with this, as it's probably a formatting error, but we can't seem to get past an issue with funannotate annotate.

We are using funannotate v1.8.15.

As input, we have the standard gff3 output from augustus (augustus.hints.gff3) and we are trying to integrate information from emapper (standard .annotations output) and interproscan (standard xml) results.

The error is:

Index Error retriving transcript 0: (g1, {'name': None, 'type': 'transcript', 'transcript': [], 'cds_transcript': [], 'protein': [], '5UTR': [[]], '3UTR': [[]], 'gene_synonym': [], 'codon_start': [[]], 'ids': ['g1.t1'], 'CDS': [[(36348, 36521), (36603, 37070), (37356, 37577), (38081, 38107)]], 'mRNA': [[(36348, 36521), (36603, 37070), (37356, 37577), (38081, 38107)]], 'strand': '+', 'EC_number': [[]], 'location': (36348, 38107), 'contig': 'contig_100', 'product': ['hypothetical protein'], 'source': 'AUGUSTUS', 'phase': [[0, 0, 0, 0]], 'db_xref': [[]], 'go_terms': [[]], 'note': [[]], 'partialStart': [False], 'partialStop': [False], 'pseudo': False})
-------------------------------------------------------
Traceback (most recent call last):
  File "xxx/funannotate/bin/funannotate", line 10, in <module>
    sys.exit(main()
  File "xxx/funannotate/lib/python3.8/site-packages/funannotate/funannotate.py", line 716, in main
    mod.main(arguments)
  File "xxx/funannotate/lib/python3.8/site-packages/funannotate/annotate.py", line 698, in main
    GeneCounts, GeneDB = lib.convertgff2tbl(
  File "xxx/funannotate/lib/python3.8/site-packages/funannotate/library.py", line 2382, in convertgff2tbl
    tranout.write(">%s %s\n%s\n" % (x, k, softwrap(Transcript)))
UnboundLocalError: local variable 'Transcript' referenced before assignment
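
For readers unfamiliar with this error, the combination of a printed "Index Error" message followed by an UnboundLocalError is what one would expect when a variable is assigned only inside a `try` block and then used after the `except` branch has run. The sketch below is a hypothetical illustration of that general Python pattern, not funannotate's actual code; the function name `write_transcript` and the trimmed-down record are assumptions made for the example.

```python
def write_transcript(gene_id, record):
    """Illustrative only: mimics the failure pattern, NOT funannotate's code."""
    try:
        # 'transcript' is an empty list in the reported record,
        # so indexing element 0 raises IndexError.
        Transcript = record["transcript"][0]
    except IndexError:
        # The error is reported, but execution continues.
        print("Index Error retrieving transcript 0: %s" % gene_id)
    # If the except branch ran, the name Transcript was never bound,
    # so using it here raises UnboundLocalError.
    return ">%s\n%s\n" % (gene_id, Transcript)


# Trimmed-down, hypothetical version of the record from the error message.
record = {"ids": ["g1.t1"], "transcript": []}

try:
    write_transcript("g1", record)
except UnboundLocalError as err:
    print(type(err).__name__, err)
```

Read this way, the UnboundLocalError is a follow-on symptom: the point to investigate is why the `transcript` list for g1 was empty after parsing.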

The gff3 file is in standard format, here's the head:

##gff-version 3
contig_7    AUGUSTUS    gene    1   11876   0.08    +   .   ID=g29504
contig_7    AUGUSTUS    transcript  1   11876   0.08    +   .   ID=g29504.t1;Parent=g29504
contig_7    AUGUSTUS    exon    1   661 .   +   .   ID=exon-196740;Parent=g29504.t1;gene_id=g29504;transcript_id=g29504.t1
contig_7    AUGUSTUS    exon    971 1948    .   +   .   ID=exon-196741;Parent=g29504.t1;gene_id=g29504;transcript_id=g29504.t1
contig_7    AUGUSTUS    exon    3695    3939    .   +   .   ID=exon-196742;Parent=g29504.t1;gene_id=g29504;transcript_id=g29504.t1
contig_7    AUGUSTUS    exon    11533   11876   .   +   .   ID=exon-196743;Parent=g29504.t1;gene_id=g29504;transcript_id=g29504.t1
contig_7    AUGUSTUS    CDS 553 661 0.6 +   2   ID=cds-196740;Parent=g29504.t1;gene_id=g29504;transcript_id=g29504.t1
contig_7    AUGUSTUS    CDS 971 1948    0.16    +   1   ID=cds-196741;Parent=g29504.t1;gene_id=g29504;transcript_id=g29504.t1
contig_7    AUGUSTUS    CDS 3695    3939    0.17    +   1   ID=cds-196742;Parent=g29504.t1;gene_id=g29504;transcript_id=g29504.t1
contig_7    AUGUSTUS    CDS 11533   11876   0.77    +   2   ID=cds-196743;Parent=g29504.t1;gene_id=g29504;transcript_id=g29504.t1
contig_7    AUGUSTUS    five_prime_UTR  1   552 .   +   .   ID=nbis-five_prime_utr-492;Parent=g29504.t1;gene_id=g29504;transcript_id=g29504.t1
contig_7    AUGUSTUS    intron  1   552 0.6 +   .   ID=intron-167045;Parent=g29504.t1;gene_id=g29504;transcript_id=g29504.t1
contig_7    AUGUSTUS    intron  662 970 0.97    +   .   ID=intron-167046;Parent=g29504.t1;gene_id=g29504;transcript_id=g29504.t1
contig_7    AUGUSTUS    intron  1949    3694    0.16    +   .   ID=intron-167047;Parent=g29504.t1;gene_id=g29504;transcript_id=g29504.t1
contig_7    AUGUSTUS    intron  3940    11532   0.92    +   .   ID=intron-167048;Parent=g29504.t1;gene_id=g29504;transcript_id=g29504.t1
contig_7    AUGUSTUS    stop_codon  11874   11876   .   +   0   ID=stop_codon-30379;Parent=g29504.t1;gene_id=g29504;transcript_id=g29504.t1
contig_7    AUGUSTUS    gene    26083   26776   0.33    +   .   ID=g29505
contig_7    AUGUSTUS    transcript  26083   26776   0.33    +   .   ID=g29505.t1;Parent=g29505
contig_7    AUGUSTUS    exon    26083   26192   .   +   .   ID=exon-196744;Parent=g29505.t1;gene_id=g29505;transcript_id=g29505.t1
contig_7    AUGUSTUS    exon    26482   26776   .   +   .   ID=exon-196745;Parent=g29505.t1;gene_id=g29505;transcript_id=g29505.t1
contig_7    AUGUSTUS    CDS 26083   26192   0.33    +   0   ID=cds-196744;Parent=g29505.t1;gene_id=g29505;transcript_id=g29505.t1
contig_7    AUGUSTUS    CDS 26482   26776   1   +   1   ID=cds-196745;Parent=g29505.t1;gene_id=g29505;transcript_id=g29505.t1
contig_7    AUGUSTUS    intron  26193   26481   1   +   .   ID=intron-167049;Parent=g29505.t1;gene_id=g29505;transcript_id=g29505.t1
contig_7    AUGUSTUS    start_codon 26083   26085   .   +   0   ID=start_codon-30122;Parent=g29505.t1;gene_id=g29505;transcript_id=g29505.t1
contig_7    AUGUSTUS    stop_codon  26774   26776   .   +   0   ID=stop_codon-30380;Parent=g29505.t1;gene_id=g29505;transcript_id=g29505.t1

We tested funannotate annotate with the test data (using the .gbk file) and it works fine. Obviously it is an issue with non-standard formatting of the gff3 file from augustus, but we can't figure out what. We've tried reordering the gff3 file but no luck.

Any help greatly appreciated! Thanks!

nextgenusfs commented 1 year ago

So it seems to be dying on the g1.t1 gene. What do those annotations look like?

So I know parsing GFF3 from a variety of tools is a nightmare. The code that does this is nested in funannotate and was intended to parse funannotate's own GFF3 output, so it's not really a general solution. In prepping for funannotate2, I rewrote all the GFF3 parsing and conversions into a new tool, which will be a dependency. You can try gfftk here: https://github.com/nextgenusfs/gfftk. You should be able to install it with pip, i.e. python -m pip install gfftk.

So you might try gfftk sanitize to see if that will clean up the format into something funannotate can understand.
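
Independently of any cleanup tool, it can help to see how a GFF3 file is actually structured before handing it to a parser that expects a particular layout. The following is a minimal sketch under stated assumptions: the function names `parse_attributes` and `summarize_gff3` are hypothetical (not part of funannotate or gfftk), the input is assumed to be tab-separated with nine columns, and the check does not replicate what gfftk sanitize does. It tallies the feature types in column 3 and lists features whose Parent is never defined as an ID anywhere in the file.

```python
from collections import Counter


def parse_attributes(column9):
    """Split a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def summarize_gff3(lines):
    """Return (feature type counts, child IDs whose Parent is never defined)."""
    types = Counter()
    seen_ids = set()
    parents = []  # (child ID, parent ID) pairs
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a feature line in the assumed 9-column layout
        types[cols[2]] += 1
        attrs = parse_attributes(cols[8])
        if "ID" in attrs:
            seen_ids.add(attrs["ID"])
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                parents.append((attrs.get("ID", "?"), parent))
    # Parents are resolved after the full pass, so file order does not matter.
    orphans = [child for child, parent in parents if parent not in seen_ids]
    return types, orphans


# A few g1 features from this thread, rebuilt with tab separators.
sample = [
    "##gff-version 3",
    "\t".join(["contig_100", "AUGUSTUS", "gene", "36348", "38107", "0.22", "+", ".", "ID=g1"]),
    "\t".join(["contig_100", "AUGUSTUS", "transcript", "36348", "38107", "0.22", "+", ".", "ID=g1.t1;Parent=g1"]),
    "\t".join(["contig_100", "AUGUSTUS", "exon", "36348", "36521", ".", "+", ".", "ID=exon-1;Parent=g1.t1"]),
    "\t".join(["contig_100", "AUGUSTUS", "CDS", "36348", "36521", "0.98", "+", "0", "ID=cds-1;Parent=g1.t1"]),
]
types, orphans = summarize_gff3(sample)
print(dict(types))
print(orphans)
```

To run it on a real file, pass an open file handle instead of the sample list, for example `summarize_gff3(open("augustus.hints.gff3"))`. Comparing the type tally against a GFF3 file that funannotate itself produced is one way to spot differences in how the two files are laid out.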

nextgenusfs commented 1 year ago

Also, of course, if you just let funannotate run Augustus and the other ab initio gene callers in funannotate predict, you won't have any problems...

josieparis commented 1 year ago

Jon, you star. Thank you so much for such a quick response. I'll try as you suggested!

In the meantime, not sure if this will help, but g1 looks like:

contig_100  AUGUSTUS    gene    36348   38107   0.22    +   .   ID=g1
contig_100  AUGUSTUS    transcript  36348   38107   0.22    +   .   ID=g1.t1;Parent=g1
contig_100  AUGUSTUS    exon    36348   36521   .   +   .   ID=exon-1;Parent=g1.t1;gene_id=g1;transcript_id=g1.t1
contig_100  AUGUSTUS    exon    36603   37070   .   +   .   ID=exon-2;Parent=g1.t1;gene_id=g1;transcript_id=g1.t1
contig_100  AUGUSTUS    exon    37356   37577   .   +   .   ID=exon-3;Parent=g1.t1;gene_id=g1;transcript_id=g1.t1
contig_100  AUGUSTUS    exon    38081   38107   .   +   .   ID=exon-4;Parent=g1.t1;gene_id=g1;transcript_id=g1.t1
contig_100  AUGUSTUS    CDS 36348   36521   0.98    +   0   ID=cds-1;Parent=g1.t1;gene_id=g1;transcript_id=g1.t1
contig_100  AUGUSTUS    CDS 36603   37070   0.99    +   0   ID=cds-2;Parent=g1.t1;gene_id=g1;transcript_id=g1.t1
contig_100  AUGUSTUS    CDS 37356   37577   0.59    +   0   ID=cds-3;Parent=g1.t1;gene_id=g1;transcript_id=g1.t1
contig_100  AUGUSTUS    CDS 38081   38107   0.23    +   0   ID=cds-4;Parent=g1.t1;gene_id=g1;transcript_id=g1.t1
contig_100  AUGUSTUS    intron  36522   36602   1   +   .   ID=intron-1;Parent=g1.t1;gene_id=g1;transcript_id=g1.t1
contig_100  AUGUSTUS    intron  37071   37355   1   +   .   ID=intron-2;Parent=g1.t1;gene_id=g1;transcript_id=g1.t1
contig_100  AUGUSTUS    intron  37578   38080   0.23    +   .   ID=intron-3;Parent=g1.t1;gene_id=g1;transcript_id=g1.t1
contig_100  AUGUSTUS    start_codon 36348   36350   .   +   0   ID=start_codon-1;Parent=g1.t1;gene_id=g1;transcript_id=g1.t1
contig_100  AUGUSTUS    stop_codon  38105   38107   .   +   0   ID=stop_codon-1;Parent=g1.t1;gene_id=g1;transcript_id=g1.t1
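
One quick check that can be run on records like these is whether the CDS phase column is internally consistent, since that is a common source of trouble in GFF3 files. The sketch below is a hypothetical helper (the name `expected_phases` is an assumption, and it handles only plus-strand transcripts with CDS segments listed in order). It uses the GFF3 definition of phase as the number of bases to skip at the start of a segment to reach the next full codon. A consistent result only rules out this one class of problem; it does not identify what the parser objected to.

```python
def expected_phases(cds_intervals, first_phase=0):
    """Compute the GFF3 phase expected for each CDS segment on the + strand.

    cds_intervals: list of (start, end) tuples, 1-based inclusive,
    in transcript order. first_phase: phase of the first segment.
    """
    phases = []
    phase = first_phase
    for start, end in cds_intervals:
        phases.append(phase)
        length = end - start + 1
        # (length - phase) % 3 is the size of the trailing partial codon;
        # the next segment must skip the bases that complete it.
        phase = (3 - (length - phase) % 3) % 3
    return phases


# CDS coordinates for g1.t1 as posted above.
g1_cds = [(36348, 36521), (36603, 37070), (37356, 37577), (38081, 38107)]
print(expected_phases(g1_cds))  # [0, 0, 0, 0]

# CDS coordinates for g29505.t1 from the head of the file.
g29505_cds = [(26083, 26192), (26482, 26776)]
print(expected_phases(g29505_cds))  # [0, 1]
```

Both results match the phase column in the posted records, so the coordinates and phases for these genes are at least self-consistent.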

josieparis commented 1 year ago

Also agreed that it may be best to pop back a step and just run Augustus within funannotate. Thanks for all your hard work on such a great tool!

josieparis commented 1 year ago

gfftk sanitize worked wonders! Funannotate annotate ran successfully! Thank you, will close the issue :)