nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

Results empty funannotate annotate #878

Open MesYosra opened 1 year ago

MesYosra commented 1 year ago

Hello, I ran funannotate annotate with the following command:

funannotate annotate --gff "/shared/home/ymestiri/projects/annotation_fusarium/Funannotate/UK0001Mo.gff3" --fasta "/shared/home/ymestiri/projects/annotation_fusarium/Funannotate/UK0001. fasta" --header_length "10000" -d " /shared/home/ymestiri/projects/annotation_fusarium/Funannotate/FUNANNOTATE_DB" --species "Fusarium oxysporum " --cpus 10 -o "/shared/home/ymestiri/projects/annotation_fusarium/Funannotate/annoted/"

I got these results:

[Mar 09 01:32 PM]: OS: CentOS Linux 7, 56 cores, ~ 264 GB RAM. Python: 3.8.12
[Mar 09 01:32 PM]: Running 1.8.9
[Mar 09 01:32 PM]: No NCBI SBT file given, will use default, however if you plan to submit to NCBI, create one and pass it here '--sbt'
[Mar 09 01:32 PM]: Parsing annotation and preparing annotation files.
[Mar 09 01:32 PM]: Found 16,277 gene models from GFF3 annotation
[Mar 09 01:33 PM]: Adding Functional Annotation to Fusarium oxysporum , NCBI accession: None
[Mar 09 01:33 PM]: Annotation consists of: 16,277 gene models
[Mar 09 01:33 PM]: 16,277 protein records loaded
[Mar 09 01:33 PM]: Running HMMer search of PFAM version 35.0
[Mar 09 01:41 PM]: 18,643 annotations added
[Mar 09 01:41 PM]: Running Diamond blastp search of UniProt DB version 2022_05
[Mar 09 01:42 PM]: 1,005 valid gene/product annotations from 1,431 total
[Mar 09 01:42 PM]: Install eggnog-mapper or use webserver to improve functional annotation: https://github.com/jhcepas/eggnog-mapper
[Mar 09 01:42 PM]: No Eggnog-mapper results found.
[Mar 09 01:42 PM]: Combining UniProt/EggNog gene and product names using Gene2Product version 1.86
[Mar 09 01:42 PM]: 1,005 gene name and product description annotations added
[Mar 09 01:42 PM]: Running Diamond blastp search of MEROPS version 12.0
[Mar 09 01:42 PM]: 495 annotations added
[Mar 09 01:42 PM]: Annotating CAZYmes using HMMer search of dbCAN version 11.0
[Mar 09 01:44 PM]: 690 annotations added
[Mar 09 01:44 PM]: Annotating proteins with BUSCO dikarya models
[Mar 09 01:45 PM]: 1,310 annotations added
[Mar 09 01:45 PM]: Skipping phobius predictions, try funannotate remote -m phobius
[Mar 09 01:45 PM]: Skipping secretome: neither SignalP nor Phobius searches were run
[Mar 09 01:45 PM]: 0 secretome and 0 transmembane annotations added
[Mar 09 01:45 PM]: Parsing InterProScan5 XML file
[Mar 09 01:45 PM]: Found 2,803 duplicated annotations, adding 23,148 valid annotations
[Mar 09 01:45 PM]: Converting to final Genbank format, good luck!
[Mar 09 01:45 PM]: Creating AGP file and corresponding contigs file
[Mar 09 01:45 PM]: Writing genome annotation table.
[Mar 09 01:45 PM]: Funannotate annotate has completed successfully!

But my .gff3 is empty and my .gbk doesn't contain any functional annotation, and I don't understand why.

If you need any information or file, I will gladly help you as much as I can.

Thank you for your attention and your help.

Cyaneis commented 1 year ago

Hello,

I'm having the same kind of problem.

I have two different kinds of .gff files. One of them has an ID for every feature in the 9th column (attributes); the others don't, which is a problem for funannotate annotate, so I tried to fix that by generating the IDs myself. I'm pretty sure that this is where the problem lies.

Basically, my modified gffs (with manually generated IDs) look like this:

contig_1 AUGUSTUS start_codon 197229 197231 . + 0 Parent=921dae5a-bf3e-11ed-a1fc-1866da936989;ID=921daee6-bf3e-11ed-a1fc-1866da936989
contig_1 AUGUSTUS intron 197876 197943 0.57 + . Parent=921dae5a-bf3e-11ed-a1fc-1866da936989;ID=921daf2c-bf3e-11ed-a1fc-1866da936989
contig_1 AUGUSTUS CDS 197229 197875 1.0 + 0 Parent=921dae5a-bf3e-11ed-a1fc-1866da936989;ID=921daf72-bf3e-11ed-a1fc-1866da936989
contig_1 AUGUSTUS CDS 197944 198022 0.57 + 1 Parent=921dae5a-bf3e-11ed-a1fc-1866da936989;ID=921dafae-bf3e-11ed-a1fc-1866da936989
contig_1 AUGUSTUS stop_codon 198020 198022 . + 0 Parent=921dae5a-bf3e-11ed-a1fc-1866da936989;ID=921dafea-bf3e-11ed-a1fc-1866da936989

contig_1 AUGUSTUS gene 274080 274456 0.89 + . ID=921db01c-bf3e-11ed-a1fc-1866da936989;Name=Pf_GLP0701_FLYE_STEP_CORRECTION_MEDAKA_107280
contig_1 AUGUSTUS mRNA 274080 274456 0.89 + . ID=921db058-bf3e-11ed-a1fc-1866da936989;Parent=921db01c-bf3e-11ed-a1fc-1866da936989

The ID comes either before or after the Parent attribute. I don't know how funannotate uses the IDs, so I don't know whether this can be a problem. Maybe the problem is the way my script generated the IDs?

When checking the logs, I can see messages reporting that annotate found annotations in one database or another, but the output files are still empty.

I hope you can help me.

Thank you for your attention, and have a nice day.
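On the question of attribute order: in GFF3, the order of `ID` and `Parent` within column 9 is not significant. What matters is that every `Parent` value matches the `ID` of another feature in the file. The sketch below is a generic check for that, assuming a tab-separated GFF3 file; the function names are hypothetical and it says nothing about how funannotate itself parses the file.

```python
def parse_attrs(col9):
    """Split a GFF3 attribute column into a {key: value} dict."""
    return dict(f.split("=", 1) for f in col9.strip().split(";") if "=" in f)


def dangling_parents(gff_lines):
    """Return (line_number, parent_id) pairs whose Parent has no matching ID."""
    ids = set()
    parents = []
    for number, line in enumerate(gff_lines, 1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments/directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a feature line
        attrs = parse_attrs(cols[8])
        if "ID" in attrs:
            ids.add(attrs["ID"])
        if "Parent" in attrs:
            # A feature may list several parents separated by commas.
            for parent in attrs["Parent"].split(","):
                parents.append((number, parent))
    # Compare only after reading everything, so feature order does not matter.
    return [(n, p) for n, p in parents if p not in ids]


# Made-up example lines: the second CDS points at a parent that does not exist.
example = [
    "contig_1\tAUGUSTUS\tgene\t1\t100\t.\t+\t.\tID=g1\n",
    "contig_1\tAUGUSTUS\tmRNA\t1\t100\t.\t+\t.\tParent=g1;ID=m1\n",
    "contig_1\tAUGUSTUS\tCDS\t1\t50\t.\t+\t0\tID=c1;Parent=m1\n",
    "contig_1\tAUGUSTUS\tCDS\t60\t100\t.\t+\t1\tParent=m2;ID=c2\n",
]
print(dangling_parents(example))
```

An empty result only means the parent/child links are consistent; it does not by itself show that the identifiers are in a form funannotate accepts.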

nextgenusfs commented 1 year ago

Most of the time these errors are related to gene identifiers (ID=) that are parsed incorrectly. The pipeline is set up to use NCBI-like locus_tags, e.g. VC83_000001 or FUN_000001, where it expects a locus tag prefix separated from the numerical ID by a single underscore. Identifiers with multiple underscores can be problematic. I'm not sure about the sha256 strings @Cyaneis.
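As a rough way to spot identifiers that do not follow the locus_tag shape described above, a small script can list the gene IDs that are not of the form prefix, single underscore, digits. This is only a sketch: the regular expression is an assumption based on the examples VC83_000001 and FUN_000001, not funannotate's own parser, and the function names are hypothetical. It also assumes a tab-separated GFF3 file.

```python
import re

# Assumed shape of an NCBI-like locus tag: alphanumeric prefix, exactly one
# underscore, then digits (e.g. FUN_000001). Illustration only, not the
# pattern funannotate uses internally.
LOCUS_TAG = re.compile(r"^[A-Za-z][A-Za-z0-9]*_\d+$")


def gene_ids(gff_lines):
    """Yield the ID attribute of every 'gene' feature in GFF3 lines."""
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments/directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        if "ID" in attrs:
            yield attrs["ID"]


def non_conforming(gff_lines):
    """Return gene IDs that do not look like PREFIX_000001."""
    return [gid for gid in gene_ids(gff_lines) if not LOCUS_TAG.match(gid)]


example = [
    "contig_1\tAUGUSTUS\tgene\t274080\t274456\t0.89\t+\t.\t"
    "ID=921db01c-bf3e-11ed-a1fc-1866da936989;Name=x\n",
    "contig_1\tAUGUSTUS\tgene\t1\t100\t.\t+\t.\tID=FUN_000001\n",
]
print(non_conforming(example))
```

To run it on a real file, pass an open file handle instead of the example list, e.g. `non_conforming(open("annotation.gff3"))`, where the file name is a placeholder.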

Cyaneis commented 1 year ago

Thank you for your answer, @nextgenusfs! I will try to give my sequences the correct ID format with a Python script. It shouldn't be too complex.

Before that, I wanted to try with a really small example.gff3:

contig_10 BRAKER gene 3792098 3794795 . + . ID=gene_1;Name=gene_1
contig_10 BRAKER mRNA 3792098 3794795 . + . ID=mRNA_1;Parent=gene_1
contig_10 BRAKER start_codon 3792098 3792100 . + 0 ID=start_1;Parent=mRNA_1
contig_10 BRAKER CDS 3792098 3792141 0.02 + 0 ID=cds_1;Parent=mRNA_1
contig_10 BRAKER CDS 3794738 3794795 0.03 + 1 ID=cds_2;Parent=mRNA_1
contig_10 BRAKER stop_codon 3794793 3794795 . + 0 ID=stop_1;Parent=mRNA_1

Here, I tried to give them correct IDs, Names and Parents.

I get a weird error message that I never got before:

[Mar 17 10:54 AM]: OS: CentOS Linux 7, 32 cores, ~ 132 GB RAM. Python: 3.9.13
[Mar 17 10:54 AM]: Running 1.8.13
[Mar 17 10:54 AM]: No NCBI SBT file given, will use default, however if you plan to submit to NCBI, create one and pass it here '--sbt'
[Mar 17 10:54 AM]: Found existing output directory /shared/ifbstor1/projects/annotation_fijiensism2/vendloczki/Funannotate/Seq/annotedRef/. Warning, will re-use any intermediate files found.
[Mar 17 10:54 AM]: Parsing annotation and preparing annotation files.
[Mar 17 10:54 AM]: Found 13,107 gene models from GFF3 annotation
[Mar 17 10:55 AM]: Adding Functional Annotation to Pseudocercospora fijiensis, NCBI accession: None
[Mar 17 10:55 AM]: Annotation consists of: 13,107 gene models
[Mar 17 10:55 AM]: 13,107 protein records loaded
[Mar 17 10:55 AM]: Existing Pfam-A results found: /shared/ifbstor1/projects/annotation_fijiensism2/vendloczki/Funannotate/Seq/annotedRef/annotate_misc/annotations.pfam.txt
[Mar 17 10:55 AM]: 11,987 annotations added
[Mar 17 10:55 AM]: Running Diamond blastp search of UniProt DB version 2022_05
[Mar 17 10:55 AM]: 659 valid gene/product annotations from 978 total
[Mar 17 10:55 AM]: Running Eggnog-mapper
Traceback (most recent call last):
  File "/shared/home/pvendloczki/.local/bin/funannotate", line 8, in <module>
    sys.exit(main())
  File "/shared/home/pvendloczki/.local/lib/python3.9/site-packages/funannotate/funannotate.py", line 716, in main
    mod.main(arguments)
  File "/shared/home/pvendloczki/.local/lib/python3.9/site-packages/funannotate/annotate.py", line 804, in main
    if parse_version(get_emapper_version()) >= parse_version('2.1.0'):
  File "/shared/ifbstor1/software/miniconda/envs/python-pytorch-tensorflow-3.9-1.11.0-2.6.2/lib/python3.9/site-packages/pkg_resources/__init__.py", line 121, in parse_version
    return packaging.version.Version(v)
  File "/shared/ifbstor1/software/miniconda/envs/python-pytorch-tensorflow-3.9-1.11.0-2.6.2/lib/python3.9/site-packages/pkg_resources/_vendor/packaging/version.py", line 264, in __init__
    match = self._regex.search(version)
TypeError: expected string or bytes-like object

I tried with another .gff3 (one for which I had already got results with funannotate annotate, so I knew there couldn't be any problem with the file itself) and got the same error. I didn't touch or edit any of the files named in the error.

I tried force-reinstalling the latest version of funannotate and reinstalled eggnog-mapper and Diamond, but that didn't change anything. I know it's not because of the .gff file, but I can't see why I get this error when everything was fine just yesterday.

If you have any idea, I would really appreciate it. Have a nice day!
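The traceback above fails inside `parse_version(get_emapper_version())`, at the line `match = self._regex.search(version)`. A regex `search` raises this TypeError whenever its argument is not a string. One plausible reading, which the thread does not confirm, is that `get_emapper_version()` returned `None` instead of a version string, for example because the emapper executable could not be run or its version output could not be read in that environment. The sketch below reproduces the failure mode with the standard library only; `VERSION_RE` and `safe_version_check` are hypothetical stand-ins, not funannotate or packaging code.

```python
import re

# Stand-in for the version regex inside the packaging library; the real
# pattern is more elaborate. This one only illustrates the failure mode.
VERSION_RE = re.compile(r"^\d+(\.\d+)*$")


def parse(version):
    # Same kind of call as the failing line in the traceback.
    return VERSION_RE.search(version)


def as_tuple(version):
    """Turn '2.1.9' into (2, 1, 9) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def safe_version_check(found, minimum="2.1.0"):
    """Hypothetical guard: a missing or odd version string counts as 'too old'."""
    if not isinstance(found, str) or parse(found) is None:
        return False
    return as_tuple(found) >= as_tuple(minimum)


try:
    parse(None)  # what happens if the version lookup returned None
except TypeError as err:
    print("TypeError:", err)

print(safe_version_check(None))
print(safe_version_check("2.1.9"))
```

If that reading is right, checking that `emapper.py --version` runs and prints a version in the same environment that runs funannotate would be a reasonable first step.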