nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

Empty functional annotation after funannotate functional #1001

Open pcampiteli opened 8 months ago

pcampiteli commented 8 months ago

Funannotate version: Funannotate on Galaxy Australia, v1.8.15 (galaxy4)

Describe the bug: I'm performing a functional re-annotation using public GFF files from NCBI. For this, I first ran funannotate functional without eggNOG, InterProScan, and antiSMASH to rename the gene models and normalize the GFF file. This step works fine. I then used the output protein and GFF files to run InterProScan (on the proteins), eggNOG-mapper (on the proteins), and antiSMASH 6 (GFF + FASTA, on the antiSMASH website). The functional re-annotation finishes without errors, but the stats file does not reflect the functional annotation: the Annotation keys are all 0. It seems the files are read, but the functional information is not integrated into my gene models. Is there anything I can do to fix this and run the functional re-annotation properly?

What command did you issue? (Extracted from the standard command line of the Galaxy tool.)
"export FUNANNOTATE_DB='/mnt/custom-indices/funannotate/2023-05-10-062530' && funannotate annotate --gff '/mnt/user-data-volA/data11/3/8/c/dataset_38c46e19-bb21-4f39-aceb-ddd8b8491e02.dat' --fasta '/mnt/user-data-volA/data11/1/1/b/dataset_11b5bbde-58a9-439d-a7a5-f8b350aae40b.dat' --species 'Trichoderma ghuizhouense' --out output --database '/mnt/custom-indices/funannotate/2023-05-10-062530' --eggnog '/mnt/user-data-volA/data11/5/8/3/dataset_58367cdb-ea71-41f9-814f-b4036f3c3aef.dat' --antismash '/mnt/user-data-volA/data11/9/5/9/dataset_959b4fc5-63aa-4c28-a243-37beee13481f.dat' --iprscan '/mnt/user-data-volA/data11/8/9/2/dataset_892d922f-06bc-4c61-9534-29e3ec836f2a.dat' --busco_db 'ascomycota_odb10' --isolate '' --strain 'NJAU 4742' --header_length 16 --cpus ${GALAXY_SLOTS:-2} && find output/annotate_results -regex ".*part[0-9]+.(sqn|tbl)$" -delete && mv output/annotate_results/*.gbk out.gbk && mv output/annotate_results/*.annotations.txt out.annotations.txt && mv output/annotate_results/*.contigs.fsa out.contigs.fsa && mv output/annotate_results/*.agp out.agp && mv output/annotate_results/*.tbl out.tbl && mv output/annotate_results/*.sqn out.sqn && mv output/annotate_results/*.scaffolds.fa out.scaffolds.fa && mv output/annotate_results/*.proteins.fa out.proteins.fa && mv output/annotate_results/*.mrna-transcripts.fa out.mrna-transcripts.fa && mv output/annotate_results/*.cds-transcripts.fa out.cds-transcripts.fa && mv output/annotate_results/*.gff3 out.gff3 && mv output/annotate_results/*.discrepency.report.txt out.discrepency.report.txt && mv output/annotate_results/*.stats.json out.stats.json"

Logfiles (extracted from Galaxy):
[Feb 19 01:31 PM]: OS: Ubuntu 22.04, 32 cores, ~ 132 GB RAM. Python: 3.8.15
[Feb 19 01:31 PM]: Running 1.8.15
[Feb 19 01:31 PM]: No NCBI SBT file given, will use default, however if you plan to submit to NCBI, create one and pass it here '--sbt'
[Feb 19 01:31 PM]: Parsing annotation and preparing annotation files.
[Feb 19 01:31 PM]: Found 11,255 gene models from GFF3 annotation
[Feb 19 01:32 PM]: Adding Functional Annotation to Trichoderma ghuizhouense, NCBI accession: None
[Feb 19 01:32 PM]: Annotation consists of: 11,255 gene models
[Feb 19 01:32 PM]: 11,255 protein records loaded
[Feb 19 01:32 PM]: Running HMMer search of PFAM version 35.0
[Feb 19 02:17 PM]: 14,424 annotations added
[Feb 19 02:17 PM]: Running Diamond blastp search of UniProt DB version 2023_02
[Feb 19 02:18 PM]: 956 valid gene/product annotations from 1,309 total
[Feb 19 02:18 PM]: Existing Eggnog-mapper results found: output/annotate_misc/eggnog.emapper.annotations
[Feb 19 02:18 PM]: Parsing EggNog Annotations
[Feb 19 02:18 PM]: EggNog version parsed as 2.1.8
[Feb 19 02:19 PM]: 22,152 COG and EggNog annotations added
[Feb 19 02:19 PM]: Combining UniProt/EggNog gene and product names using Gene2Product version 1.88
[Feb 19 02:19 PM]: 2,825 gene name and product description annotations added
[Feb 19 02:19 PM]: Running Diamond blastp search of MEROPS version 12.0
[Feb 19 02:19 PM]: 450 annotations added
[Feb 19 02:19 PM]: Annotating CAZYmes using HMMer search of dbCAN version 11.0
[Feb 19 02:20 PM]: 450 annotations added
[Feb 19 02:20 PM]: Annotating proteins with BUSCO ascomycota_odb10 models
[Feb 19 03:16 PM]: 1,701 annotations added
[Feb 19 03:16 PM]: Skipping phobius predictions, try funannotate remote -m phobius
[Feb 19 03:16 PM]: Skipping secretome: neither SignalP nor Phobius searches were run
[Feb 19 03:16 PM]: 0 secretome and 0 transmembane annotations added
[Feb 19 03:20 PM]: Parsing InterProScan5 XML file
[Feb 19 03:23 PM]: Now parsing antiSMASH v7 results, finding SM clusters
[Feb 19 03:23 PM]: Found 62 clusters, 0 biosynthetic enyzmes, and 0 smCOGs predicted by antiSMASH
[Feb 19 03:23 PM]: Found 0 duplicated annotations, adding 271,825 valid annotations
[Feb 19 03:23 PM]: Converting to final Genbank format, good luck!
[Feb 19 03:25 PM]: Creating AGP file and corresponding contigs file
[Feb 19 03:25 PM]: Cross referencing SM cluster hits with MIBiG database version 1.4
[Feb 19 03:25 PM]: Creating tab-delimited SM cluster output
[Feb 19 03:26 PM]: Writing genome annotation table.
[Feb 19 03:26 PM]: Funannotate annotate has completed successfully!

We need YOUR help to improve gene names/product descriptions:
0 gene/products names MUST be fixed, see output/annotate_results/Gene2Products.must-fix.txt
0 gene/product names need to be curated, see output/annotate_results/Gene2Products.need-curating.txt
6 gene/product names passed but are not in Database, see output/annotate_results/Gene2Products.new-names-passed.txt
Please consider contributing a PR at https://github.com/nextgenusfs/gene2product

OS/Install Information

hyphaltip commented 8 months ago

It's really hard to help without a way to reproduce your problem. Knowing what is in the annotate_misc folder would help. Is it the .tbl file that has no annotations, or just the .gbk file? If you can provide copies of the data files, it will be possible to reproduce the problem.
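Without access to annotate_misc, one way to answer the .tbl question is to tally the qualifiers in the downloaded out.tbl. The sketch below is a hypothetical helper (tbl_qualifier_counts is not part of funannotate) and assumes the standard NCBI five-column feature table layout, in which qualifier lines begin with three empty tab-separated columns followed by the qualifier name and its value. It makes no assumption about which qualifiers funannotate uses for functional hits; it only reports what is present.

```shell
# Hypothetical helper: count qualifier names in an NCBI feature table (.tbl).
# Assumption: qualifier lines look like "<TAB><TAB><TAB>key<TAB>value",
# while feature lines ("start<TAB>stop<TAB>key") and ">Feature" headers
# have a non-empty first column and are therefore skipped.
tbl_qualifier_counts() {
  awk -F'\t' '
    NF >= 4 && $1 == "" && $2 == "" && $3 == "" { n[$4]++ }
    END { for (k in n) print n[k] "\t" k }
  ' "$1" | sort -k1,1nr -k2,2   # most frequent qualifier first, ties by name
}
```

For example, `tbl_qualifier_counts out.tbl` prints one "count, qualifier" line per qualifier name. Running it on the .tbl files from runs with and without the functional inputs shows whether anything beyond the gene-model and product qualifiers was actually written.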

pcampiteli commented 8 months ago

Thank you for reaching out. I am currently using the Galaxy platform for this task, so I don't have access to the annotate_misc folder. Unfortunately, this limitation complicates my debugging.

My .gbk and .tbl files are populated with the gene structures, but the outputs from InterProScan, eggNOG, and antiSMASH, computed separately and passed as input, are not being linked to the gene models in the GFF file. I suspect the naming conventions used by NCBI may be the root cause. However, even after a preliminary step to rename the gene models and then running the functional annotation on the simplified, renamed gene models, the issue persists.
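If naming is the culprit, the query IDs in the eggNOG-mapper output should fail to match the transcript IDs in the GFF3 passed to --gff. The following is a minimal sketch of that check. The function name emapper_vs_gff_ids is hypothetical, and it assumes that emapper header lines start with "#", that the query ID is the first tab-separated column, and that mRNA features carry an "ID=" attribute. Whether funannotate keys functional hits on mRNA/transcript IDs is an assumption, not something confirmed in this thread.

```shell
# Hypothetical helper: how many eggNOG-mapper query IDs match an mRNA ID in a GFF3?
# Assumptions: emapper comment/header lines start with "#", the query ID is
# column 1, and mRNA features have an "ID=" entry in column 9.
emapper_vs_gff_ids() {
  local emapper="$1" gff="$2" tmpdir total shared
  tmpdir=$(mktemp -d)
  # Unique query IDs from the emapper annotations table
  awk -F'\t' '!/^#/ && NF { print $1 }' "$emapper" | LC_ALL=C sort -u > "$tmpdir/emapper.ids"
  # Unique mRNA IDs from the GFF3 attribute column
  awk -F'\t' '!/^#/ && $3 == "mRNA" {
      n = split($9, attrs, ";")
      for (i = 1; i <= n; i++)
        if (attrs[i] ~ /^ID=/) print substr(attrs[i], 4)
    }' "$gff" | LC_ALL=C sort -u > "$tmpdir/gff.ids"
  total=$(( $(wc -l < "$tmpdir/emapper.ids") ))
  shared=$(( $(LC_ALL=C comm -12 "$tmpdir/emapper.ids" "$tmpdir/gff.ids" | wc -l) ))
  echo "emapper queries: $total; matching GFF mRNA IDs: $shared"
  # Show up to five query IDs with no matching mRNA ID
  LC_ALL=C comm -23 "$tmpdir/emapper.ids" "$tmpdir/gff.ids" | head -n 5
  rm -rf "$tmpdir"
}
```

Run it as, e.g., `emapper_vs_gff_ids eggnog.emapper.annotations out.gff3` (file names illustrative). A match count near zero would point to an ID mismatch rather than a parsing problem; the listed unmatched IDs show which naming scheme the eggNOG input used.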

Thank you for your assistance in resolving this matter. Is there any additional information I could provide to help? I'm also trying to install funannotate on the lab server, but the conda installation gets stuck solving the environment. If it finishes, I'll run the annotation there so I can access the annotate_misc folder.

pcampiteli commented 8 months ago

GCA0020227851TguiNJAU4742_final_issue.rocrate.zip https://drive.google.com/file/d/1Uf40iZzNaDyAaK4ituAeurEJU5fcwkNe/view?usp=drive_web

Hello, here is the file used as input for the annotation, so you can reproduce the problem. I'm working on multiple strains.

For reference: I downloaded the files from NCBI, then ran a preliminary funannotate functional step just to rename the gene models. I then ran eggNOG-mapper, InterProScan, and antiSMASH on the protein sequences. Using the GFF3 and scaffold outputs from the first funannotate functional run together with the functional annotation files, I ran funannotate functional again. The job finishes, but as I said, the results are not integrated into the annotation. If you look at the .tbl file, you can see that Gene2Product works, but the functional data is not there. I also tested annotating another strain without renaming the gene models, so I'm sending those files too in case comparing the results helps.

Thanks for your time and help.

GCA0110663451TlenCFAM422_issue.rocrate.zip https://drive.google.com/file/d/1OushJ6XErd4uTTKmFTqtsjuCqel_nrWP/view?usp=drive_web

(Replying by email to Jason Stajich's comment of Thu, Feb 22, 2024 at 04:36.)


-- Paulo Henrique C. De Azevedo