sunilicrsat opened this issue 1 year ago
Unable to help without more details. There are many issues related to "why are most of my genes named hypothetical?", so please look through them to understand why that is.
But I think this is likely due to what you passed to --rename here. As the help menu shows, --rename should be a new locus_tag base name, certainly not a file:
$ funannotate annotate
Usage: funannotate annotate <arguments>
version: 1.8.14
Description: Script functionally annotates the results from funannotate predict. It pulls
annotation from PFAM, InterPro, EggNog, UniProtKB, MEROPS, CAZyme, and GO ontology.
Required:
-i, --input Folder from funannotate predict
or
--genbank Genome in GenBank format
-o, --out Output folder for results
or
--gff Genome GFF3 annotation file
--fasta Genome in multi-fasta format
-s, --species Species name, use quotes for binomial, e.g. "Aspergillus fumigatus"
-o, --out Output folder for results
Optional:
--sbt NCBI submission template file. (Recommended)
-a, --annotations Custom annotations (3 column tsv file)
-m, --mito-pass-thru Mitochondrial genome/contigs. append with :mcode
--eggnog Eggnog-mapper annotations file (if NOT installed)
--antismash antiSMASH secondary metabolism results (GBK file from output)
--iprscan InterProScan5 XML file
--phobius Phobius pre-computed results (if phobius NOT installed)
--signalp SignalP pre-computed results (-org euk -format short)
--isolate Isolate name
--strain Strain name
--rename Rename GFF gene models with locus_tag from NCBI.
--fix Gene/Product names fixed (TSV: GeneID Name Product)
--remove Gene/Product names to remove (TSV: Gene Product)
--busco_db BUSCO models. Default: dikarya
-t, --tbl2asn Additional parameters for tbl2asn. Default: "-l paired-ends"
-d, --database Path to funannotate database. Default: $FUNANNOTATE_DB
--force Force over-write of output folder
--cpus Number of CPUs to use. Default: 2
--tmpdir Volume/location to write temporary files. Default: /tmp
--p2g protein2genome pre-computed results
--header_length Maximum length of FASTA headers. Default: 16
--no-progress Do not print progress to stdout for long sub jobs
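Because --rename takes a locus_tag name rather than a file, a quick pre-flight check before calling funannotate annotate can catch this kind of mistake. The Python sketch below is illustrative only: `check_rename_value` is a hypothetical helper, not part of funannotate, and the rule it enforces (a prefix of 3 to 12 alphanumeric characters that does not start with a digit) is an assumption based on NCBI's locus_tag prefix guidance, so confirm it against current NCBI documentation.

```python
import os
import re

# Assumption: NCBI locus_tag prefixes are 3-12 letters/digits and must not
# begin with a digit. Check current NCBI guidance before relying on this.
LOCUS_TAG_PREFIX = re.compile(r"[A-Za-z][A-Za-z0-9]{2,11}")


def check_rename_value(value: str) -> list[str]:
    """Return problems with a proposed --rename value; an empty list means none found.

    Hypothetical helper for illustration; funannotate does not ship it.
    """
    problems = []
    # A path separator, a file extension, or an existing file suggests a file name.
    if os.path.isfile(value) or "/" in value or "." in value:
        problems.append("looks like a file; --rename expects a locus_tag name")
    if not LOCUS_TAG_PREFIX.fullmatch(value):
        problems.append("not 3-12 alphanumeric characters starting with a letter")
    return problems


if __name__ == "__main__":
    for candidate in ["ABC12", "10mAF_rename_ncbi.txt"]:
        print(candidate, check_rename_value(candidate) or "OK")
```

Under the assumed rule, a value that begins with a digit is flagged too, so the check looks at the prefix itself as well as whether it resembles a file name.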
Thank you, Jon. The gene names have a "T-1" suffix; is that what makes them hypothetical proteins? Can I share the gene prediction output? The GFF3 from gene prediction also contains hypothetical proteins.
Can you post the log files from your predict and annotate runs so I can see what has been run? It would also help to see what the gene models look like. I don't need the entire GFF/TBL, just some snippets so I can see the naming scheme and what other annotations have been incorporated.
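One way to produce such a snippet is a short Python sketch that reads a GFF3 file, prints the ID, Parent and product of the first few mRNA features, and counts how many carry the product "hypothetical protein". It assumes standard GFF3 attributes in column 9 and that the product is stored under a `product` key on mRNA lines; the script name gff_summary.py is hypothetical, and 1_1.gff3 is the file name from the predict output listed in this thread.

```python
import sys
from collections import Counter
from urllib.parse import unquote


def parse_attributes(column9: str) -> dict[str, str]:
    """Split a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = unquote(value)  # GFF3 percent-encodes reserved characters
    return attrs


def summarize_mrnas(lines, max_examples=5):
    """Collect a few (ID, Parent, product) rows and count products over all mRNA features."""
    examples, products = [], Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        attrs = parse_attributes(cols[8])
        product = attrs.get("product", "<no product>")
        products[product] += 1
        if len(examples) < max_examples:
            examples.append((attrs.get("ID", "?"), attrs.get("Parent", "?"), product))
    return examples, products


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python gff_summary.py 1_1.gff3
    with open(sys.argv[1]) as handle:
        examples, products = summarize_mrnas(handle)
    for row in examples:
        print("\t".join(row))
    total = sum(products.values())
    print(f"{products['hypothetical protein']} of {total} mRNA features have product 'hypothetical protein'")
```

The handful of printed rows shows the naming scheme directly, and the final count makes it easy to compare the predict GFF3 with the annotate output.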
Which files do you need from the following predict output?
1_1.cds-transcripts.fa
1_1.proteins.fa.ndb
1_1.proteins.fa.ntf
1_1.proteins.fa.pot
1_1.validation.txt
1_1.discrepency.report.txt
1_1.proteins.fa.nhr
1_1.proteins.fa.nto
1_1.proteins.fa.psq
1_1.error.summary.txt
1_1.proteins.fa.nin
1_1.proteins.fa.pdb
1_1.proteins.fa.ptf
aspergillus_oryzae.parameters.json
1_1.gbk
1_1.proteins.fa.nog
1_1.proteins.fa.phr
1_1.proteins.fa.pto
1_1.gff3
1_1.proteins.fa.nos
1_1.proteins.fa.pin
1_1.scaffolds.fa
1_1.mrna-transcripts.fa
1_1.proteins.fa.not
1_1.proteins.fa.pog
1_1.stats.json
1_1.proteins.fa
1_1.proteins.fa.nsq
1_1.proteins.fa.pos
1_1.tbl
Hi Jon, I have sent a link to the predict output GFF file to nextgen.usfs@gmail.com.
Dear team, I ran the following command:
funannotate annotate -i ../10mAF_1/predict_results/ -o ../10mAF_fun/ --rename 10mAF_rename_ncbi.txt --annotations 10mAF_annotations.txt --eggnog 10mAF_eggnog.txt
All of the proteins in the annotation.txt file are listed as hypothetical proteins.
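The command passes --annotations 10mAF_annotations.txt, which the help menu describes as a "3 column tsv file". Before re-running, a short Python sketch can flag rows that do not have exactly three tab-separated fields. The helper name and the script name check_tsv.py are hypothetical, and skipping blank and '#' lines is an assumption about what funannotate tolerates. A clean result only rules out one possible input problem; it does not explain the hypothetical proteins by itself.

```python
import csv
import sys


def bad_tsv_rows(lines, expected_columns=3):
    """Return (line number, field count) for rows without exactly expected_columns fields.

    Skipping blank lines and '#' comment lines is an assumption about what
    funannotate accepts; adjust if your file should contain neither.
    """
    bad = []
    # QUOTE_NONE: treat quote characters as ordinary text, one record per line.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for lineno, row in enumerate(reader, start=1):
        if not row or row[0].startswith("#"):
            continue
        if len(row) != expected_columns:
            bad.append((lineno, len(row)))
    return bad


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_tsv.py 10mAF_annotations.txt
    with open(sys.argv[1], newline="") as handle:
        for lineno, count in bad_tsv_rows(handle):
            print(f"line {lineno}: {count} fields (expected 3)")
```

The same function can check the --fix file (3 columns: GeneID Name Product) or the --remove file (pass expected_columns=2 for Gene Product).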