nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

getting all hypothetical proteins #837

Open sunilicrsat opened 1 year ago

sunilicrsat commented 1 year ago

Dear team, I ran the following command:

$ funannotate annotate -i ../10mAF_1/predict_results/ -o ../10mAF_fun/ --rename 10mAF_rename_ncbi.txt --annotations 10mAF_annotations.txt --eggnog 10mAF_eggnog.txt

I am getting all hypothetical proteins in the annotation.txt file.

nextgenusfs commented 1 year ago

Unable to help without more details. There are many, many issues related to "why are most of my genes named hypothetical?", so please look through the issues to understand why that is.

But I think this is likely due to what you passed to --rename here. Per the help menu below, --rename should be a new locus_tag base name, certainly not a file.

$ funannotate annotate

Usage:       funannotate annotate <arguments>
version:     1.8.14

Description: Script functionally annotates the results from funannotate predict.  It pulls
             annotation from PFAM, InterPro, EggNog, UniProtKB, MEROPS, CAZyme, and GO ontology.

Required:
  -i, --input          Folder from funannotate predict
    or
  --genbank            Genome in GenBank format
  -o, --out            Output folder for results
    or
  --gff                Genome GFF3 annotation file
  --fasta              Genome in multi-fasta format
  -s, --species        Species name, use quotes for binomial, e.g. "Aspergillus fumigatus"
  -o, --out            Output folder for results

Optional:
  --sbt                NCBI submission template file. (Recommended)
  -a, --annotations    Custom annotations (3 column tsv file)
  -m, --mito-pass-thru Mitochondrial genome/contigs. append with :mcode
  --eggnog             Eggnog-mapper annotations file (if NOT installed)
  --antismash          antiSMASH secondary metabolism results (GBK file from output)
  --iprscan            InterProScan5 XML file
  --phobius            Phobius pre-computed results (if phobius NOT installed)
  --signalp            SignalP pre-computed results (-org euk -format short)
  --isolate            Isolate name
  --strain             Strain name
  --rename             Rename GFF gene models with locus_tag from NCBI.
  --fix                Gene/Product names fixed (TSV: GeneID    Name    Product)
  --remove             Gene/Product names to remove (TSV: Gene  Product)
  --busco_db           BUSCO models. Default: dikarya
  -t, --tbl2asn        Additional parameters for tbl2asn. Default: "-l paired-ends"
  -d, --database       Path to funannotate database. Default: $FUNANNOTATE_DB
  --force              Force over-write of output folder
  --cpus               Number of CPUs to use. Default: 2
  --tmpdir             Volume/location to write temporary files. Default: /tmp
  --p2g                protein2genome pre-computed results
  --header_length      Maximum length of FASTA headers. Default: 16
  --no-progress        Do not print progress to stdout for long sub jobs
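
To illustrate the point about --rename, here is a minimal Python sketch of a pre-flight check on two of the options from the command in the original post. It is not part of funannotate. The function name `check_annotate_args`, the rule of treating a `.txt`/`.tsv` suffix as "looks like a file", and the skipping of `#` comment lines are all assumptions made for illustration. The only facts taken from the help menu are that --rename takes a locus_tag and that -a/--annotations expects a 3 column tsv file.

```python
import os


def check_annotate_args(rename=None, annotations=None):
    """Return a list of warnings for two `funannotate annotate` options.

    Hypothetical helper: the checks below are illustrative heuristics,
    not logic taken from funannotate itself.
    """
    warnings = []

    # The help menu describes --rename as a locus_tag, so a value that is an
    # existing file (or merely ends in .txt/.tsv) is probably a mistake.
    if rename is not None:
        looks_like_file = os.path.isfile(rename) or rename.lower().endswith((".txt", ".tsv"))
        if looks_like_file:
            warnings.append(
                f"--rename looks like a file ({rename}); expected a locus_tag base name"
            )

    # The help menu describes -a/--annotations as a 3 column tsv file.
    if annotations is not None and os.path.isfile(annotations):
        with open(annotations) as handle:
            for lineno, line in enumerate(handle, start=1):
                line = line.rstrip("\n")
                # Assumption: blank lines and '#' comment lines are ignored.
                if not line or line.startswith("#"):
                    continue
                ncols = len(line.split("\t"))
                if ncols != 3:
                    warnings.append(
                        f"{annotations}: line {lineno} has {ncols} columns, expected 3"
                    )
                    break  # report only the first malformed line
    return warnings


if __name__ == "__main__":
    # Values copied from the command in the original post.
    for message in check_annotate_args(
        rename="10mAF_rename_ncbi.txt",
        annotations="10mAF_annotations.txt",
    ):
        print(message)
```

Run against the values from the original command, this would flag the --rename argument because it ends in `.txt`. A plain prefix such as a locus_tag base name would pass.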

sunilicrsat commented 1 year ago

Thank you, Jon. The gene names have a T-1 suffix. Is that what is causing the hypothetical proteins? Can I share the gene prediction output? The gff3 from gene prediction also contains hypothetical proteins.

nextgenusfs commented 1 year ago

Can you post log files from your predict and annotate runs so I can see what has been run, etc.? It would also help to see what the gene models look like. I don't need the entire GFF/TBL, just some snippets so I can see what the naming scheme is and what other annotations have been incorporated.
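
One way to produce such a snippet is to tally the product names in the GFF3 instead of posting the whole file. The Python sketch below is only an illustration. It assumes the product is stored as a `product=` attribute in column 9 of `mRNA` features, which may not match every GFF3. The function name `summarize_products` and the example IDs in the demo are hypothetical, and percent-encoded attribute values are not decoded.

```python
from collections import Counter


def summarize_products(gff3_lines, feature_type="mRNA"):
    """Count `product=` values on features of one type in GFF3 text lines.

    Assumption: products live in the column 9 attributes of mRNA features.
    """
    counts = Counter()
    for line in gff3_lines:
        # Skip directives, comments, and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        # A GFF3 feature line has exactly nine tab-separated columns.
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        # Column 9 holds semicolon-separated key=value pairs.
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        counts[attrs.get("product", "<no product attribute>")] += 1
    return counts


if __name__ == "__main__":
    # Made-up example records; the IDs and products are placeholders.
    demo = [
        "##gff-version 3\n",
        "scaffold_1\tfunannotate\tgene\t1\t900\t.\t+\t.\tID=GENE_000001;\n",
        "scaffold_1\tfunannotate\tmRNA\t1\t900\t.\t+\t.\t"
        "ID=GENE_000001-T1;Parent=GENE_000001;product=hypothetical protein;\n",
    ]
    for product, n in summarize_products(demo).most_common(10):
        print(f"{n}\t{product}")
```

To run it on a real file, pass an open file handle, for example `summarize_products(open("1_1.gff3"))`. The ten most common products plus a handful of raw feature lines are usually enough to show the naming scheme.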

sunilicrsat commented 1 year ago

Which files do you need from the following predict output?

1_1.cds-transcripts.fa
1_1.proteins.fa.ndb
1_1.proteins.fa.ntf
1_1.proteins.fa.pot
1_1.validation.txt
1_1.discrepency.report.txt
1_1.proteins.fa.nhr
1_1.proteins.fa.nto
1_1.proteins.fa.psq
1_1.error.summary.txt
1_1.proteins.fa.nin
1_1.proteins.fa.pdb
1_1.proteins.fa.ptf
aspergillus_oryzae.parameters.json
1_1.gbk
1_1.proteins.fa.nog
1_1.proteins.fa.phr
1_1.proteins.fa.pto
1_1.gff3
1_1.proteins.fa.nos
1_1.proteins.fa.pin
1_1.scaffolds.fa
1_1.mrna-transcripts.fa
1_1.proteins.fa.not
1_1.proteins.fa.pog
1_1.stats.json
1_1.proteins.fa
1_1.proteins.fa.nsq
1_1.proteins.fa.pos
1_1.tbl

sunilicrsat commented 1 year ago

Hi Jon, I have sent a link to the predict output .gff file to nextgen.usfs@gmail.com.