nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

Question about mitochondria #706

Open Bartusv opened 2 years ago

Bartusv commented 2 years ago

Hi, when using funannotate, are mitochondrial contigs supposed to be included in or excluded from the input assembly?

Cheers,

Bart

hyphaltip commented 2 years ago

@nextgenusfs worked on a new step that lets you process them separately if the names of the contigs are known. I think he can answer better than I can, as I haven't quite incorporated that approach into my pipeline.

nextgenusfs commented 2 years ago

Ideally, the mitochondrial genome should be removed before running funannotate train/predict. I added an option to incorporate it back in at the funannotate annotate step with the -m flag. Because mitochondria use a different genetic code, funannotate will not predict accurate gene models on mitochondrial contigs. Historically, mitochondrial genomes were a separate NCBI submission. I don't know how NCBI currently wants people to submit them (i.e., inline with the nuclear genome or as a separate submission).

$ funannotate annotate

Usage:       funannotate annotate <arguments>
version:     1.8.10

Description: Script functionally annotates the results from funannotate predict.  It pulls
             annotation from PFAM, InterPro, EggNog, UniProtKB, MEROPS, CAZyme, and GO ontology.

Required:
  -i, --input          Folder from funannotate predict
    or
  --genbank            Genome in GenBank format
  -o, --out            Output folder for results
    or
  --gff                Genome GFF3 annotation file
  --fasta              Genome in multi-fasta format
  -s, --species        Species name, use quotes for binomial, e.g. "Aspergillus fumigatus"
  -o, --out            Output folder for results

Optional:
  --sbt                NCBI submission template file. (Recommended)
  -a, --annotations    Custom annotations (3 column tsv file)
  -m, --mito-pass-thru Mitochondrial genome/contigs. append with :mcode
  --eggnog             Eggnog-mapper annotations file (if NOT installed)
  --antismash          antiSMASH secondary metabolism results (GBK file from output)
  --iprscan            InterProScan5 XML file
  --phobius            Phobius pre-computed results (if phobius NOT installed)
  --isolate            Isolate name
  --strain             Strain name
  --rename             Rename GFF gene models with locus_tag from NCBI.
  --fix                Gene/Product names fixed (TSV: GeneID    Name    Product)
  --remove             Gene/Product names to remove (TSV: Gene  Product)
  --busco_db           BUSCO models. Default: dikarya
  -t, --tbl2asn        Additional parameters for tbl2asn. Default: "-l paired-ends"
  -d, --database       Path to funannotate database. Default: $FUNANNOTATE_DB
  --force              Force over-write of output folder
  --cpus               Number of CPUs to use. Default: 2
  --tmpdir             Volume/location to write temporary files. Default: /tmp
  --no-progress        Do not print progress to stdout for long sub jobs
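
For the removal step described above (taking the mitochondrial contigs out before train/predict), something along these lines might work. It is only a rough sketch and not part of funannotate: the function names and the example contig ID are hypothetical, and it assumes you already know which contig names are mitochondrial and that the first whitespace-delimited token of each FASTA header is the contig name.

```python
from typing import Iterable, Iterator, List, Set, Tuple

Record = Tuple[str, str]  # (full header without ">", sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_mito(lines: Iterable[str], mito_ids: Set[str]) -> Tuple[List[Record], List[Record]]:
    """Separate records into (nuclear, mitochondrial) by contig name.

    Assumption: the contig name is the first token of the header line.
    """
    nuclear: List[Record] = []
    mito: List[Record] = []
    for header, seq in read_fasta(lines):
        contig_id = header.split()[0] if header.strip() else ""
        if contig_id in mito_ids:
            mito.append((header, seq))
        else:
            nuclear.append((header, seq))
    return nuclear, mito


def write_fasta(records: Iterable[Record], path: str, width: int = 80) -> None:
    """Write records to a FASTA file, wrapping sequence lines at `width`."""
    with open(path, "w") as handle:
        for header, seq in records:
            handle.write(f">{header}\n")
            for start in range(0, len(seq), width):
                handle.write(seq[start:start + width] + "\n")


def split_assembly(assembly_path: str, mito_ids: Set[str],
                   nuclear_out: str, mito_out: str) -> None:
    """Read one assembly and write nuclear and mitochondrial contigs separately."""
    with open(assembly_path) as handle:
        nuclear, mito = split_mito(handle, mito_ids)
    write_fasta(nuclear, nuclear_out)
    write_fasta(mito, mito_out)
```

Calling `split_assembly("assembly.fasta", {"scaffold_42"}, "nuclear.fasta", "mito.fasta")` (file names and contig ID are placeholders) would leave a nuclear-only FASTA for train/predict and a separate file holding the mitochondrial contigs.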

minhtrung1997 commented 1 year ago

Can I ask what the input file is for the --mito-pass-thru argument?