nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

adding custom annotations: ncRNA from Infernal/Rfam #711

Open xvazquezc opened 2 years ago

xvazquezc commented 2 years ago

Hi there,

I'm annotating a few genomes and I realised that funannotate does not have any integrated tool for annotating ncRNA other than tRNAscan-SE. I will be annotating ncRNA with Infernal, following the standard Rfam-based ncRNA annotation.

I've only seen a brief reference to adding other annotations in #481, which refers to manually editing some .tbl files (not necessarily a nice format). I've seen that funannotate annotate has the --annotations option, but it's not clearly documented what the format should be or whether it would be valid for genes not yet called.

Any details that anyone could provide would be appreciated.

Cheers,

hyphaltip commented 2 years ago

Are you wanting these as genes in the GenBank predicted files? Infernal predictions can be pretty slow, and I am not sure how universally generic a prediction pipeline can be for these, as it will depend on the db. Are you wanting to do this for an animal, plant, or fungus?

hyphaltip commented 2 years ago

But I think you can import more gene features as a GFF file you provide, though I'm not sure how it works for noncoding gene features.

nextgenusfs commented 2 years ago

If you can provide some general GFF files from tools to annotate these I can look at what it would take to have funannotate pass them through.

xvazquezc commented 2 years ago

I'm running Infernal on fungal genomes, mostly following the Rfam recommendations, i.e. using -Z to calculate e-values, only using CMs (--nohmmonly), and using the gathering threshold (--cut_ga, curated thresholds). The only parameter I change is using the stricter --default instead of --rfam, at the cost of running time, e.g.

cmscan -Z ${ZVAL} --cpu $NCPUS \
--cut_ga --default --nohmmonly --fmt 2 \
--tblout rfam.tblout \
--clanin ${RFAMDB}/Rfam.clanin \
${RFAMDB}/Rfam.cm $GENOME > rfam.cmscan
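For the -Z value in the command above, the Rfam documentation suggests using the total number of nucleotides in the genome, multiplied by 2 (both strands are searched) and expressed in megabases. A minimal sketch of that calculation follows; the function names are hypothetical and the FASTA parsing assumes a plain, uncompressed file.

```python
def count_residues(fasta_lines):
    """Count sequence characters in an iterable of FASTA lines."""
    total = 0
    for line in fasta_lines:
        line = line.strip()
        # Skip blank lines and header lines starting with '>'.
        if not line or line.startswith(">"):
            continue
        total += len(line)
    return total


def cmscan_z_value(fasta_lines):
    """Search space size in Mb for cmscan -Z: residues * 2 strands / 1e6."""
    return count_residues(fasta_lines) * 2 / 1_000_000


# Example usage (the path is a placeholder):
# with open("genome.fa") as handle:
#     print(cmscan_z_value(handle))
```

The printed number can then be exported as ZVAL before running cmscan.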

So far, the only clear "errors" occur with rRNA genes, which can be called as archaeal instead of eukaryotic or bacterial (mito), or the other way around. Nonetheless, rRNA-specific tools are recommended and, e.g., barrnap generates gff3 files by default (that could be another nice addition btw).

Regarding Infernal's output, there is no GFF option, but running it as recommended generates a tabular output file (fixed-width table). Here is an example.

hyphaltip commented 2 years ago

@nextgenusfs I can give this a try.

alephreish commented 1 year ago

For conversion to gff, see jiffy-infernal-hmmer-scripts, in particular infernal-tblout2gff.pl
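To illustrate what such a conversion involves, here is a minimal Python sketch. It is not the Perl script mentioned above and not something funannotate is known to accept as-is. The column positions are an assumption based on the `--fmt 2` tblout layout described in the Infernal user guide and should be checked against the header of your own file. The function name, the `ncRNA` feature type, and the attribute keys are choices made for this example.

```python
# Assumed 0-based positions of whitespace-separated fields in a
# cmscan `--fmt 2 --tblout` line (verify against the file's '#' header).
TARGET_NAME, TARGET_ACC, QUERY_NAME = 1, 2, 3   # Rfam family, RF accession, contig
SEQ_FROM, SEQ_TO, STRAND = 9, 10, 11
SCORE, EVALUE, OLP = 16, 17, 19
MIN_FIELDS = 26  # fixed fields before the free-text description


def tblout_to_gff3(lines, source="cmscan"):
    """Yield GFF3 lines for the hits found in cmscan tblout lines."""
    yield "##gff-version 3"
    for line in lines:
        # Skip blank lines and the '#' header/footer comments.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < MIN_FIELDS:
            continue  # not a --fmt 2 hit line
        # '=' marks a hit overlapped by a better-scoring hit; drop it.
        if fields[OLP] == "=":
            continue
        # On the minus strand seq from > seq to; GFF3 needs start <= end.
        start, end = sorted((int(fields[SEQ_FROM]), int(fields[SEQ_TO])))
        attributes = ";".join([
            f"Name={fields[TARGET_NAME]}",
            f"Dbxref=RFAM:{fields[TARGET_ACC]}",
            f"evalue={fields[EVALUE]}",
        ])
        yield "\t".join([
            fields[QUERY_NAME], source, "ncRNA",
            str(start), str(end), fields[SCORE],
            fields[STRAND], ".", attributes,
        ])


# Example usage (file names are placeholders):
# with open("rfam.tblout") as src, open("rfam.gff3", "w") as out:
#     out.write("\n".join(tblout_to_gff3(src)) + "\n")
```

Dropping the `=` hits relies on `--fmt 2` being used together with `--clanin`, as in the cmscan command earlier in the thread, since that is what populates the overlap column.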