nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

Where to find filtered genes #367

Closed: qihualiang closed this issue 3 years ago

qihualiang commented 4 years ago

Hi Jon,

I compared the GFF output under predict_results/ with a previous annotation and found that some genes are missing from the funannotate results. I can find these missing genes in predict_misc/evm.cleaned.gff3. Based on the annotation steps in the tutorial, I assume they were filtered out (length filtering, spanning gaps, and transposable elements), but only some of them appear in predict_misc/bad_models.gff. Where should I look to find all of the genes that were filtered between predict_misc/evm.cleaned.gff3 and predict_results/final.genes.gff?

Thanks,
Qihua

nextgenusfs commented 4 years ago

The first output of EVM is predict_misc/evm.round1.gff3. Models are then dropped from it for various reasons, and the remaining "cleaned" models are written to predict_misc/evm.cleaned.gff3. How models get dropped is controlled by the --repeat_filter option; by default, both blast and overlap are enabled.
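To see which models were dropped at this filtering step, one option is a set difference of gene IDs between evm.round1.gff3 and evm.cleaned.gff3. The sketch below is not part of funannotate; `gene_ids` and `dropped_models` are hypothetical helpers, and it assumes both files are tab-separated GFF3 in which each EVM gene keeps the same `ID=` attribute before and after filtering. It takes lines rather than paths so it can be tested without files.

```python
from typing import Iterable, Set


def gene_ids(lines: Iterable[str]) -> Set[str]:
    """Collect the ID attribute of every 'gene' feature in GFF3 lines."""
    ids: Set[str] = set()
    for line in lines:
        # Skip blank lines and GFF3 comment/directive lines.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        # Ignore non-feature lines (e.g. FASTA) and non-gene features.
        if len(cols) != 9 or cols[2] != "gene":
            continue
        for attr in cols[8].split(";"):
            key, _, value = attr.partition("=")
            if key.strip() == "ID":
                ids.add(value.strip())
    return ids


def dropped_models(round1: Iterable[str], cleaned: Iterable[str]) -> Set[str]:
    """Gene IDs present in the raw EVM output but absent after filtering.

    Assumption: gene IDs are not renamed between the two files.
    """
    return gene_ids(round1) - gene_ids(cleaned)
```

Usage would be to pass open file handles for predict_misc/evm.round1.gff3 and predict_misc/evm.cleaned.gff3 to `dropped_models`, then subtract `gene_ids` of predict_misc/bad_models.gff to list any dropped models not recorded there (assuming that file also uses gene features with the same IDs).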

The results in predict_misc/evm.cleaned.gff3 are then converted into NCBI tbl format and run through tbl2asn to build a GenBank file and several other outputs. The GFF3 file in predict_results is then generated from the NCBI tbl file; during this step the genes are renamed, combined with the tRNAscan results, and properly formatted.
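Because genes are renamed when the final GFF3 is generated, comparing evm.cleaned.gff3 with the file in predict_results by ID will not work. A minimal sketch, assuming gene coordinates are carried over unchanged, is to compare gene loci as (seqid, start, end, strand) instead. `gene_loci` and `missing_from_final` are hypothetical helpers, not funannotate functions. If the conversion adjusts a gene's coordinates at all, that gene will be reported as missing even though it is present, so treat the output as candidates to inspect. tRNA genes added from tRNAscan exist only in the final file, so they do not appear in this difference.

```python
from typing import Iterable, Set, Tuple

# (seqid, start, end, strand) of a gene feature
Locus = Tuple[str, int, int, str]


def gene_loci(lines: Iterable[str]) -> Set[Locus]:
    """Return the locus of every 'gene' feature in GFF3 lines."""
    loci: Set[Locus] = set()
    for line in lines:
        # Skip blank lines and comment/directive lines such as ##FASTA.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        # Sequence lines after ##FASTA have no tabs and are skipped here.
        if len(cols) != 9 or cols[2] != "gene":
            continue
        loci.add((cols[0], int(cols[3]), int(cols[4]), cols[6]))
    return loci


def missing_from_final(cleaned: Iterable[str], final: Iterable[str]) -> Set[Locus]:
    """Loci of cleaned EVM genes with no identical gene locus in the final GFF3."""
    return gene_loci(cleaned) - gene_loci(final)
```

Sorting the returned set (`sorted(...)`) gives a list ordered by contig and start position that can be checked against predict_misc/bad_models.gff or viewed in a genome browser.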

What sort of gene models are missing between the data in predict_results and those in predict_misc/evm.cleaned.gff3?