The first output of EVM is predict_misc/evm.round1.gff3. The "cleaned" models in predict_misc/evm.cleaned.gff3 are what remains after models are dropped for various reasons. How models get dropped is controlled by the --repeat_filter option; by default both blast and overlap are enabled.
The results in predict_misc/evm.cleaned.gff3 are then converted into NCBI tbl format and run through tbl2asn to build a GenBank file and several other outputs. The GFF3 file in predict_results is then generated from the NCBI tbl file (during this process genes get renamed, combined with tRNAscan results, and properly formatted).
What sort of gene models are missing between the data in predict_results and those in predict_misc/evm.cleaned.gff3?
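Because genes get renamed on the way to predict_results, comparing IDs between the two files will not line up. One rough way to list the differences is to compare gene coordinates instead. The sketch below is not part of funannotate; the function names are made up for illustration, and it assumes that a surviving gene keeps exactly the same contig, start, end, and strand in both files, which may not hold if coordinates are adjusted during the tbl conversion.

```python
import sys
from typing import Iterable, List, Set, Tuple

# (contig, start, end, strand): used as the key because IDs are renamed
Locus = Tuple[str, int, int, str]


def gene_loci(gff3_lines: Iterable[str], feature: str = "gene") -> Set[Locus]:
    """Collect the coordinates of every `feature` row in a GFF3 stream."""
    loci: Set[Locus] = set()
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments, and directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != feature:
            continue  # skip non-feature lines and other feature types
        loci.add((cols[0], int(cols[3]), int(cols[4]), cols[6]))
    return loci


def missing_loci(before: Iterable[str], after: Iterable[str]) -> List[Locus]:
    """Loci present in `before` that have no exact match in `after`."""
    return sorted(gene_loci(before) - gene_loci(after))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: script.py <evm.cleaned.gff3> <final GFF3>
    with open(sys.argv[1]) as before, open(sys.argv[2]) as after:
        for contig, start, end, strand in missing_loci(before, after):
            print(contig, start, end, strand, sep="\t")
```

Loci reported this way can then be looked up in predict_misc/evm.cleaned.gff3 to see what kind of models they are. If EVM gene rows use a feature type other than "gene", pass that type as the `feature` argument.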
Hi Jon,
I compared the GFF output under predict_results/ with previous annotation results and found that some genes are missing from the funannotate results. I can find these missing genes in predict_misc/evm.cleaned.gff3. Based on the annotation steps in the tutorial, I think these genes were filtered out (length filtering, spanning gaps, and transposable elements), but I could find only some of them in predict_misc/bad_models.gff. Where should I look to find all the genes filtered out between predict_misc/evm.cleaned.gff3 and predict_results/final.genes.gff?
Thanks,
Qihua