nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

Newbie question on masking #732

Closed Eric-CH-Chen closed 2 years ago

Eric-CH-Chen commented 2 years ago

First of all, thank you for the wonderful tool! I have a newbie question about best practices for genome masking.

I am working on fungal genome annotation and am using EDTA + RepeatModeler/RepeatMasker (no Repbase) for masking. I noticed that running tantan, either in addition or on the previously masked FASTA, masks significantly more bases (from ~2% with EDTA/RM to ~8% with tantan).
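
For comparing percentages like these between masking runs, a minimal sketch of counting soft-masked bases follows. It assumes the maskers write soft-masked output (repeats in lowercase); the function name `masked_fraction` and the demo sequences are made up for illustration and are not part of funannotate.

```python
import io

def masked_fraction(fasta_handle):
    """Return (masked_bases, total_bases) for a FASTA file handle.

    Assumption: the genome is soft-masked, i.e. masked bases are lowercase.
    """
    masked = total = 0
    for line in fasta_handle:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and sequence headers
        total += len(line)
        masked += sum(1 for base in line if base.islower())
    return masked, total

# Toy input standing in for a real masked genome file (hypothetical data).
demo = io.StringIO(">contig1\nACGTacgtAC\n>contig2\nnnnnACGTAC\n")
masked, total = masked_fraction(demo)
print(f"{masked}/{total} bases soft-masked ({100 * masked / total:.1f}%)")
```

On a real genome, the handle would come from `open("genome.masked.fa")` (path hypothetical), run once per masking strategy.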

What is the recommended way to proceed? Should I use the more heavily masked genome FASTA for gene training and prediction?

Thanks

hyphaltip commented 2 years ago

You may want to compare what is masked by tantan if you are worried it is over-masking protein-coding genes. Probably just run through with the more heavily masked genome and see what your gene counts look like. I also check whether BUSCO recovers genes in masked regions as another test.
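
One way to do that comparison is to measure how many bases of each gene fall inside masked regions. The sketch below assumes soft-masking (lowercase bases) and uses hypothetical half-open gene coordinates on a toy sequence; in practice the coordinates would come from the gene predictions or from BUSCO hits, and the helper names here are illustrative only.

```python
import re

def masked_intervals(seq):
    """Return half-open (start, end) runs of lowercase (soft-masked) bases."""
    return [m.span() for m in re.finditer(r"[a-z]+", seq)]

def masked_overlap(gene, intervals):
    """Count bases of a half-open gene interval that lie in masked intervals."""
    g_start, g_end = gene
    return sum(max(0, min(g_end, end) - max(g_start, start))
               for start, end in intervals)

# Toy contig and gene coordinates (hypothetical data for illustration).
seq = "ACGTacgtacGTACGTACgtacGT"
genes = {"geneA": (2, 12), "geneB": (10, 18), "geneC": (16, 24)}

intervals = masked_intervals(seq)
for name, (start, end) in genes.items():
    overlap = masked_overlap((start, end), intervals)
    print(f"{name}: {overlap}/{end - start} bases masked")
```

Genes with a large masked share under tantan but not under EDTA/RM would be the ones worth inspecting by hand.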

Eric-CH-Chen commented 2 years ago

Thanks for the fast reply and recommendations! I will try it out and also compare the final predicted protein counts.