Closed Eric-CH-Chen closed 2 years ago
You may want to compare what is masked by tantan if you are worried it is over-masking protein-coding genes. Probably just run it through with the higher masked content and see what your gene counts look like. I also check whether BUSCO recovers genes in masked regions as another test.
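A rough sketch of that comparison, in case it helps. It assumes both runs produce soft-masked FASTA (lowercase = masked, `N` = hard-masked) with identical sequence names and lengths. The function names are made up for illustration and are not part of any of the tools mentioned.

```python
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {sequence_name: sequence}."""
    records: Dict[str, list] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token as the name.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {k: "".join(v) for k, v in records.items()}


def is_masked(base: str) -> bool:
    # Assumption: soft-masked bases are lowercase, hard-masked bases are N.
    return base.islower() or base == "N"


def masked_fraction(genome: Dict[str, str]) -> float:
    """Fraction of all bases that are masked."""
    total = sum(len(seq) for seq in genome.values())
    masked = sum(is_masked(b) for seq in genome.values() for b in seq)
    return masked / total if total else 0.0


def newly_masked(before: Dict[str, str], after: Dict[str, str]) -> int:
    """Count bases masked in `after` that were not masked in `before`."""
    count = 0
    for name, seq_before in before.items():
        seq_after = after.get(name, "")
        count += sum(
            1
            for a, b in zip(seq_before, seq_after)
            if is_masked(b) and not is_masked(a)
        )
    return count


# Toy data standing in for the EDTA/RM-masked and tantan-masked genomes.
rm_masked = read_fasta([">contig1", "ACGTacgtAC", "GTACGTACGT"])
tantan_masked = read_fasta([">contig1", "ACGTacgtac", "gtACGTACGT"])

print(f"EDTA/RM masked fraction: {masked_fraction(rm_masked):.2%}")
print(f"tantan masked fraction:  {masked_fraction(tantan_masked):.2%}")
print(f"bases masked only after tantan: {newly_masked(rm_masked, tantan_masked)}")
```

For real genomes you would pass an open file handle to `read_fasta` instead of a list of strings. The coordinates of the newly masked bases could then be intersected with predicted gene models to see whether coding sequence is being hidden.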
Thanks for the fast reply and recommendations! Will try it out and I will also compare the final predicted protein counts.
First of all, thank you for the wonderful tool! I just have a newbie question regarding best practices for genome masking.
I am working on fungal annotation and I am using EDTA + RepeatModeler/RepeatMasker (no Repbase) for my masking. I noticed that tantan, run in addition to or on the previously masked FASTA, will mask significantly more bases (~2% with EDTA/RM versus ~8% with tantan). What is the recommended way to proceed? Should I use the more heavily masked genome FASTA and proceed with gene training/prediction?
Thanks