Lots of unknowns - Githubissues

oushujun / EDTA

Extensive de-novo TE Annotator

GNU General Public License v3.0


Hi,

EDTA first uses structural features to identify intact TEs. For example, if a sequence has terminal repeats and satisfies a number of related features, it is classified as an LTR retrotransposon. EDTA then tries to classify TEs into superfamilies, e.g., Gypsy and Copia, based on coding features; elements that cannot be assigned this way are named LTR/unknown.

If you use the --sensitive 1 option, RepeatModeler2 is run to identify repetitive sequences that were not reported by the structural module of EDTA. Because these sequences lack homology and coding features, most of them are named unknown/unknown. We have lower confidence in these categories, so you may want to filter them with additional measures, e.g., copy number or overlap with genes.
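As a rough illustration of the copy-number filter, here is a minimal Python sketch. It assumes the annotation is a GFF3 file whose ninth column carries Name= and Classification= attributes; please check this against your own output before relying on it. The function names, the min_copies threshold, and the example records are all hypothetical and are not part of EDTA.

```python
from collections import Counter


def parse_attributes(field):
    """Turn a GFF3 attribute column ('key=value;key=value') into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def low_copy_unknowns(gff_lines, min_copies=3):
    """Return sorted names of unknown-classified families with < min_copies copies.

    Assumption: each record has a Classification attribute such as
    'LTR/unknown' or 'unknown/unknown', and a Name attribute for the family.
    """
    counts = Counter()
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and GFF3 header/comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete GFF3 record
        attrs = parse_attributes(cols[8])
        classification = attrs.get("Classification", "")
        if classification.lower().endswith("unknown"):
            counts[attrs.get("Name", "NA")] += 1
    return sorted(name for name, n in counts.items() if n < min_copies)


# Made-up records for illustration only.
example = [
    "##gff-version 3",
    "chr1\tEDTA\trepeat_region\t100\t500\t.\t+\t.\tID=te1;Name=famA;Classification=unknown/unknown",
    "chr1\tEDTA\trepeat_region\t900\t1500\t.\t-\t.\tID=te2;Name=famB;Classification=LTR/unknown",
    "chr2\tEDTA\trepeat_region\t50\t700\t.\t+\t.\tID=te3;Name=famB;Classification=LTR/unknown",
    "chr2\tEDTA\trepeat_region\t2000\t7000\t.\t+\t.\tID=te4;Name=famC;Classification=LTR/Gypsy",
]

print(low_copy_unknowns(example, min_copies=2))  # ['famA']
```

The families returned are candidates for removal or closer inspection, not a definitive blacklist. Overlap with genes is not covered here and would need a separate interval comparison against your gene annotation.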

Best, Shujun

Lots of unknowns #159