nf-core / genomeannotator

Pipeline for the identification of (coding) gene structures in draft genomes.
https://nf-co.re/genomeannotator
MIT License

TE annotation and soft masking #14

Open mictadlo opened 1 year ago

mictadlo commented 1 year ago

Description of feature

I noticed that you use Dfam as your repeat library. Could you consider adding DNApipeTE and REPET? Both are tools for de novo annotation and soft masking of transposable elements (TEs) in genome assemblies, similar to the Extensive de novo TE Annotator (EDTA) and RepeatMasker.

DNApipeTE is a pipeline for TE annotation and soft masking that covers repeat identification, classification, and masking. It relies on several other tools, including RepeatModeler, RepeatMasker, and RepeatExplorer. Its output includes a consensus library of repeat sequences, annotations of putative TE locations in the genome, and a soft-masked genome assembly.

REPET is another pipeline for TE annotation and soft masking. It covers repeat identification, classification, clustering, and masking, and relies on several other tools, including RepeatModeler, RepeatMasker, and PILER. Its output likewise includes a consensus library of repeat sequences, annotations of putative TE locations in the genome, and a soft-masked genome assembly.

EDTA is a tool that is specifically designed for de novo annotation of transposable elements (TEs) in genome assemblies. The output of EDTA includes a consensus library of repeat sequences, as well as annotations of putative TE locations in the genome.

Both DNApipeTE and REPET provide functionality similar to EDTA and RepeatMasker, and their output can be used for downstream analyses. However, the pipelines differ in their algorithms and parameters, so the resulting repeat libraries, annotations, and soft masking may differ too. Which tool to use depends on the needs of the analysis and the characteristics of the genome assembly.
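To make the term concrete: soft masking means writing the repeat regions of the assembly in lowercase rather than replacing them with N, so downstream tools can still read the bases. Below is a minimal sketch of that step in Python, assuming the TE annotations are already available as 0-based, half-open `(start, end)` intervals. The function name `soft_mask` and the example data are hypothetical and are not taken from any of the tools mentioned above.

```python
def soft_mask(sequence, intervals):
    """Return `sequence` with the bases inside each interval lowercased.

    `intervals` holds 0-based, half-open (start, end) pairs, an assumed
    format for this sketch. Bases outside the intervals keep their case,
    so any soft masking already present in the input is preserved.
    """
    masked = list(sequence)
    for start, end in intervals:
        # Clamp to the sequence bounds so out-of-range hits are ignored.
        for i in range(max(start, 0), min(end, len(masked))):
            masked[i] = masked[i].lower()
    return "".join(masked)


# Hypothetical example: two putative TE hits on a short sequence.
genome = "ACGTACGTACGTACGT"
te_hits = [(2, 6), (10, 13)]
print(soft_mask(genome, te_hits))  # prints ACgtacGTACgtaCGT
```

Since each tool reports different TE locations, the same assembly would end up with different lowercase regions depending on which annotation is used.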

Thank you for considering this.

Best wishes,

Michal