oushujun / EDTA

Extensive de-novo TE Annotator
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1905-y
GNU General Public License v3.0

Rerun final annotation step on modified genome using previously generated TE family file #341

Closed ZexuanZhao closed 1 year ago

ZexuanZhao commented 1 year ago

Hi oushujun!

I have finished running EDTA on the first version of my genome using the following command. The genome was scaffolded by aligning it to other species: contigs are joined into scaffolds, with 100 Ns inserted between adjacent contigs.

EDTA.pl \
  --genome genome_to_annotate_TE.fasta \
  --species others \
  --step all \
  --overwrite 1 \
  --force 1 \
  --sensitive 1 \
  --anno 1 \
  --evaluate 1 \
  --threads $threads

However, I have since modified the genome: one scaffolded contig was moved to the unscaffolded contigs, and one contig was re-oriented. These changes do not alter the contig composition or the contig sequences of the genome.

I do not want to rerun the whole EDTA pipeline, as RepeatModeler takes two weeks. Can I use the final TE family file generated from the first version of the genome to annotate the new version? If so, could you provide some example code for doing it?

Best, Zexuan

oushujun commented 1 year ago

Hello Zexuan,

Yes, you can use the TE library from the original genome to RepeatMask the new version, but you will not get structural TE annotation by doing so.
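
As a rough sketch of that first option (not a tested recipe), the call could look like the script below. The file names are assumptions: EDTA typically writes its final library as `<genome>.mod.EDTA.TElib.fa`, and `genome_v2.fasta` and `repeatmasker_v2` are hypothetical names for the modified assembly and the output directory. Only basic RepeatMasker options are used (`-pa`, `-lib`, `-gff`, `-dir`), so the result will not be identical to what EDTA's own annotation step produces.

```shell
#!/bin/sh
set -eu

# Assumed file names; adjust them to your own run.
lib=${lib:-genome_to_annotate_TE.fasta.mod.EDTA.TElib.fa}  # TE library from the v1 EDTA run (assumed name)
genome_v2=${genome_v2:-genome_v2.fasta}                    # hypothetical name of the modified assembly
threads=${threads:-4}
outdir=${outdir:-repeatmasker_v2}                          # hypothetical output directory

# Homology-based masking of v2 with the v1 library; -gff also writes a GFF file.
run_mask() {
  RepeatMasker -pa "$threads" -lib "$lib" -gff -dir "$outdir" "$genome_v2"
}

# Run only when RepeatMasker and both input files are actually present.
if command -v RepeatMasker >/dev/null 2>&1 && [ -f "$lib" ] && [ -f "$genome_v2" ]; then
  mkdir -p "$outdir"
  run_mask
else
  echo "RepeatMasker or input files not found; nothing was run." >&2
fi
```

This gives homology-based annotation only; the structural (intact element) annotation from the v1 run is not regenerated.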

Alternatively, since the contig sequences of your genome are unchanged, you can lift over the coordinate tracks from v1 to v2. I have not tried this myself, but there are tools and examples online that you can search for.
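
To illustrate the coordinate arithmetic behind such a liftover, here is a minimal sketch, not an established tool. It assumes a hypothetical 7-column, tab-separated block map (v1 sequence, v1 start, v1 end, v2 sequence, v2 start, v2 end, orientation), with one line per contig, 1-based inclusive coordinates, and `-` marking a contig that was reverse-complemented. The function name `liftover_gff` and the map format are made up for this example; in practice the map would have to be derived from the scaffolding records (for example AGP files) of both versions.

```shell
#!/bin/sh
set -eu

# liftover_gff MAP GFF: remap GFF3 features from v1 to v2 coordinates.
# MAP is the hypothetical 7-column block map described above.
liftover_gff() {
  awk -F '\t' -v OFS='\t' '
    NR == FNR {                       # first file: contig block map
      n++
      seq1[n] = $1; s1[n] = $2 + 0; e1[n] = $3 + 0
      seq2[n] = $4; s2[n] = $5 + 0; ori[n] = $7
      next
    }
    /^#/ { print; next }              # keep GFF3 header and comment lines
    {
      fs = $4 + 0; fe = $5 + 0
      for (i = 1; i <= n; i++) {
        # Only features lying entirely inside one contig block are lifted.
        if ($1 == seq1[i] && fs >= s1[i] && fe <= e1[i]) {
          if (ori[i] == "+") {        # same orientation: shift by the offset
            ns = s2[i] + (fs - s1[i]); ne = s2[i] + (fe - s1[i])
          } else {                    # reversed contig: mirror and flip strand
            ns = s2[i] + (e1[i] - fe); ne = s2[i] + (e1[i] - fs)
            if ($7 == "+") $7 = "-"; else if ($7 == "-") $7 = "+"
          }
          $1 = seq2[i]; $4 = ns; $5 = ne
          print
          next
        }
      }
      print "unmapped: " $0 > "/dev/stderr"
    }
  ' "$1" "$2"
}
```

A call would then look like `liftover_gff blocks.tsv v1.TEanno.gff3 > v2.TEanno.gff3`, where all three file names are placeholders. For example, with a contig occupying scaffold_1:1101-2100 in both versions but reversed in v2, a `+` strand feature at 1201-1300 maps to 1901-2000 on the `-` strand. Features that span a contig boundary or a gap are reported on stderr and skipped.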

Best, Shujun