oushujun / EDTA

Extensive de-novo TE Annotator
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1905-y
GNU General Public License v3.0

Can I concatenate EDTA library and Repbase library manually? #472

Open Marh32 opened 1 week ago

Marh32 commented 1 week ago

Hi Professor Ou:

I would like to ask whether I can concatenate the EDTA library with the Repbase library and then run RepeatMasker manually. I was glad to find that EDTA significantly reduces the percentage of unclassified repeats in the results. However, compared to the results from a RepeatModeler custom library + Repbase library, the percentage of the genome masked as repeats is lower in EDTA's results. Here are my results. When I ran EDTA with the following command: perl ../EDTA.pl --genome genome.fa --overwrite 1 --sensitive 1 --anno 1 --threads 30, I got this result (~31%):

Screenshot 2024-06-19 at 22 33 23

But when I concatenated the Repbase library and the RepeatModeler custom library and then ran RepeatMasker, using: cat my_genome-families.fa RepeatMaskerLib.fasta > combine.fasta, followed by: RepeatMasker -pa 28 -s -lib combine.fasta -dir RMasker -e rmblast my_genomic.fna, I got this result (~40%):

Screenshot 2024-06-19 at 22 58 10

And when I ran RepeatMasker with only the RepeatModeler custom library (Repbase library not included), using: RepeatMasker -pa 28 -s -lib my_genome-families.fa -dir RMasker -e rmblast my_genomic.fna, I got this result (~35%):

Screenshot 2024-06-20 at 12 46 34

Is it possible that the lack of the Repbase library is causing the lower masked percentage in the EDTA output? Can I concatenate the EDTA library and the Repbase library to improve it? Or are there other reasons for the difference? Thanks in advance for your help.
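
For concreteness, here is a minimal sketch of the manual concatenation being asked about. It is not a recommended or confirmed workflow. The helper names combine_libs and dup_ids are hypothetical, and the EDTA library filename in the commented usage (genome.fa.mod.EDTA.TElib.fa) is an assumption about what EDTA writes for genome.fa. The RepeatMasker flags are the same ones used in the commands above. The duplicate-ID check is there because two merged libraries could share sequence names.

```shell
#!/usr/bin/env bash
# Sketch only: merge FASTA repeat libraries before a manual RepeatMasker run.
set -eu

# combine_libs OUT LIB...  (hypothetical helper)
# Concatenate FASTA libraries into OUT, making sure each input ends with a
# newline so the next file's first header starts on its own line.
combine_libs() {
    out="$1"; shift
    : > "$out"
    for lib in "$@"; do
        [ -s "$lib" ] || { echo "missing or empty library: $lib" >&2; return 1; }
        cat "$lib" >> "$out"
        # $(...) strips a trailing newline, so a non-empty result means the
        # file did not end with one; add it.
        [ -z "$(tail -c 1 "$lib")" ] || echo >> "$out"
    done
}

# dup_ids FASTA  (hypothetical helper)
# Print sequence IDs (first word of each header) that occur more than once.
dup_ids() {
    grep '^>' "$1" | awk '{print $1}' | sort | uniq -d
}

# Intended usage (file names are assumptions; adjust to your own run):
#   combine_libs combine.fasta genome.fa.mod.EDTA.TElib.fa RepeatMaskerLib.fasta
#   dup_ids combine.fasta      # inspect any repeated IDs before masking
#   RepeatMasker -pa 28 -s -lib combine.fasta -dir RMasker -e rmblast my_genomic.fna
```

Whether merging the two libraries is appropriate for EDTA output is exactly the open question here; the sketch only shows the mechanics.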

Best regards, Hao