oushujun / EDTA

Extensive de-novo TE Annotator
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1905-y
GNU General Public License v3.0

Still confused about the use and results of EDTA #362

Closed: zhangwenda0518 closed this issue 9 months ago

zhangwenda0518 commented 1 year ago

Dear teacher, good evening. I'm still confused about how to use EDTA and how to interpret its results. I have the following questions; could you help me?

  1. Should lowercase bases in the genome file be converted to uppercase? EDTA gave me different results before and after the case change: lowercase bases (Helitron 8.28%) versus uppercase bases (Helitron 13.50%).
  2. I ran EDTA on the rice genome twice (lowercase bases) and got a prediction different from the paper (Helitron 3.57%), although I used the same command: nohup EDTA.pl --genome ../genome.fasta -u 1.3e-8 --sensitive 1 --anno 1 --evaluate 1 --threads 80 --force 0 --overwrite 0 &
  3. I see that EDTA calculates LTR insertion time. Can the same method be used to calculate the insertion time of Helitrons?

Thanks for your reply. Thank you very much.

oushujun commented 1 year ago

Hello,

  1. I haven't noticed the effect of letter case in helitron annotation. Can you compare HelitronScanner results (in raw/Helitron/) on your two experiments and see if they are identical?

  2. The current version of EDTA has been updated a lot since the publication (361 issues were opened before yours, #362), so the results may differ from the publication.

  3. No. Please read the LTR_retriever paper for details of insertion time calculation.

Best, Shujun

oushujun commented 1 year ago

@zhangwenda0518 any more questions?

zhangwenda0518 commented 1 year ago

Thank you for still following up on my problem. I am studying changes in genome size, so I downloaded many public datasets. By chance, I noticed that some of them contain lowercase bases; these may be genomes that were uploaded with masked repeats. In short, it is worth checking in advance, and converting the case is simple, for example: seqkit seq -u
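For anyone who wants to run the same check without seqkit, here is a minimal Python sketch of the idea. It is only an illustration, not part of EDTA: the function names `lowercase_fraction` and `uppercase_fasta` are hypothetical, and it assumes a plain-text FASTA in which header lines start with `>`.

```python
import io


def lowercase_fraction(handle):
    """Return the fraction of sequence characters that are lowercase."""
    lower = total = 0
    for line in handle:
        if line.startswith(">"):
            continue  # skip header lines; only sequence lines are counted
        seq = line.strip()
        total += len(seq)
        lower += sum(1 for c in seq if c.islower())
    return lower / total if total else 0.0


def uppercase_fasta(src, dst):
    """Copy FASTA lines from src to dst, upper-casing sequence lines only."""
    for line in src:
        # Headers are written unchanged so sequence IDs keep their case.
        dst.write(line if line.startswith(">") else line.upper())


if __name__ == "__main__":
    # Tiny in-memory demo; with real data, pass open file handles instead.
    demo = ">chr1 demo\nACGTacgtNN\nacgt\n"
    print("lowercase fraction:", lowercase_fraction(io.StringIO(demo)))
    out = io.StringIO()
    uppercase_fasta(io.StringIO(demo), out)
    print(out.getvalue(), end="")
```

A lowercase fraction well above zero suggests the assembly was uploaded with its repeats masked in lowercase, so it is worth deciding on one case before comparing runs.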

Following your suggestion, I compared the two sets of results, shown below. Not only the Helitron annotation changed, but also the TIR annotation.

uppercase

the fasta image

the result summary image

the HelitronScanner image

lowercase

the fasta image

the result summary image

the HelitronScanner image

oushujun commented 1 year ago

This is weird, and it seems to be linked to HelitronScanner, as the tail and head files generated are not the same size for lowercase and uppercase genome input. Can you run a further test on HelitronScanner by converting the whole genome to lowercase and running EDTA.raw.pl --step helitron?

oushujun commented 1 year ago

You may also compare raw TIR-Learner results by checking file sizes in this folder $genome.fa.mod.EDTA.raw/TIR/TIR-Learner-Result/.
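The file-size comparison suggested above can be scripted. The following is a rough Python sketch, not an EDTA tool: `file_sizes` and `diff_sizes` are hypothetical helper names, and it assumes the two runs were made in separate working directories with the same genome file name, so that matching files share the same relative path.

```python
import sys
from pathlib import Path


def file_sizes(root):
    """Map each file under root (by relative path) to its size in bytes."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): p.stat().st_size
        for p in root.rglob("*")
        if p.is_file()
    }


def diff_sizes(dir_a, dir_b):
    """Return {relative_path: (size_a, size_b)} for files whose sizes differ.

    A size of None means the file is missing from that directory.
    """
    a, b = file_sizes(dir_a), file_sizes(dir_b)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (paths are placeholders): python compare_sizes.py RUN_A_DIR RUN_B_DIR
    for name, (size_a, size_b) in diff_sizes(sys.argv[1], sys.argv[2]).items():
        print(f"{name}\t{size_a}\t{size_b}")
```

Equal sizes do not prove that two files are identical, so this is only a first pass; any file flagged here can then be compared line by line.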