oushujun / EDTA

Extensive de-novo TE Annotator
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1905-y
GNU General Public License v3.0

right and left LTR length query #402

Closed katieemelianova closed 6 months ago

katieemelianova commented 7 months ago

Hi there,

Thanks a lot for the software, it's been really useful :)

I'm annotating plant genome repeats using EDTA, and I found that in a lot of cases the annotated right and left LTRs were quite a bit outside the length range I would expect. I may be mistaken somewhere along the way, so apologies if so. Here is what I did:

# grep out lines from the GFF containing the right LTR, then subtract the start from the end position
# (+1 because GFF3 coordinates are 1-based and inclusive) to get the length
grep "rLTR" impolita.fasta.mod.EDTA.anno/impolita.fasta.mod.EDTA.intact.gff3 | awk '{print $5 - $4 + 1}' > rLTR_lengths
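A histogram is easier to interpret alongside a few summary numbers. The following is a minimal sketch, not part of EDTA: `rltr_lengths` and `summarise_lengths` are hypothetical helper names, and it assumes that right LTRs carry the string `rLTR` somewhere on their GFF3 line, as the grep above does. Lengths are computed as end - start + 1 because GFF3 coordinates are 1-based and inclusive.

```shell
# Hypothetical helper: print one length per right-LTR feature in a GFF3 file.
# Assumption: right LTR lines contain "rLTR" (same filter as the grep above).
rltr_lengths() {
    # $1: path to a GFF3 file; columns 4 and 5 are start and end.
    awk -F'\t' '!/^#/ && /rLTR/ { print $5 - $4 + 1 }' "$1"
}

# Hypothetical helper: read one length per line on stdin and print
# count, minimum, median, mean and maximum on a single line.
summarise_lengths() {
    sort -n | awk '
        { v[NR] = $1; sum += $1 }
        END {
            if (NR == 0) { print "n=0"; exit }
            median = (NR % 2) ? v[(NR + 1) / 2] : (v[NR / 2] + v[NR / 2 + 1]) / 2
            printf "n=%d min=%d median=%s mean=%.1f max=%d\n", NR, v[1], median, sum / NR, v[NR]
        }'
}
```

Usage would look like `rltr_lengths impolita.fasta.mod.EDTA.anno/impolita.fasta.mod.EDTA.intact.gff3 | summarise_lengths`. A median far below the mean would suggest that a small number of very long LTRs is stretching the tail of the histogram.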

I then made a quick histogram of these lengths (shown at the bottom of this post).

I have had a look through LTR families, and in general most LTRs seem to fall within the range of 200-400 bp each. Do you know whether it is normal for annotated LTRs to be longer than 2000 bp?

I will carry on digging around but if you have any inputs or thoughts I'd be really grateful!

Best,

Katie

[Image: Screenshot 2023-11-14 at 18 48 30]

oushujun commented 6 months ago

Hi Katie,

Sorry, I thought I had replied to this thread, but apparently I hadn't! The length distribution looks good to me. The average length of intact LTR elements in maize is 9782 bp, and the average length of the LTR region in maize is 1557 bp. If you check the LTR_retriever paper, Supp Fig 1 shows that the distribution of LTR region lengths across 50 plant genomes is also quite wide:

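[Image: distribution of LTR region lengths across 50 plant genomes, from Supp Fig 1 of the LTR_retriever paper]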

Usually, elements of the Ty3/Gypsy superfamily are longer than those of the Ty1/Copia superfamily.
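One way to check this in your own annotation is to split the lengths by superfamily. This is a rough sketch under two assumptions that should be verified against the actual file: that right LTR lines contain `rLTR`, and that the attributes column (column 9) holds a `Classification=` field such as `LTR/Gypsy` or `LTR/Copia`. The function name `mean_length_by_class` is hypothetical. Note that it measures the LTR regions only, not whole elements.

```shell
# Hypothetical helper: mean right-LTR length per classification.
# Assumptions: rLTR lines contain "rLTR"; column 9 has "Classification=<class>".
mean_length_by_class() {
    # $1: path to a GFF3 file
    # Output: <classification> TAB <count> TAB <mean length>, sorted by class.
    awk -F'\t' '
        !/^#/ && /rLTR/ {
            cls = "unknown"                      # used when no Classification= is found
            n = split($9, attrs, ";")
            for (i = 1; i <= n; i++)
                if (attrs[i] ~ /^Classification=/)
                    cls = substr(attrs[i], 16)   # text after "Classification="
            count[cls]++
            total[cls] += $5 - $4 + 1            # GFF3 is 1-based, inclusive
        }
        END {
            for (c in count)
                printf "%s\t%d\t%.1f\n", c, count[c], total[c] / count[c]
        }
    ' "$1" | sort
}
```

Running `mean_length_by_class` on the `.intact.gff3` file prints one row per class. Features lacking the assumed attribute are grouped under `unknown`, which makes it easy to spot whether the attribute name needs adjusting.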

Let me know if you have further questions.

Best, Shujun

katieemelianova commented 6 months ago

Hi Shujun,

Many thanks for the reply. After a bit more digging I did realise that there is quite a bit more variation in LTR length than I had assumed, and the links you provided are super useful for confirming this. Thanks so much again for the excellent tool, and for your help with this query :)

Best,

Katie