Closed ValentinaBoP closed 4 years ago
Hi Valentina,
Thank you for using LTR_retriever. It looks like you are using a version earlier than 2.8.7, so the gff3 header is:
Chromosome Annotator Repeat_class/superfamily Start End Diversity(%) Strand SW_score Repeat_famliy
So column 9, Repeat_famliy, is the repeat family name. Here I used the location where the repeat was originally found as the family name; sorry for the confusion.
Column 8, SW_score, is the RepeatMasker Smith–Waterman score. The higher the score, the more confident the alignment between the library sequence and the annotated region.
Column 6, Diversity(%), is the divergence between the library sequence and the annotated region.
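As a rough illustration of the pre-2.8.7 layout described above, here is a minimal Python sketch that splits one annotation line into named fields. The function names (`parse_old_line`, `parse_family_origin`) are hypothetical and not part of LTR_retriever. The sketch assumes tab-separated fields, and the family-name pattern is only inferred from the example in this thread (`CM000093.5:20437004..20442923_INT`), so other name styles may not match.

```python
import re

# Column order taken from the pre-2.8.7 header quoted in this thread.
OLD_COLUMNS = [
    "chromosome", "annotator", "repeat_class", "start", "end",
    "diversity_pct", "strand", "sw_score", "repeat_family",
]

# Assumed shape of a family name: seqid:start..end_PART (e.g. _INT).
ORIGIN_RE = re.compile(
    r"^(?P<seqid>.+):(?P<start>\d+)\.\.(?P<end>\d+)_(?P<part>\w+)$"
)


def parse_old_line(line):
    """Split one tab-separated pre-2.8.7 annotation line into a dict."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(OLD_COLUMNS):
        raise ValueError(f"expected 9 columns, got {len(fields)}")
    rec = dict(zip(OLD_COLUMNS, fields))
    rec["start"] = int(rec["start"])
    rec["end"] = int(rec["end"])
    rec["diversity_pct"] = float(rec["diversity_pct"])  # column 6
    rec["sw_score"] = float(rec["sw_score"])            # column 8
    return rec


def parse_family_origin(name):
    """Return where the library repeat was originally found, or None
    if the name does not follow the assumed seqid:start..end_PART shape."""
    m = ORIGIN_RE.match(name)
    if m is None:
        return None
    return {
        "seqid": m["seqid"],
        "start": int(m["start"]),
        "end": int(m["end"]),
        "part": m["part"],
    }
```

For the family name from the question, `parse_family_origin("CM000093.5:20437004..20442923_INT")` would give the source sequence `CM000093.5` and the coordinates where the library entry was originally identified, which is separate from the annotated coordinates in columns 4 and 5.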
For v2.8.7 and later versions, I reordered the column information:
seqid source repeat_class/superfamily start end sw_score strand phase attributes
The 6th column is now sw_score, and the divergence info has moved to the 9th column, leaving the 8th column blank to comply with the standard GFF3 format.
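For the v2.8.7+ column order above, a similar minimal Python sketch could look like the following. The helper names are hypothetical. The thread does not say which attribute key carries the divergence value, so the sketch simply returns all `key=value` pairs from column 9 as a dict rather than guessing a key name. It also assumes tab-separated fields and `;`-separated attributes, as in standard GFF3.

```python
# Column order taken from the v2.8.7+ header quoted in this thread.
NEW_COLUMNS = [
    "seqid", "source", "repeat_class", "start", "end",
    "sw_score", "strand", "phase", "attributes",
]


def parse_attributes(text):
    """Turn a GFF3 attribute string ('k1=v1;k2=v2') into a dict."""
    attrs = {}
    for pair in text.strip().split(";"):
        if not pair:
            continue  # tolerate empty input or a trailing ';'
        key, sep, value = pair.partition("=")
        if sep:
            attrs[key.strip()] = value.strip()
    return attrs


def parse_new_line(line):
    """Split one tab-separated v2.8.7+ annotation line into a dict."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(NEW_COLUMNS):
        raise ValueError(f"expected 9 columns, got {len(fields)}")
    rec = dict(zip(NEW_COLUMNS, fields))
    rec["start"] = int(rec["start"])
    rec["end"] = int(rec["end"])
    # Column 6 now holds the Smith-Waterman score; '.' means missing.
    rec["sw_score"] = None if rec["sw_score"] == "." else float(rec["sw_score"])
    rec["attributes"] = parse_attributes(rec["attributes"])
    return rec
```

Header lines starting with `#` should be skipped before calling `parse_new_line`; the divergence value can then be looked up in `rec["attributes"]` once the key name has been checked in the actual file.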
Best, Shujun
Thanks for the clarification!!
Dear Shujun,
first, thanks for developing this powerful tool :)
I am a little confused about the interpretation of the last column of the .mod.out.gff file, which should contain the whole-genome LTR-RT annotation by the non-redundant library. For example, I was curious to look specifically at the LTRs present on the sequence CM000121.5.
Columns 4 and 5 contain the genomic coordinates of the LTR elements, but what does the last column mean? Does it mean that the same sequence annotated on CM000121.5 is also found at (first row) CM000093.5:20437004..20442923_INT?
Can you please briefly explain this output? Also, what are columns 6 and 8?
Thank you for your help and time!
Valentina