xunchen85 / ERVcaller

ERVcaller is a tool designed to accurately detect and genotype non-reference unfixed endogenous retroviruses (ERVs) and other transposable elements (TEs) in the human genome using next-generation sequencing (NGS) data. We evaluated the tool using both simulated and real benchmark whole-genome sequencing (WGS) datasets. ERVcaller can accurately detect TE insertions of any length, particularly ERVs. It allows the use of a TE reference library regardless of sequence complexity, such as the entire RepBase database. It is easy to install and to run from the command line.
http://www.uvm.edu/genomics/software/ERVcaller.html

Length of TSD #29

Closed by 1577377232 7 months ago

1577377232 commented 8 months ago

Hi Xun,

Why is the TSD so long? Are these sites false positives, or what can I do about them?

Best, Dan

xunchen85 commented 7 months ago

Hi Dan,

It depends on the situation. First of all, the TSD and its length are estimated from the two detected breakpoints (upstream and downstream). However, the TSD of some LINE elements can also be very long.
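As a rough illustration of how a TSD length can follow from two breakpoints, here is a minimal Python sketch. It is not ERVcaller's actual code: the function names, the 1-based inclusive coordinate convention, and the `max_expected` cutoff are all assumptions made for the example.

```python
def estimate_tsd_length(upstream_end: int, downstream_start: int) -> int:
    """Estimate TSD length from two reference breakpoints (1-based, inclusive).

    upstream_end:     last reference base supported by reads at the upstream junction
    downstream_start: first reference base supported by reads at the downstream junction

    When the two junctions overlap on the reference, the overlap is taken as
    the duplicated target site. No overlap is reported as a TSD length of 0.
    """
    overlap = upstream_end - downstream_start + 1
    return max(0, overlap)


def is_unusually_long(tsd_length: int, max_expected: int = 30) -> bool:
    # max_expected is an arbitrary illustrative cutoff, not an ERVcaller default.
    return tsd_length > max_expected


if __name__ == "__main__":
    length = estimate_tsd_length(upstream_end=1005, downstream_start=1000)
    print(length, is_unusually_long(length))
```

Under this convention, an imprecisely placed breakpoint directly inflates the estimate, which is one reason a long TSD is worth checking by eye at a few loci.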

I would suggest you check some loci first.

Best, Xun

1577377232 commented 7 months ago

I used IGV to check the breakpoints, then extracted the reads mapped within 150 bp of the TSD, assembled them, and compared them. Is there a way to assemble a more complete ERV from discordant reads or split reads? I ask because the INFOR field in the INFO column of the VCF file looks very complete. For example, INFOR=ERV3, 1,7525,7525.

Best, Dan
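
For anyone who wants to pull the INFOR values out of the INFO column programmatically, a minimal Python sketch follows. It assumes the usual VCF layout of semicolon-separated `KEY=VALUE` entries and a comma-separated INFOR value, as in the example above. The function name `parse_infor` is hypothetical, and the meaning of each value is deliberately not interpreted here; check the ERVcaller documentation for that.

```python
from typing import List


def parse_infor(info_column: str) -> List[str]:
    """Return the comma-separated values of the INFOR entry in a VCF INFO string.

    Returns an empty list if no INFOR entry is present.
    """
    for entry in info_column.split(";"):
        key, sep, value = entry.partition("=")
        if sep and key.strip() == "INFOR":
            # Strip stray spaces around each value, as seen in the example above.
            return [v.strip() for v in value.split(",")]
    return []


if __name__ == "__main__":
    print(parse_infor("INFOR=ERV3, 1,7525,7525"))
```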