Closed by pbasting 3 years ago
Hi @pbasting,
Sounds good. I haven't spent much time with this code in recent years, but I just took another look through parts of it and I am nearly certain that I am giving you the correct responses.
Are the coordinates 0-based or 1-based?
0-based
Do the breakpoint positions indicate the final position in the reference genome before you see evidence of an insertion, or the position where you see the transition from reference to insertion?
It should be the former: the last mapped position adjacent to the start of soft-clipping.
Am I correct in interpreting a prediction with a 5' breakpoint larger than the 3' breakpoint as a non-reference insertion with a TSD?
Yes, it is consistent with the presence of a TSD (of length 4 in your example). The usual caveat applies that it could also be caused by mapping error.
I hope this helps. Feel free to ask any other questions and I'll do my best to answer.
Jeff
Thanks Jeff, that's very helpful! I'll let you know if anything else comes up.
Hi @jradrion,
I'm currently working on revamping and adding new TE detection methods to the McClintock pipeline and am interested in integrating TEFLoN. I just have a few questions about interpreting the output. Specifically, I want to make sure I am interpreting the breakpoint coordinates correctly.
Does a breakpoint at 10 mean you see split-read mapping like
5 6 7 8 9 10 TE TE TE TE
or
5 6 7 8 9 TE TE TE TE?
My assumption is the former, but I want to make sure.
Would a 5' breakpoint at 92 and a 3' breakpoint at 89 indicate there is a non-reference insertion with a TSD of length 4? My interpretation:
5' Breakpoint Evidence: 84 85 86 87 88 89 90 91 92 -- -- -- -- -- --
3' Breakpoint Evidence: -- -- -- -- -- 89 90 91 92 93 94 95 96 97 98
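For anyone reading along, the arithmetic behind that example can be sketched in a few lines of Python. This is only an illustration of the interpretation discussed in this thread, not code from TEFLoN or McClintock. The function name `tsd_interval` is hypothetical, and it assumes both breakpoints are 0-based and each marks the last mapped reference position next to the soft-clipped bases.

```python
def tsd_interval(bp5, bp3):
    """Return (start, end, length) of the implied TSD, or None.

    Hypothetical helper, not part of TEFLoN. Assumes 0-based breakpoints:
    bp5 is the last reference position covered by 5' breakpoint evidence,
    bp3 is the first reference position covered by 3' breakpoint evidence.
    """
    if bp5 < bp3:
        # The two sets of evidence do not overlap, so no TSD is implied.
        return None
    # The overlap bp3..bp5 (inclusive) is the duplicated target site.
    return bp3, bp5, bp5 - bp3 + 1


print(tsd_interval(92, 89))  # (89, 92, 4)
```

With a 5' breakpoint at 92 and a 3' breakpoint at 89, the overlapping positions are 89 through 92, which gives the TSD of length 4 from the example. As noted in the reply, an overlap like this could also come from mapping error.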