gymreklab / GangSTR

A tool for profiling long STRs from short reads
GNU General Public License v2.0

Reference .bed file difference #96

Closed (apredeus closed this issue 3 years ago)

apredeus commented 4 years ago

Hello,

I'm trying several tools mentioned in trtools, and am curious about the following observation:

I think they are supposed to be generated using the same algorithm (trf). Do you know why there is such a difference?

Thank you in advance!

nmmsv commented 3 years ago

Hello, the HipSTR reference contains more imperfect repeats, as HipSTR is capable of genotyping those as well. The GangSTR reference was further refined to include only perfect repeats (no interruption between copies of the motif, and no mutation inside the repeat). Both reference sets also apply other filters to weed out complex, error-prone regions, and those filters are not necessarily the same. Another point is that the GangSTR reference includes longer motifs (up to 20bp), whereas the HipSTR reference only includes motifs up to 6bp.

They both originate from trf outputs, but the filtering steps are very different. Please let me know if you have any other questions. Best, Nima
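To make the "perfect repeat" and motif-length criteria above concrete, here is a minimal Python sketch. It is not the actual filtering code used to build either reference. The function names `is_perfect_repeat` and `keep_locus`, the `max_motif_len` parameter, and the rule that a locus must consist of whole motif copies are all assumptions made for illustration.

```python
def is_perfect_repeat(sequence: str, motif: str) -> bool:
    """Return True if `sequence` is made only of whole, exact copies of `motif`.

    Assumption for this sketch: partial trailing copies count as imperfect.
    """
    sequence, motif = sequence.upper(), motif.upper()
    if not sequence or not motif or len(sequence) % len(motif) != 0:
        return False
    copies = len(sequence) // len(motif)
    # Any interruption or substitution makes the comparison fail.
    return sequence == motif * copies


def keep_locus(sequence: str, motif: str,
               max_motif_len: int = 20,
               require_perfect: bool = True) -> bool:
    """Hypothetical filter combining a motif-length cap and a purity check."""
    if len(motif) > max_motif_len:
        return False
    if require_perfect and not is_perfect_repeat(sequence, motif):
        return False
    return True


if __name__ == "__main__":
    print(is_perfect_repeat("CAGCAGCAG", "CAG"))   # True: three exact copies
    print(is_perfect_repeat("CAGCATCAG", "CAG"))   # False: substitution in copy 2
    print(is_perfect_repeat("CAGCAGTCAG", "CAG"))  # False: interrupted by an extra T
```

Under these assumptions, a stricter setting such as `keep_locus(seq, motif, max_motif_len=20, require_perfect=True)` keeps fewer but longer-motif loci, while `max_motif_len=6, require_perfect=False` keeps imperfect repeats but drops any motif longer than 6bp, which is one way two trf-derived sets could end up differing.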

apredeus commented 3 years ago

Thank you very much - very clear and informative!