Closed a7420174 closed 2 years ago
Hi JaeHyun, I haven't tested STRling on PCR+ WGS data. Based on my experience with exomes, it will likely work, but may be less accurate, especially with high GC loci. I would suggest excluding homopolymers, LCRs, seg dups, telomeres and centromeres, as described in the paper. Make sure all your controls are sequenced in the same way as your cases. Warm regards, Harriet
Thanks for the reply, Harriet.
Then what do you think about using STRling output columns like depth as quality metrics? Can they improve the quality of STR data (e.g. depth ≥ 5)?
Yes, applying a depth filter would be a reasonable approach. Just so you know, for typical PCR-free WGS you would expect to see on the order of 20-100 significant outliers per individual after applying the suggested filters. Restricting to those near genes should bring that number down even further. I'd suggest looking at the list of variants to see how much further filtering is needed to achieve your goals.
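For anyone wanting to try the depth filter discussed above, here is a minimal sketch in plain Python. It assumes a tab-separated table with a header row and a numeric column named `depth`; the column name, the example rows, and the `filter_by_depth` helper are illustrative assumptions, not part of STRling itself, so check them against your own output files.

```python
import csv
import io


def filter_by_depth(rows, min_depth=5.0, depth_column="depth"):
    """Keep rows whose depth is a number >= min_depth.

    `rows` is an iterable of dicts (e.g. from csv.DictReader).
    Rows with a missing or non-numeric depth are dropped.
    """
    kept = []
    for row in rows:
        try:
            depth = float(row[depth_column])
        except (TypeError, ValueError):
            continue  # missing or unparseable depth value
        if depth >= min_depth:  # NaN compares False, so it is dropped too
            kept.append(row)
    return kept


# Hypothetical example table; real files will have more columns.
example = (
    "chrom\tleft\tright\trepeatunit\tdepth\n"
    "chr1\t1000\t1050\tCAG\t32.5\n"
    "chr2\t2000\t2010\tA\t3.0\n"
    "chr3\t500\t560\tGGC\t5.0\n"
    "chr4\t100\t120\tAT\tnan\n"
)

rows = list(csv.DictReader(io.StringIO(example), delimiter="\t"))
passing = filter_by_depth(rows, min_depth=5.0)
passing_chroms = [row["chrom"] for row in passing]
```

To run this on a real file, replace `io.StringIO(example)` with an opened file handle. The threshold of 5 is just the value mentioned in the question above; a sensible cutoff probably depends on your sequencing coverage.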
Oh, thank you. Your suggestions are very helpful!
Hi, Harriet. Could I ask you more questions?
I am comparing STRling with another tool, ExpansionHunter Denovo (EHdn). EHdn detects lots of tandem repeats near centromeric and telomeric regions, but STRling detects only a few. Is there a reason for that? Also, I called repeats using the T2T-CHM13 reference genome, but I'm not certain that my results are reliable. Have you ever called repeats using T2T-CHM13? If so, I'd appreciate it if you could share your experience.
Are the STRs in centromeres/telomeres or just near them? I'd be very cautious about any variant calling in centromeres/telomeres, segmental duplications or low complexity regions.
I have not tried the T2T genome yet. I can't imagine it would cause a problem. If anything, it should improve things. But I haven't assessed that specific question.
Umm, yes, they are mainly subtelomeric or peri/centromeric satellite repeats, and I also doubt the estimated sizes of the STRs detected by EHdn. I'm just curious about the difference in STR detection between the two tools. I thought you might have applied a blacklist of regions in STRling.
STRling reports all regions by default. Filtering for specific regions would be up to the user.
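Since region filtering is left to the user, one way to do it is a simple overlap check against a list of regions to exclude (for example centromeres, telomeres, segmental duplications, or low complexity regions). The sketch below is only an illustration: the coordinates are made up, intervals are treated as half-open `[start, end)` as in BED files, and `exclude_regions` is a hypothetical helper rather than anything provided by STRling.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) overlap."""
    return a_start < b_end and b_start < a_end


def exclude_regions(loci, excluded):
    """Drop loci that overlap any excluded region on the same chromosome.

    Both arguments are lists of (chrom, start, end) tuples.
    """
    # Group excluded regions by chromosome so each locus is only
    # compared against regions on its own chromosome.
    by_chrom = {}
    for chrom, start, end in excluded:
        by_chrom.setdefault(chrom, []).append((start, end))

    kept = []
    for chrom, left, right in loci:
        regions = by_chrom.get(chrom, [])
        if not any(overlaps(left, right, s, e) for s, e in regions):
            kept.append((chrom, left, right))
    return kept


# Made-up coordinates, for illustration only.
excluded = [
    ("chr1", 0, 10_000),         # e.g. a telomeric region
    ("chr1", 500_000, 600_000),  # e.g. a centromeric region
]
loci = [
    ("chr1", 9_990, 10_050),     # overlaps the first excluded region
    ("chr1", 250_000, 250_030),  # outside both excluded regions
    ("chr2", 5_000, 5_020),      # no excluded regions on chr2
]

kept = exclude_regions(loci, excluded)
```

This linear scan is fine for a few thousand loci. For genome-wide region sets, a dedicated interval tool would likely be a better fit.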
Aha, okay. Then the difference likely comes from the algorithms themselves. Thanks so much!
Hi, I'm using STRling on PCR-based WGS data and I'm not sure whether that is appropriate. I couldn't find anything about this in your docs or paper, so I wanted to ask you.
Also, if it's OK, could you recommend some filters that are useful for short tandem repeat QC on PCR-based WGS data?
Thanks, JaeHyun