gymreklab / GangSTR

A tool for profiling long STRs from short reads
GNU General Public License v2.0

HELP #112

Open zhangjinpengGithub opened 3 years ago

zhangjinpengGithub commented 3 years ago

My experimental data are not from whole-genome sequencing but from restriction site-associated DNA sequencing (RAD-seq). Can I use your software with this kind of data? Thanks again!

nmmsv commented 3 years ago

GangSTR uses coverage in order to estimate the length of large repeat expansions. If the coverage is relatively uniform in a region around the locus of interest, GangSTR should be able to perform genotyping. Otherwise, the results are probably not accurate.

zhangjinpengGithub commented 3 years ago

Using samtools tview, I found that RAD-seq coverage is not uniformly distributed across an SSR region. In other words, some reads do not cover the whole SSR region. Do you know how to filter these reads out of the BAM file, or how else to solve this kind of problem?

nmmsv commented 3 years ago

It is OK if not all reads cover the entire repeat region. What matters is whether the coverage is relatively constant across a larger region (5-10 kb) around the locus, or whether it fluctuates. If the coverage is constant, you can supply that coverage value to GangSTR with --coverage (there is no need to filter the BAM file). Otherwise, GangSTR won't be able to report an accurate genotype.
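The check described above comes down to looking at per-base depth over a 5-10 kb window around the repeat and deciding whether it is roughly flat. If it is, the mean depth becomes a candidate value for --coverage. Below is a minimal Python sketch of that idea, not GangSTR's own logic. It assumes the depths come from `samtools depth -a -r <region>` (the `-a` flag keeps zero-depth positions, which matter here). The function names and the coefficient-of-variation and zero-fraction thresholds are hypothetical choices for illustration, not GangSTR defaults.

```python
import statistics
import sys
from typing import Iterable, List, Tuple


def parse_depth(lines: Iterable[str]) -> List[int]:
    """Read per-base depths from `samtools depth` output (chrom, pos, depth)."""
    depths = []
    for line in lines:
        line = line.strip()
        if line:
            # Third column is the depth of the first (here: only) BAM file.
            depths.append(int(line.split("\t")[2]))
    return depths


def coverage_summary(depths: List[int]) -> Tuple[float, float]:
    """Return (mean depth, coefficient of variation) over the window."""
    if not depths:
        raise ValueError("no depth values in window")
    mean = statistics.fmean(depths)
    if mean == 0:
        return 0.0, float("inf")
    return mean, statistics.pstdev(depths) / mean


def is_roughly_uniform(depths: List[int], max_cv: float = 0.5,
                       max_zero_frac: float = 0.05) -> bool:
    """Heuristic flatness check; both thresholds are illustrative assumptions."""
    mean, cv = coverage_summary(depths)
    zero_frac = sum(1 for d in depths if d == 0) / len(depths)
    return mean > 0 and cv <= max_cv and zero_frac <= max_zero_frac


def report(path: str) -> None:
    """Summarize a saved `samtools depth -a` window and suggest a --coverage value."""
    with open(path) as handle:
        depths = parse_depth(handle)
    mean, cv = coverage_summary(depths)
    print(f"mean depth = {mean:.1f}, CV = {cv:.2f}")
    if is_roughly_uniform(depths):
        print(f"Coverage looks roughly constant; candidate --coverage value: {mean:.0f}")
    else:
        print("Coverage fluctuates; GangSTR is unlikely to give an accurate genotype here.")


if __name__ == "__main__" and len(sys.argv) == 2:
    report(sys.argv[1])
```

For example, you could run `samtools depth -a -r chr1:1000000-1010000 sample.bam > depth.txt` and then `python check_coverage.py depth.txt`. The region coordinates and file names are placeholders. The coefficient of variation is used because it measures spread relative to the mean, so the same threshold can be applied to shallow and deep samples.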