ksiewert / BetaScan

Genome-wide scan for balancing selection using beta statistic

Unexpected results #9

Closed benoitnabholz closed 3 years ago

benoitnabholz commented 3 years ago

Dear Katherine,

We recently used BetaScan2 on the dataset generated by Singhal et al. 2015 (https://science.sciencemag.org/content/350/6263/928.abstract). This dataset is composed of 19 zebra finch individuals sequenced at >10X coverage.

When we applied BetaScan2 (both the folded and unfolded versions), I was surprised to see that the SNPs with the highest Beta* statistics had, on average, a lower frequency than the rest of the genome (typically a frequency of 3% vs. 5%).

Do you have any idea what is going on? Do you have any advice on what we should check to see whether the program is running correctly?

Thank you for your help, Benoit Nabholz

ksiewert commented 3 years ago

Hi Benoit,

Great question (and smart to check this)! At low allele frequencies, some scans for balancing selection can be prone to false positives. I think you'll find the discussion of the -m parameter in this section of the BetaScan wiki relevant: https://github.com/ksiewert/BetaScan/wiki/Basic-Usage#explanation-of-parameters. In humans, we find that setting the -m parameter to 0.15 is sufficient to remove false positives. If zebra finches have a higher effective population size than humans, you may be able to go a bit lower than this. Hope this helps, and let me know if you have any more questions.

Best, Katie
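
To illustrate the idea behind a minimum-frequency cutoff, here is a minimal Python sketch. It is not BetaScan's own code: the function names are hypothetical, and it assumes the cutoff is applied to the folded (minor-allele) frequency of the core SNP as an exclusive lower bound. The 38 chromosomes correspond to the 19 diploid individuals mentioned above.

```python
def folded_frequency(derived_count: int, n_chromosomes: int) -> float:
    """Return the folded (minor-allele) frequency of a SNP."""
    freq = derived_count / n_chromosomes
    return min(freq, 1.0 - freq)


def passes_min_frequency(derived_count: int, n_chromosomes: int,
                         min_folded_freq: float = 0.15) -> bool:
    """Assumption: a SNP is kept only if its folded frequency is strictly
    above the cutoff (0.15 is the value suggested for humans above)."""
    return folded_frequency(derived_count, n_chromosomes) > min_folded_freq


if __name__ == "__main__":
    n = 38  # 19 diploid individuals
    for count in (1, 2, 5, 6, 19, 36):
        kept = passes_min_frequency(count, n)
        print(f"count={count:2d} folded={folded_frequency(count, n):.3f} kept={kept}")
```

Under these assumptions, SNPs seen on only one or two chromosomes (roughly 3% and 5% in a sample of this size) would never be scored as core SNPs.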


benoitnabholz commented 3 years ago

Hi Katie,

Sorry for the late feedback. We applied the "-m" parameter and the results make much more sense now. The SNPs with the highest Beta* scores have an average frequency close to 0.5 (unfolded version), which is significantly higher than the rest (~0.3).

Thank you! Benoit
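
For anyone wanting to repeat this sanity check, a rough Python sketch follows. It assumes you have already paired each scored SNP with its allele frequency as `(frequency, beta_score)` tuples; the function name, the `top_fraction` parameter, and the toy values are all invented for illustration and do not come from BetaScan or from the zebra finch data.

```python
from statistics import mean


def compare_top_scoring(records, top_fraction=0.01):
    """Split SNPs into the top-scoring fraction and the rest, and return
    the mean allele frequency of each group as (top_mean, rest_mean).

    records: list of (frequency, beta_score) tuples (hypothetical layout).
    """
    if len(records) < 2:
        raise ValueError("need at least two SNPs to compare groups")
    ranked = sorted(records, key=lambda r: r[1], reverse=True)
    # Keep at least one SNP in each group.
    n_top = min(max(1, int(len(ranked) * top_fraction)), len(ranked) - 1)
    top, rest = ranked[:n_top], ranked[n_top:]
    return mean(f for f, _ in top), mean(f for f, _ in rest)


if __name__ == "__main__":
    # Toy values, invented purely to exercise the function.
    toy = [(0.50, 10.0), (0.45, 8.0), (0.03, 1.0), (0.05, 0.5)]
    top_mean, rest_mean = compare_top_scoring(toy, top_fraction=0.5)
    print(f"top mean frequency:  {top_mean:.3f}")
    print(f"rest mean frequency: {rest_mean:.3f}")
```

If the top-scoring group has a markedly lower mean frequency than the rest, as in the original report, that is a hint that low-frequency core SNPs may be driving the highest scores.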

ksiewert commented 3 years ago

Great to hear!