z0on / 2bRAD_denovo

Genome-wide de novo genotyping with 2bRAD

Differences between ANGSD versions' vcf/bcf conversion to input file for Bayescan #3

Open lexiebsturm opened 3 years ago

lexiebsturm commented 3 years ago

Hi Misha, I hope all is well! I have a question about a recent ANGSD update (v0.933), which no longer supports the -doVcf flag and instead requires a -doBcf flag, so it now creates a bcf file instead of a vcf. The bcf looks similar to the vcf created by earlier ANGSD versions, and both say they are format vcf v4.2. However, I think they may be coding missing data differently?

When I use PGDspider to convert the old vcf file to Bayescan input, the second column (twice the number of individuals in that pop) is the same across all loci for the pop. When I convert the bcf file to Bayescan input, the second column differs slightly among loci within a pop. The Bayescan manual says this can happen because it is accounting for missing data. See examples below:

Converting vcf output from ANGSD v0.921 to Bayescan input following your code using PGDspider:

```
less vcf.bayescan

[loci]=10120

[populations]=8

[pop]=1
1 30 2 2 28
2 30 2 28 2
3 30 2 28 2
4 30 2 4 26
5 30 2 2 28
6 30 2 25 5
7 30 2 27 3
8 30 2 28 2
9 30 2 5 25
10 30 2 27 3
```

Converting bcf output from ANGSD v0.933 to Bayescan input following your code using PGDspider:

```
less bcf.bayescan

[loci]=10120

[populations]=8

[pop]=1
1 28 2 2 26
2 28 2 26 2
3 28 2 27 1
4 30 2 4 26
5 30 2 2 28
6 26 2 21 5
7 28 2 25 3
8 28 2 26 2
9 28 2 5 23
10 28 2 25 3
```

I am currently running both to see whether the number of outliers differs much between the two, but I imagine there will be issues because the allele frequencies will be calculated differently. Which would be the better way to go? Thank you!
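To make the difference between the two files concrete, here is a minimal Python sketch that parses Bayescan input text and reports the range of the gene-count column (the second column) per population. It uses only the ten pop 1 loci quoted in this thread. The function names (`parse_bayescan`, `gene_count_range`, `first_allele_freq`) are hypothetical helpers written for illustration, not part of Bayescan, PGDspider, or the 2bRAD_denovo scripts.

```python
def parse_bayescan(text):
    """Parse Bayescan input text into {pop: {locus: (n_genes, [allele counts])}}."""
    pops = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[pop]="):
            current = int(line.split("=", 1)[1])
            pops[current] = {}
        elif line.startswith("["):
            continue  # skip [loci]=... and [populations]=... header lines
        elif current is not None:
            fields = [int(x) for x in line.split()]
            locus, n_genes, n_alleles = fields[0], fields[1], fields[2]
            pops[current][locus] = (n_genes, fields[3:3 + n_alleles])
    return pops


def gene_count_range(pops):
    """Return {pop: (min, max)} of the gene-count column across loci."""
    result = {}
    for pop, loci in pops.items():
        sizes = [n_genes for n_genes, _ in loci.values()]
        result[pop] = (min(sizes), max(sizes))
    return result


def first_allele_freq(pops, pop, locus):
    """Frequency of the first listed allele at one locus in one population."""
    n_genes, counts = pops[pop][locus]
    return counts[0] / n_genes


# Excerpts copied from the two files shown above (pop 1, first ten loci only).
VCF_EXCERPT = """[pop]=1
1 30 2 2 28
2 30 2 28 2
3 30 2 28 2
4 30 2 4 26
5 30 2 2 28
6 30 2 25 5
7 30 2 27 3
8 30 2 28 2
9 30 2 5 25
10 30 2 27 3
"""

BCF_EXCERPT = """[pop]=1
1 28 2 2 26
2 28 2 26 2
3 28 2 27 1
4 30 2 4 26
5 30 2 2 28
6 26 2 21 5
7 28 2 25 3
8 28 2 26 2
9 28 2 5 23
10 28 2 25 3
"""

old = parse_bayescan(VCF_EXCERPT)
new = parse_bayescan(BCF_EXCERPT)
print(gene_count_range(old))  # {1: (30, 30)}
print(gene_count_range(new))  # {1: (26, 30)}
print(round(first_allele_freq(old, 1, 6), 3))  # 0.833
print(round(first_allele_freq(new, 1, 6), 3))  # 0.808
```

In this excerpt the vcf-derived file always reports 30 gene copies, while the bcf-derived file reports between 26 and 30, so the same locus can end up with a different allele frequency (locus 6: 25/30 versus 21/26).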

z0on commented 3 years ago

Hi Lexie - sorry about the delay - I really don't have an opinion on this, please explore! I'd appreciate it if you let me know what you find.

Btw, for outlier scans, check out pcangsd (it is quite magical).


-- cheers Misha matzlab.weebly.com

lexiebsturm commented 3 years ago

Hi Misha, Thank you for the recommendation about pcangsd. I will try it out! Just as an update to the Bayescan issue, I ran Bayescan on both versions (the old vcf and the new bcf, each converted to Bayescan input) and, at the same q-value significance level, I get very different numbers and lists of putative outlier SNPs. I'll try to follow up with the ANGSD people and see what they recommend, but I will also try PCAngsd.

Thank you! Lexie
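One way to quantify "very different lists" is to compare the two outlier sets directly. The sketch below assumes you have already extracted the per-locus q-values from each Bayescan run into two lists in the same locus order. The function names and the q-values are made up for illustration and are not results from this thread.

```python
def outlier_set(qvals, threshold=0.05):
    """Return the 1-based indices of loci with q-value at or below the threshold."""
    return {i for i, q in enumerate(qvals, start=1) if q <= threshold}


def compare_outliers(qvals_a, qvals_b, threshold=0.05):
    """Summarize the overlap between the outlier sets of two runs."""
    a = outlier_set(qvals_a, threshold)
    b = outlier_set(qvals_b, threshold)
    union = a | b
    return {
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
        "shared": sorted(a & b),
        # Jaccard index: 1.0 means identical lists, 0.0 means no overlap.
        "jaccard": len(a & b) / len(union) if union else 1.0,
    }


# Hypothetical q-values for five loci (NOT real data from either run).
qvals_vcf = [0.01, 0.40, 0.03, 0.90, 0.20]
qvals_bcf = [0.02, 0.04, 0.60, 0.85, 0.30]

summary = compare_outliers(qvals_vcf, qvals_bcf)
print(summary["shared"])  # [1]
print(summary["only_a"])  # [3]
print(summary["only_b"])  # [2]
```

A low Jaccard index at the same threshold would support the impression that the two input files lead to different outlier calls rather than just different counts.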

z0on commented 3 years ago

Please keep me posted. I expect the ANGSD people might recommend something that gets around the need to "hard-call" genotypes (vcf/bcf files). PCAngsd does a "selection scan", but it is not a model-based comparison like Bayescan. It is simpler but also more straightforward: it tells you which SNPs are really strongly differentiated among genetic groups (which it identifies automatically). Misha
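As a rough illustration of how a selection-scan result might be turned into a list of strongly differentiated SNPs, the sketch below converts per-site test statistics into p-values. It assumes the statistics follow a chi-square distribution with 1 degree of freedom under neutrality, which is my understanding of how PCAngsd describes its selection statistics; please check the documentation for your version. The function names, the Bonferroni cutoff, and the example statistics are hypothetical choices for illustration.

```python
import math


def chi2_sf_df1(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    # For df = 1, P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


def strongest_sites(stats, alpha=0.05):
    """Return 0-based indices of sites that pass a Bonferroni-corrected cutoff."""
    cutoff = alpha / len(stats)
    return [i for i, s in enumerate(stats) if chi2_sf_df1(s) < cutoff]


# Hypothetical selection statistics for four sites (NOT real output).
example_stats = [0.5, 30.0, 2.0, 45.0]
print(strongest_sites(example_stats))  # [1, 3]
```

Bonferroni is a deliberately conservative choice here; a false discovery rate procedure would be closer in spirit to the q-values Bayescan reports.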
