Closed MatteoSebastianelli closed 1 year ago
Hi Matteo, Are your values higher or lower than you expected (I think you left out a word)? People often see very high values using my script when invariant sites are excluded, but if yours includes invariants, there is no reason to expect a difference. We have found very similar numbers to those from pixy for example. Simon
Hi Simon,
yes, I did leave out a word! I meant higher. The problem is that I am getting similar estimates when using both vcfs with and without invariant sites. I wonder whether filtering (e.g. max-missingness, maf...) might have something to do with this?
Matteo
Hi Matteo, Sorry for the delay. This is very strange. If you send me a small subset of your input files (with and without invariants) I can try to diagnose what is happening. Simon
Hi Martin,
Sure, I have uploaded the subsets here: https://drive.google.com/drive/folders/1S62j7N1gy7I8j211u49AKx0WXKXSjVzT?usp=sharing
Best, Matteo
Hi Matteo, Your "all sites" file does not contain invariant sites.
#CHROM POS ID REF ALT
SUPER_25 24381 . T C
SUPER_25 104391 . T G
SUPER_25 104404 . T G
SUPER_25 107757 . T G
Note that each site has both a reference and alternate allele, and all the thousands of positions between these variants are not present.
Hi Simon,
Sorry for the delay in the response, but I wanted to make sure I ran everything correctly. I ran the pipeline again and made sure the file actually contains invariant sites as well, and after running your script I now get the expected values of Dxy. Thank you for your help and sorry for bothering you with this!!
Matteo
Hi Matteo, Great - thanks for confirming. All the best!
Hi Simon,
Thank you for providing he community with this important tool! I just have a question, using your popgenWindows.py to calculate genome-wide Dxy (my vcf includes invariant sites), I noticed that the estimates I was getting are two orders of magnitude than they tipically are for birds. Is this python script getting per-SNP estimates rather than per-nucleotide?
Best, Matteo