Open janet-higgins13 opened 2 years ago
Hi Janet.
Interesting problem. As XPCLR only requires population allele counts, the triploid should work fine. VCFs are read in using scikit- allel which supports multiple ploidy.
I'm also not sure what populations exist in a triploid state, but I guess that shouldn't matter too much.
However, it seems that this isn't working properly, as so many of the sites in population 2 are fixed. Would you mind sharing an excerpt of the VCFs?
If not I would recommend using the scikit alele read_vcf function and seeing if it is able to interpret the ploidy in your VCF.
Hi Nick
I have had a look at my vcf using scikit-allel 1.2.1 and it only shows diploid calls
callset = allel.read_vcf('example.vcf') gt = allel.GenotypeArray(callset['calldata/GT'])
0 | 1 | 2 | 3 | 4 | ... | 185 | 186 | 187 | 188 | 189 | |
---|---|---|---|---|---|---|---|---|---|---|---|
0/0 | 0/0 | 0/0 | 0/0 | 0/0 | ... | 0/0 | 0/0 | 0/0 | ./. | ./. | |
0/0 | 0/0 | 0/0 | 0/0 | 0/0 | ... | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | |
0/0 | 0/0 | 0/0 | 0/0 | 0/0 | ... | 0/0 | 0/0 | 0/0 | ./. |
I have attached a sample of my vcf
Thanks Janet
Thanks Janet- that's interesting I would have assumed it would be read at ploidy=3. As a hack, as it only counts the alternative aleles, as long as the ratios are correct in both populations it should work fine as diploid. So you could create "fake" individuals
ie 0/0/1 and 0/1/1 would be the same as 3 diploid individuals of 0/0, 0/1 and 1/1.
I am running XP-CLR using a vcf output for ploidy3 so the genotype is shown as follows. 0/0/1:16,4:20:36:110,0,36,673
How does XP-CLR interpret triploid calls? I am just checking whether it takes account of the third call or just ignores it and assumes the vcf is diploid?
Obviously, I would like to run it using the triploid vcf if this is valid
I have transformed my vcf to diploid by changing 0/1/1 and 0/0/1 to 0/1
Comparing the output from both runs, I am loosing a lot of SNPs from the triploid run
diploid 2021-09-14 09:33:09 : INFO : running xpclr v1.0.0 2021-09-14 09:33:09 : INFO : Loading VCF 2021-09-14 09:33:23 : INFO : VCF loading complete 2021-09-14 09:33:23 : INFO : 15,265 SNPs in total are in the provided input files 2021-09-14 09:33:23 : INFO : 0 SNPs excluded as multiallelic 2021-09-14 09:33:23 : INFO : 10 SNPs excluded as missing in all samples in a population 2021-09-14 09:33:23 : INFO : 8,243 SNPs excluded as invariant or singleton in population 2 2021-09-14 09:33:23 : INFO : 7,012/15,265 SNPs included in the analysis (45.94%) 2021-09-14 09:33:23 : INFO : Done dropping above SNPs from analysis. XP-CLR algorithm starting. 2021-09-14 09:33:23 : INFO : No genetic distance provided; using rrate of 1e-08/bp 2021-09-14 09:33:23 : INFO : Omega estimated as : 0.362105
triploid 2021-09-14 09:33:08 : INFO : running xpclr v1.0.0 2021-09-14 09:33:08 : INFO : Loading VCF 2021-09-14 09:33:26 : INFO : VCF loading complete 2021-09-14 09:33:26 : INFO : 24,606 SNPs in total are in the provided input files 2021-09-14 09:33:26 : INFO : 0 SNPs excluded as multiallelic 2021-09-14 09:33:26 : INFO : 0 SNPs excluded as missing in all samples in a population 2021-09-14 09:33:26 : INFO : 22,670 SNPs excluded as invariant or singleton in population 2 2021-09-14 09:33:26 : INFO : 1,936/24,606 SNPs included in the analysis (7.87%) 2021-09-14 09:33:26 : INFO : Done dropping above SNPs from analysis. XP-CLR algorithm starting. 2021-09-14 09:33:26 : INFO : No genetic distance provided; using rrate of 1e-08/bp 2021-09-14 09:33:26 : INFO : Omega estimated as : 0.609867
Would you recommend using the diploid or triploid vcf?