hardingnj / xpclr

Code to compute the XP-CLR statistic to infer natural selection
MIT License

Specify value used to define missing genotypes. #38

Open JunpengShi opened 5 years ago

JunpengShi commented 5 years ago

Dear Nick,

I'm wondering whether your XPCLR can handle missing genotypes.

The previous XPCLR implementation (Chen et al., 2010) encodes missing genotypes as 9.

I used the same file format to run your XPCLR and found that the overwhelming majority of SNPs were reported as multiallelic. Could this be due to the missing genotypes?

2019-06-03 09:38:15 : INFO : running xpclr v1.1.1
2019-06-03 09:38:15 : INFO : Loading TXT
2019-06-03 09:39:38 : INFO : TXT loading complete
2019-06-03 09:39:38 : INFO : 1,214,768 SNPs in total are in the provided input files
2019-06-03 09:39:39 : INFO : 1,212,857 SNPs excluded as multiallelic
2019-06-03 09:39:39 : INFO : 0 SNPs excluded as missing in all samples in a population
2019-06-03 09:39:39 : INFO : 605 SNPs excluded as invariant or singleton in population 2
2019-06-03 09:39:39 : INFO : 1,306/1,214,768 SNPs included in the analysis (0.11%)
2019-06-03 09:39:40 : INFO : Done dropping above SNPs from analysis. XP-CLR algorithm starting.
2019-06-03 09:39:40 : INFO : Omega estimated as : 0.236594
2019-06-03 09:40:12 : INFO : Analysis complete. Output file ./chr7.17parviglumis_23landraces

Sincerely, Junpeng

JunpengShi commented 5 years ago

My genotype file looks like this:

0 0 9 9 1 1 9 9 0 0 0 0 9 9 9 9 9 9 0 0 0 0 9 9 0 0 9 9 0 0 1 1 0 0 0 0 9 9 0 0 9 9 0 0 0 0 0 0 9 9 9 9 0 0 0 0 0 0 0 0 9 9 0 0 0 0 0 0 0 0 9 9 0 0 9 9 0 0 9 9 0 0 0 0 0 0 0 0 0 0 9 9 0 0 9 9 0 0 0 0 0 0 0 0 0 0 9 9 9 9 0 0 9 9 0 0 9 9 0 0 0 0 0 0 0 0 9 9 9 9 0 0 9 9 0 0 0 0 9 9 0 0 9 9 0 0 9 9 0 0 9 9 0 0 0 0 0 0 0 0 9 9 0 0 0 0 9 9 0 0 0 0 9 9 9 9 9 9 9 9 9 9 0 0 0 0 0 0 0 0 0 0 0 0 9 9 0 0 0 0 9 9 0 0 0 0 9 9 9 9 9 9 9 9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 9 9 0 0 0 0 0 0 0 0 1 1 9 9 0 0 9 9 0 0 9 9 1 1 1 1 0 0 0 0 1 1 1 1 9 9 1 1 1 1 0 0 1 1 1 1 9 9 9 9 0 0 0 0 1 1 1 1 1 1 9 9 9 9 1 1 1 1 9 9 1 1 1 1 1 1 1 1 1 1 9 9 9 9 0 0 9 9 0 0 1 1 1 1 0 0 0 0 1 1 1 1 9 9 1 1 1 1 9 9 1 1
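A likely explanation for the multiallelic count in the log is that a tool expecting missing calls to be -1 reads each 9 as a genuine third allele. The sketch below is a hypothetical illustration of that effect, not xpclr's actual filter: it assumes the check simply counts distinct allele codes per SNP, ignoring only the code passed as `missing`. The helper names are made up for this example.

```python
# Hypothetical illustration (not xpclr's code): count distinct allele codes
# in one SNP row, ignoring only the code given as `missing` (default -1).
def distinct_alleles(row, missing=-1):
    """Return the sorted allele codes in a whitespace-separated SNP row."""
    codes = {int(tok) for tok in row.split()}
    codes.discard(missing)  # drop only the designated missing code
    return sorted(codes)


def looks_multiallelic(row, missing=-1):
    """True if the row appears to carry more than two alleles."""
    return len(distinct_alleles(row, missing)) > 2


# First few calls of the genotype row above (9 = missing in the old format).
row = "0 0 9 9 1 1 9 9 0 0 0 0 9 9"

print(distinct_alleles(row))                # [0, 1, 9]: 9 counted as an allele
print(looks_multiallelic(row))              # True
print(looks_multiallelic(row, missing=9))   # False once 9 is treated as missing
```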

hardingnj commented 5 years ago

Yes. Missing genotypes should be encoded as -1, following the convention of VCF.
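For a text genotype file in the old 9-for-missing format, a minimal recoding sketch follows. It assumes one SNP per line with whitespace-separated allele calls, as in the sample above, and that 9 is the only missing code. `recode_missing` and `recode_file` are hypothetical helpers, not part of xpclr. Tokens are compared whole, so a value such as `19` is left alone, and each output line is rejoined with single spaces.

```python
import io


def recode_missing(lines, old="9", new="-1"):
    """Yield each genotype line with every whole `old` token replaced by `new`.

    Hypothetical helper: assumes whitespace-separated calls, one SNP per line.
    """
    for line in lines:
        yield " ".join(new if tok == old else tok for tok in line.split())


def recode_file(src_path, dst_path, old="9", new="-1"):
    """Write a recoded copy of a genotype text file (hypothetical helper)."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for out in recode_missing(src, old, new):
            dst.write(out + "\n")


# Usage on an in-memory example with two SNP rows.
example = io.StringIO("0 0 9 9 1 1\n9 9 0 0 0 0\n")
print(list(recode_missing(example)))  # ['0 0 -1 -1 1 1', '-1 -1 0 0 0 0']
```

Streaming line by line keeps memory use flat, which matters for files with over a million SNPs like the one in the log above.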

JunpengShi commented 5 years ago

Thanks Nick. It works now!

hardingnj commented 5 years ago

Leaving this open as a reminder to document this more fully. In general, though, I'd encourage users to provide VCF or zarr input.

kizzhengwangshan commented 4 years ago

Dear hardingnj, when I run xpclr I get the following error: ValueError: arange: cannot compute length. How can I fix this?