ksamuk / pixy

Software for painlessly estimating average nucleotide diversity within and between populations
https://pixy.readthedocs.io/
MIT License

File corrupted #108

Open WeixuanPlant opened 3 months ago

WeixuanPlant commented 3 months ago

I used pixy to analyze my master VCF chromosome by chromosome. Everything was run in a single batch, so if one chromosome failed, the whole batch was reported as failed. I followed the instructions and merged the variant and invariant sets for each chromosome. I got results for all of my chromosomes except A02 and A04. I cannot figure out what is wrong with those VCFs: they can be indexed with tabix without errors and everything looks right, especially since all the other chromosomes finished successfully. When I added the flag --bypass_invariant_check yes, it worked. So I am wondering whether the initial quality check has a problem.
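One way to confirm that a merged file really contains both site types, independently of pixy's own validation, is to scan every record rather than only the start of the file. The sketch below is an illustration, not pixy's implementation: it assumes a tab-delimited VCF (plain or bgzip-compressed, which Python's gzip module can read) and treats a record as invariant when its ALT column is "." and as variable otherwise. The function name count_site_types is hypothetical.

```python
import gzip
from collections import Counter


def count_site_types(vcf_path):
    """Count variable and invariant records across an entire VCF.

    Hypothetical helper, not part of pixy. Assumes a tab-delimited VCF,
    optionally gzip/bgzip-compressed. A record is counted as invariant
    when ALT (column 5) is "." and as variable otherwise; genotypes are
    not inspected.
    """
    opener = gzip.open if vcf_path.endswith(".gz") else open
    counts = Counter()
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            # Skip blank lines, "##" meta lines and the "#CHROM" header.
            if not line.strip() or line.startswith("#"):
                continue
            alt = line.split("\t", 5)[4]
            counts["invariant" if alt == "." else "variable"] += 1
    return counts
```

For example, count_site_types("GD_n90_interval.A02.combined.rehead.vcf.gz") should report non-zero counts for both categories if the merge worked. A full scan is slower than checking only the beginning of the file, but it cannot miss variable sites that first appear deep into a chromosome.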

(base) [weixuan@speedy3 Pixy]$ pixy --stats pi fst  dxy --vcf GD_n90_interval.A02.combined.rehead.vcf.gz --populations pixy_populationlist.txt --window_size 10000 --n_cores 10 --output_prefix GD_n90_interval.A02
[pixy] pixy 1.2.7.beta1
[pixy] See documentation at https://pixy.readthedocs.io/en/latest/

[pixy] Validating VCF and input parameters...
[pixy] Checking write access...OK
[pixy] Checking CPU configuration...OK
[pixy] Checking for invariant sites...Exception: [pixy] ERROR: the provided VCF appears to contain no variable sites. It may have been filtered incorrectly, or otherwise corrupted.

Here is an example from my VCF file, showing the header line and the first two records:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  AD1_Cultivar_P115HAU041 AD1_Cultivar_P115HAU059 AD1_Cultivar_P115HAU066 AD1_Cultivar_P115HAU094   AD1_Cultivar_P115HAU101 AD1_Cultivar_P115HAU146 AD1_Cultivar_P115HAU242 AD1_Cultivar_B11SA1465  AD1_Cultivar_B11SA1427  AD1_Cultivar_B11SA0825    AD1_GD_G10B     AD1_GD_G11A     AD1_GD_G12C     AD1_GD_G13C     AD1_GD_G14B     AD1_GD_G15C     AD1_GD_G16C     AD1_GD_G17A     AD1_GD_G18B     AD1_GD_G19B       AD1_GD_G1A      AD1_GD_G20C     AD1_GD_G21C     AD1_GD_G22C     AD1_GD_G23A     AD1_GD_G24A     AD1_GD_G25B     AD1_GD_G2C      AD1_GD_G3A        AD1_GD_G4B      AD1_GD_G5A      AD1_GD_G6C      AD1_GD_G7C      AD1_GD_G8C      AD1_GD_G9A      AD1_LR1_B12TX2465       AD1_LR1_B12TX2489       AD1_LR1_B12TX2487 AD1_LR1_B12TX1592       AD1_LR1_B12TX1689       AD1_LR1_B12TX1935       AD1_LR1_B12TX1635       AD1_LR1_B12TX2216       AD1_LR1_B12TX2226         AD1_LR1_B12TX1578       AD1_LR2_B12TX0109       AD1_LR2_B12TX0487       AD1_LR2_B12TX0240       AD1_LR2_B12TX1661       AD1_LR2_B12TX1721       AD1_LR2_B12TX1111 AD1_LR2_B12TX1459       AD1_LR2_B12TX1102       AD1_LR2_B12TX1981       AD1_LR2_B12TX0390       AD1_MK_Pop1_10  AD1_MK_Pop1_11  AD1_MK_Pop1_12    AD1_MK_Pop1_13  AD1_MK_Pop1_14  AD1_MK_Pop1_15  AD1_MK_Pop1_16  AD1_MK_Pop1_17  AD1_MK_Pop1_1   AD1_MK_Pop1_2   AD1_MK_Pop1_3   AD1_MK_Pop1_4   AD1_MK_Pop1_5     AD1_MK_Pop1_6   AD1_MK_Pop1_7   AD1_MK_Pop1_8   AD1_MK_Pop1_9   AD1_MK_Pop2_1   AD1_MK_Pop2_2   AD1_MK_Pop2_3   AD1_MK_Pop2_4   AD1_MK_Pop2_5     AD1_MK_Pop2_6   AD1_MK_Pop2_7   AD1_MK_Pop3_1   AD1_Wild_B12BPS1151_curaçao    AD1_Wild_B12BPS1244_puerto_rico AD1_Wild_B12BPS1247_puerto_rico AD1_Wild_B12TX0967_jamaica        AD1_Wild_B12TX1894_venezuela    AD1_Wild_B12TX1996_mexico       AD1_Wild_B12TX2090_mexico       AD1_Wild_B12TX2211_netherlands    AD1_Wild_B12TX2276_dominican_republic   AD1_Wild_P121TX2094_mexico
A02     811     .       A       .       28.01   LowQual .       GT:AD:DP        0/0:6:6 0/0:2:2 0/0:5:5 0/0:3:3 0/0:2:2 0/0:2:2 0/0:0:0 0/0:6:6 0/0:19:19         0/0:15:15       0/0:6:6 0/0:3:3 0/0:11:11       0/0:6:6 0/0:4:4 0/0:13:13       0/0:5:5 0/0:6:6 0/0:3:3 0/0:4:4 0/0:0:0 0/0:0:0 0/0:3:3 0/0:11:11         0/0:16:16       0/0:5:5 0/0:10:10       0/0:1:1 0/0:0:0 0/0:5:5 0/0:3:3 0/0:10:10       0/0:12:12       0/0:4:4 0/0:7:7 0/0:22:22       0/0:15:15         0/0:22:22       0/0:34:34       0/0:16:16       0/0:17:17       0/0:11:11       0/0:14:14       0/0:17:17       0/0:21:21       0/0:8:8 0/0:25:25         0/0:0:0 0/0:11:11       0/0:9:9 0/0:16:16       0/0:23:23       0/0:32:32       0/0:10:10       0/0:15:15       0/0:11:11       0/0:22:22       0/0:3:3   0/0:0:0 0/0:7:7 0/0:11:11       0/0:12:12       0/0:12:12       0/0:12:12       0/0:8:8 0/0:9:9 0/0:6:6 0/0:8:8 0/0:11:11       0/0:11:11       0/0:2:2   0/0:14:14       0/0:9:9 0/0:22:22       0/0:10:10       0/0:7:7 0/0:4:4 0/0:24:24       0/0:11:11       0/0:13:13       0/0:3:3 0/0:11:11       0/0:9:9   0/0:4:4 0/0:2:2 0/0:13:13       0/0:4:4 0/0:8:8 0/0:6:6 0/0:43:43
A02     812     .       A       .       28.44   LowQual .       GT:AD:DP        0/0:6:6 0/0:2:2 0/0:5:5 0/0:3:3 0/0:2:2 0/0:2:2 0/0:0:0 0/0:6:6 0/0:19:19         0/0:15:15       0/0:6:6 0/0:3:3 0/0:11:11       0/0:6:6 0/0:4:4 0/0:13:13       0/0:5:5 0/0:6:6 0/0:3:3 0/0:4:4 0/0:0:0 0/0:0:0 0/0:3:3 0/0:11:11         0/0:16:16       0/0:6:6 0/0:10:10       0/0:1:1 0/0:0:0 0/0:5:5 0/0:3:3 0/0:10:10       0/0:12:12       0/0:4:4 0/0:7:7 0/0:21:21       0/0:15:15         0/0:22:22       0/0:34:34       0/0:17:17       0/0:19:19       0/0:11:11       0/0:15:15       0/0:17:17       0/0:21:21       0/0:8:8 0/0:25:25         0/0:0:0 0/0:11:11       0/0:9:9 0/0:16:16       0/0:23:23       0/0:32:32       0/0:10:10       0/0:15:15       0/0:11:11       0/0:22:22       0/0:3:3   0/0:0:0 0/0:5:5 0/0:11:11       0/0:12:12       0/0:12:12       0/0:12:12       0/0:8:8 0/0:9:9 0/0:6:6 0/0:8:8 0/0:11:11       0/0:11:11       0/0:2:2   0/0:14:14       0/0:9:9 0/0:22:22       0/0:10:10       0/0:6:6 0/0:4:4 0/0:25:25       0/0:11:11       0/0:13:13       0/0:3:3 0/0:11:11       0/0:9:9   0/0:4:4 0/0:2:2 0/0:13:13       0/0:4:4 0/0:8:8 0/0:6:6 0/0:43:43
WeixuanPlant commented 3 months ago

I checked the position of the first variable site on each chromosome: it is 25483 on A02 and 54548 on A04. My guess is that the initial check only reads a limited number of lines from the start of the VCF, so it fails on A02 and A04 because no variable site occurs within that range. I guess the limit is 15000 lines.
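This hypothesis can be tested directly by locating the first variable site and comparing a full scan with a scan limited to the first N data records. The helper below is hypothetical and is not pixy's code; the 15000-record limit in the usage note is only the guess stated above, not a confirmed pixy value. As before, it assumes a tab-delimited VCF and treats ALT == "." as invariant.

```python
import gzip
from itertools import islice


def first_variable_site(vcf_path, max_records=None):
    """Return (record_index, chrom, pos) of the first variable site, or None.

    Hypothetical helper, not part of pixy. record_index counts data
    records from 1 (header lines excluded). If max_records is given, only
    the first max_records data records are inspected, mimicking a check
    that reads just the start of the file.
    """
    opener = gzip.open if vcf_path.endswith(".gz") else open
    with opener(vcf_path, "rt") as handle:
        # Lazily yield data records only, skipping blank and header lines.
        records = (
            line for line in handle
            if line.strip() and not line.startswith("#")
        )
        # islice(records, None) iterates without a limit.
        for index, line in enumerate(islice(records, max_records), start=1):
            fields = line.split("\t", 5)
            if fields[4] != ".":
                return index, fields[0], int(fields[1])
    return None
```

If first_variable_site(path) returns a record but first_variable_site(path, max_records=15000) returns None, a check that only reads the beginning of the file would see nothing but invariant sites, which would be consistent with the error reported above.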

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  SRR4006675      SRR4006691      SRR4006698      SRR4006732      SRR4013321      SRR4013897        SRR4018588      SRR6311779      SRR6311783      SRR6311822_B11SA0825    YUC-G10B        YUC-G11A        YUC-G12C        YUC-G13C        YUC-G14B  YUC-G15C        YUC-G16C        YUC-G17A        YUC-G18B        YUC-G19B        YUC-G1A         YUC-G20C        YUC-G21C        YUC-G22C        YUC-G23A  YUC-G24A        YUC-G25B        YUC-G2C         YUC-G3A         YUC-G4B         YUC-G5A         YUC-G6C         YUC-G7C         YUC-G8C         YUC-G9A   SRR6311600_BYU12435     SRR6311749      SRR6311752      SRR6311758      SRR6311762      SRR6311799      SRR6311803      SRR6311806      SRR6311807        SRR6311813      SRR6311544_BYU12239     SRR6311550_BYU12260     SRR6311653_BYU12249     SRR6311747_BYU12359     SRR6311751      SRR6311759      SRR6311768        SRR6311801      SRR6311802      SRR6311805      Pop1_10         Pop1_11         Pop1_12         Pop1_13         Pop1_14         Pop1_15           Pop1_16         Pop1_17         Pop1_1  Pop1_2  Pop1_3  Pop1_4  Pop1_5  Pop1_6  Pop1_7  Pop1_8  Pop1_9  Pop2_1  Pop2_2  Pop2_3  Pop2_4  Pop2_5  Pop2_6    Pop2_7  Pop3_1  SRR6311622_B12BPS1151   SRR6311625_B12BPS1244   SRR6311626_B12BPS1247   SRR6311726_B12TX0967    SRR6311795      SRR6311499_BYU12395       SRR6311615_BYU12399     SRR6311787      SRR6311607_BYU12423     SRR1975549_TX-2094
A02     25483   .       T       C       19376.1 .       AC=69;AF=0.383;AN=180;BaseQRankSum=-1.191;ClippingRankSum=0.000;DP=2119;ExcessHet=-0.0000;FS=6.875;InbreedingCoeff=0.9572;MLEAC=69;MLEAF=0.383;MQ=43.21;MQRankSum=-2.277;QD=34.72;ReadPosRankSum=-1.228;SOR=0.523 GT:AD:DP:GQ:PL  1/1:0,8:8:24:339,24,0   1/1:0,10:10:30:388,30,0   1/1:0,8:8:24:340,24,0   1/1:0,6:6:18:250,18,0   1/1:0,7:7:21:241,21,0   1/1:0,8:8:24:284,24,0   1/1:0,4:4:12:160,12,0   1/1:0,21:21:63:664,63,0   1/1:0,12:12:36:523,36,0 1/1:0,13:13:39:477,39,0 0/0:38,0:38:99:0,99,1457        0/0:45,0:45:99:0,107,1412       0/0:41,0:41:99:0,105,1638         0/0:50,0:50:99:0,115,1730       0/0:35,0:35:90:0,90,1172        0/0:42,0:42:99:0,110,1640       0/0:40,0:40:99:0,99,1367        0/0:43,0:43:99:0,105,1537 0/0:47,0:47:99:0,105,1800       0/0:39,0:39:99:0,100,1462       1/1:0,32:32:96:1130,96,0        0/0:36,0:36:99:0,99,1405        0/0:40,0:40:99:0,102,1529 0/0:44,0:44:99:0,103,1724       0/0:39,0:39:77:0,77,1464        0/0:43,0:43:99:0,101,1617       1/1:0,45:45:99:1628,135,0       0/0:37,0:37:99:0,102,1530 0/0:38,0:38:99:0,102,1497       0/0:37,0:37:99:0,101,1386       0/0:41,0:41:99:0,105,1602       0/0:40,0:40:99:0,100,1477       0/0:37,0:37:99:0,103,1213 0/0:45,0:45:99:0,100,1577       0/0:38,0:38:99:0,100,1345       1/1:0,17:17:51:577,51,0 1/1:0,22:22:66:811,66,0 1/1:0,15:15:45:627,45,0 1/1:0,13:13:39:492,39,0   1/1:0,10:10:30:316,30,0 1/1:0,22:22:66:896,66,0 1/1:0,18:18:54:673,54,0 1/1:0,20:20:60:691,60,0 1/1:0,14:14:42:464,42,0 1/1:0,13:13:39:491,39,0   1/1:0,7:7:21:253,21,0   1/1:0,16:16:48:620,48,0 1/1:0,2:2:6:89,6,0      1/1:0,20:20:60:786,60,0 1/1:0,29:29:87:906,87,0 1/1:0,12:12:36:398,36,0 1/1:0,18:18:54:555,54,0   1/1:0,20:20:60:823,60,0 1/1:0,26:26:78:847,78,0 1/1:0,13:13:39:510,39,0 0/0:21,0:21:57:0,57,855 0/0:16,0:16:48:0,48,566 0/0:19,0:19:51:0,51,765   0/0:16,0:16:42:0,42,630 0/0:20,0:20:60:0,60,631 0/0:19,0:19:57:0,57,577 0/0:16,0:16:48:0,48,520 0/0:20,0:20:51:0,51,765 
0/0:21,0:21:57:0,57,855   0/0:21,0:21:60:0,60,615 0/0:17,0:17:48:0,48,720 0/0:22,0:22:60:0,60,900 0/0:19,0:19:54:0,54,810 0/0:27,0:27:72:0,72,1080        0/0:14,0:14:39:0,39,585   0/0:12,0:12:36:0,36,365 0/0:18,0:18:54:0,54,535 0/0:26,0:26:72:0,72,736 0/0:23,0:23:63:0,63,687 0/0:30,0:30:84:0,84,959 0/0:23,0:23:60:0,60,654 0/0:10,0:10:21:0,21,315   0/0:25,0:25:72:0,72,1080        0/0:20,0:20:60:0,60,532 0/0:21,0:21:63:0,63,569 0/0:18,0:18:51:0,51,765 0/0:22,0:22:20:0,20,906 0/1:8,13:21:99:344,0,261  0/0:26,0:26:72:0,72,1070        0/0:17,0:17:48:0,48,720 1/1:0,21:21:63:782,63,0 0/0:16,0:16:48:0,48,664 0/0:20,0:20:60:0,60,743 1/1:0,15:15:45:521,45,0   0/0:7,0:7:0:0,0,224
A02     25507   .       T       C       4747.72 .       AC=15;AF=0.0833;AN=180;BaseQRankSum=-0.667;ClippingRankSum=0.000;DP=2109;ExcessHet=0.0000;FS=0.000;InbreedingCoeff=0.8945;MLEAC=15;MLEAF=0.0833;MQ=43.52;MQRankSum=-0.042;QD=32.97;ReadPosRankSum=0.906;SOR=0.586 GT:AD:DP:GQ:PL  0/0:8,0:8:24:0,24,344   0/0:10,0:10:30:0,30,433   0/0:7,0:7:18:0,18,270   0/0:5,0:5:15:0,15,219   0/0:9,0:9:27:0,27,273   0/0:10,0:10:30:0,30,428 0/0:2,0:2:6:0,6,84      0/0:19,0:19:57:0,57,711   0/0:13,0:13:33:0,33,495 0/0:12,0:12:36:0,36,451 0/0:38,0:38:99:0,99,1457        0/0:45,0:45:99:0,107,1412       0/0:41,0:41:99:0,105,1638         0/0:40,0:40:99:0,100,1405       0/0:35,0:35:99:0,100,1348       0/0:42,0:42:99:0,110,1640       0/0:40,0:40:99:0,99,1367        0/0:43,0:43:99:0,105,1537 0/0:45,0:45:99:0,99,1572        0/0:39,0:39:99:0,100,1462       0/0:34,0:34:90:0,90,1350        0/0:36,0:36:99:0,99,1405        0/0:40,0:40:99:0,102,1529 0/0:44,0:44:99:0,103,1724       0/0:40,0:40:99:0,103,1329       0/0:43,0:43:99:0,101,1617       0/0:43,0:43:99:0,105,1728       0/0:38,0:38:99:0,102,1457 0/0:38,0:38:99:0,102,1497       0/0:37,0:37:99:0,101,1386       0/0:41,0:41:99:0,105,1602       0/0:38,0:38:99:0,100,1289       0/0:37,0:37:99:0,103,1213 0/0:45,0:45:99:0,100,1577       0/0:38,0:38:99:0,100,1345       0/1:8,11:19:99:301,0,193        1/1:0,19:19:57:749,57,0 1/1:0,17:17:51:673,51,0 1/1:0,14:14:42:524,42,0   0/0:6,0:6:18:0,18,266   1/1:0,23:23:69:936,69,0 0/0:19,0:19:57:0,57,712 0/0:19,0:19:54:0,54,810 0/0:13,0:13:39:0,39,469 1/1:0,10:10:30:415,30,0   0/0:10,0:10:30:0,30,369 1/1:0,17:17:51:622,51,0 0/0:2,0:2:6:0,6,73      0/0:26,0:26:36:0,36,946 0/0:25,0:25:72:0,72,953 0/0:18,0:18:51:0,51,765   0/0:21,0:21:60:0,60,726 0/0:23,0:23:66:0,66,990 1/1:0,25:25:75:761,75,0 0/0:11,0:11:33:0,33,489 0/0:21,0:21:60:0,60,900 0/0:18,0:18:48:0,48,720 0/0:15,0:15:42:0,42,630   0/0:21,0:21:51:0,51,765 0/0:21,0:21:57:0,57,855 0/0:19,0:19:54:0,54,810 0/0:17,0:17:48:0,48,720 0/0:17,0:17:51:0,51,540 
0/0:20,0:20:57:0,57,855   0/0:20,0:20:60:0,60,601 0/0:12,0:12:33:0,33,495 0/0:18,0:18:51:0,51,765 0/0:20,0:20:60:0,60,699 0/0:27,0:27:75:0,75,1125        0/0:16,0:16:48:0,48,540   0/0:13,0:13:39:0,39,395 0/0:22,0:22:57:0,57,855 0/0:28,0:28:81:0,81,827 0/0:21,0:21:63:0,63,625 0/0:33,0:33:99:0,99,1038        0/0:21,0:21:21:0,21,512   0/0:9,0:9:27:0,27,308   0/0:22,0:22:66:0,66,730 0/0:20,0:20:60:0,60,544 0/0:21,0:21:63:0,63,569 0/0:19,0:19:54:0,54,810 0/0:28,0:28:35:0,35,996   0/0:20,0:20:60:0,60,826 0/0:26,0:26:75:0,75,1125        0/0:15,0:15:45:0,45,601 0/0:15,0:15:42:0,42,630 0/0:20,0:20:60:0,60,798 0/0:20,0:20:57:0,57,855   0/0:14,0:14:42:0,42,586 0/0:8,0:8:21:0,21,315