matted-zz / multipool

High-resolution genetic mapping for pooled sequencing.
http://cgs.csail.mit.edu/multipool/
MIT License

Using VCF file directly #10

Open emmannaemeka opened 6 years ago

emmannaemeka commented 6 years ago

@matted @tw164 Can I use a VCF file directly as the input file?

tw164 commented 6 years ago

Direct input from a VCF file is not currently implemented. For input to Multipool, pooled variant data in a VCF file must first be converted to an allele counts file in the tabular format described here: https://github.com/matted/multipool/wiki#user-content-forming-input-data

Extracting allele counts/depths from pooled VCF data is not necessarily straightforward, because different variant-calling tools have used different tags to indicate allele depths, and it was only relatively recently, with version 4.3 of the VCF specification, that the 'AD' allele depth field was added as a reserved field.

I have written a prep script (mp_prep.py) that is included in a forked version of this package, which can be found here: https://github.com/gact/multipool/. This script can be called as follows:

python2 mp_prep.py -p POOL -f FOUNDER1,FOUNDER2 -r chr01 -o allele_counts.txt input.vcf

…where input.vcf is the name of an input VCF file, POOL is the name of a pool sample in the input VCF, FOUNDER1 and FOUNDER2 are the names of founder samples, chr01 is the name of a chromosome for which allele counts are to be generated, and allele_counts.txt is the resulting allele counts file that can be passed as input to the main Multipool script (mp_inference.py).

Note that this prep script expects two founders (as does Multipool), and makes use only of homozygous SNP genotypes that are segregating with respect to those two founders. The prep script also assumes that allele-depth tag sets are one of two types: the 'AO' and 'RO' tags used by FreeBayes, or the 'AD' tag used by GATK and by the VCF specification since version 4.3.
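To make the two tag conventions concrete, here is a minimal sketch of how reference and alternate depths could be read from a single sample column. It is an illustration only, not the code in mp_prep.py; the function name `allele_depths` is hypothetical, and the example values are taken from the FreeBayes excerpt posted later in this thread.

```python
def allele_depths(format_field, sample_field):
    """Return (ref_depth, [alt_depths]) from one VCF sample column.

    Hypothetical helper for illustration; not part of Multipool.
    Missing values (".") are not handled in this sketch.
    """
    keys = format_field.split(":")
    values = dict(zip(keys, sample_field.split(":")))
    if "RO" in values and "AO" in values:
        # FreeBayes style: RO is the reference count, AO holds one count per ALT allele.
        ref = int(values["RO"])
        alts = [int(x) for x in values["AO"].split(",")]
    elif "AD" in values:
        # AD style: reference depth first, then one depth per ALT allele.
        depths = [int(x) for x in values["AD"].split(",")]
        ref, alts = depths[0], depths[1:]
    else:
        raise ValueError("no RO/AO or AD tags in FORMAT: %s" % format_field)
    return ref, alts


fmt = "GT:DP:DPR:RO:QR:AO:QA:GL"
sample = "0/1:1069:1069,629:438:17295:629:23902:-1827.99,0,-1233.71"
print(allele_depths(fmt, sample))
```

Checking for RO/AO first and falling back to AD is simply one possible order; a file that carries both tag sets would need a deliberate choice.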

emmannaemeka commented 6 years ago

SNP example.zip

Hello Dr Walsh, thanks for the detailed description. Please note that I am new to bioinformatics. I have attached an excerpt from the VCF generated using FreeBayes. This is for one pool (F1 population). How do I fit this data into the following command?

python2 mp_prep.py -p POOL -f FOUNDER1,FOUNDER2 -r chr01 -o allele_counts.txt input.vcf

What are my founders, and so on?

Please bear with me.

Thanks

Nnadi Nnaemeka Emmanuel Department of Microbiology, Faculty of Natural and Applied Science, Plateau State University, Bokkos, Plateau State, Nigeria. Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications


emmannaemeka commented 6 years ago

Error with python2 mp_prep.py

multipool-master emmannaemeka$ python2 mp_prep.py -p POOL -f FOUNDER1,FOUNDER2 -r chr01 -o allele_counts.txt input.vcf
Traceback (most recent call last):
  File "mp_prep.py", line 30, in <module>
    import pysam
ImportError: No module named pysam

tw164 commented 6 years ago

Hello Emmanuel,

The Python error occurs because the PySAM Python module couldn't be found. The PySAM module is used by mp_prep.py for accessing VCF input files. PySAM can be installed by following the instructions here.
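As a quick check after installation, a short sketch like the following can confirm whether the module is importable. The helper name `module_available` is hypothetical. Run it with the same interpreter that will run mp_prep.py (python2 in this thread), since each Python installation has its own set of modules.

```python
import importlib


def module_available(name):
    """Return True if the named module can be imported by this interpreter."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


# Reports whether the import at the top of mp_prep.py would succeed here.
print("pysam available: %s" % module_available("pysam"))
```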

For preparing Multipool input, you will need SNP genotype data for the pool sample (named POOL in the example command), as well as SNP genotypes for the founders (in the case of an F1 population, these are the parents). These are used by mp_prep.py to encode allele depth in terms of parental alleles.
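The idea of encoding allele depth in terms of parental alleles can be sketched as follows. This is an assumption-laden illustration rather than the logic in mp_prep.py: the function name `counts_by_founder` and the parental genotypes in the example are hypothetical, and only biallelic sites with homozygous, differing founder genotypes are treated as informative.

```python
def counts_by_founder(founder1_gt, founder2_gt, ref_depth, alt_depth):
    """Map pool REF/ALT depths onto (founder1_count, founder2_count).

    Returns None when the site is not informative: a founder is
    heterozygous or missing, or both founders carry the same allele.
    Hypothetical helper for illustration; not part of Multipool.
    """
    homozygous = {"0/0": 0, "0|0": 0, "1/1": 1, "1|1": 1}
    allele1 = homozygous.get(founder1_gt)
    allele2 = homozygous.get(founder2_gt)
    if allele1 is None or allele2 is None or allele1 == allele2:
        return None
    depth = {0: ref_depth, 1: alt_depth}
    return depth[allele1], depth[allele2]


# Pool depths from position 222 of the excerpt in this thread (RO=438, AO=629);
# the founder genotypes are made up for the example.
print(counts_by_founder("0/0", "1/1", 438, 629))
print(counts_by_founder("0/1", "1/1", 438, 629))
```

A site where either parent is heterozygous returns None because the pool's reads at that site cannot be attributed to one parent.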

If you would be so kind as to paste an excerpt from your VCF in plain text, I would be happy to have a look at it. Anything from the body of the VCF file (lines that do not begin with a number sign, #) would be great.

Regards,

Thomas.

emmannaemeka commented 6 years ago

@tw164 Please find an excerpt of the VCF file from one of the pools:
CHROM POS ID REF ALT QUAL FILTER INFO FORMAT unknown
cneoH99_Chr1 99 . T C 38619.9 . AB=0;ABP=0;AC=2;AF=1;AN=2;AO=1081;CIGAR=1X;DP=1081;DPB=1081;DPRA=0;EPP=4.26578;EPPR=0;GTI=0;LEN=1;MEANALT=1;MQM=60;MQMR=0;NS=1;NUMALT=1;ODDS=1503.19;PAIRED=0.93247;PAIREDR=0;PAO=0;PQA=0;PQR=0;PRO=0;QA=43044;QR=0;RO=0;RPL=490;RPP=23.5017;RPPR=0;RPR=591;RUN=1;SAF=569;SAP=9.53677;SAR=512;SRF=0;SRP=0;SRR=0;TYPE=snp GT:DP:DPR:RO:QR:AO:QA:GL 1/1:1081:1081,1081:0:0:1081:43044:-3869.66,-325.413,0
cneoH99_Chr1 161 . C G 39966.6 . AB=0;ABP=0;AC=2;AF=1;AN=2;AO=1125;CIGAR=1X;DP=1126;DPB=1126;DPRA=0;EPP=3.86152;EPPR=0;GTI=0;LEN=1;MEANALT=2;MQM=59.9956;MQMR=0;NS=1;NUMALT=1;ODDS=1564.19;PAIRED=0.962667;PAIREDR=0;PAO=0;PQA=0;PQR=0;PRO=0;QA=44520;QR=0;RO=0;RPL=583;RPP=6.25496;RPPR=0;RPR=542;RUN=1;SAF=585;SAP=6.91895;SAR=540;SRF=0;SRP=0;SRR=0;TYPE=snp GT:DP:DPR:RO:QR:AO:QA:GL 1/1:1126:1126,1125:0:0:1125:44520:-4002.33,-338.659,0
cneoH99_Chr1 168 . T C 38630.6 . AB=0;ABP=0;AC=2;AF=1;AN=2;AO=1083;CIGAR=1X;DP=1084;DPB=1084;DPRA=0;EPP=7.07053;EPPR=0;GTI=0;LEN=1;MEANALT=2;MQM=59.9815;MQMR=0;NS=1;NUMALT=1;ODDS=1505.96;PAIRED=0.962142;PAIREDR=0;PAO=0;PQA=0;PQR=0;PRO=0;QA=43051;QR=0;RO=0;RPL=578;RPP=13.6952;RPPR=0;RPR=505;RUN=1;SAF=567;SAP=8.22544;SAR=516;SRF=0;SRP=0;SRR=0;TYPE=snp GT:DP:DPR:RO:QR:AO:QA:GL 1/1:1084:1084,1083:0:0:1083:43051:-3869.9,-326.015,0
cneoH99_Chr1 222 . G A 18119.4 . AB=0.5884;ABP=75.5708;AC=1;AF=0.5;AN=2;AO=629;CIGAR=1X;DP=1069;DPB=1069;DPRA=0;EPP=3.59373;EPPR=4.61659;GTI=0;LEN=1;MEANALT=3;MQM=60;MQMR=59.968;NS=1;NUMALT=1;ODDS=2804.58;PAIRED=0.980922;PAIREDR=0.977169;PAO=0;PQA=0;PQR=0;PRO=0;QA=23902;QR=17295;RO=438;RPL=332;RPP=7.23932;RPPR=4.99338;RPR=297;RUN=1;SAF=317;SAP=3.09661;SAR=312;SRF=221;SRP=3.08962;SRR=217;TYPE=snp GT:DP:DPR:RO:QR:AO:QA:GL 0/1:1069:1069,629:438:17295:629:23902:-1827.99,0,-1233.71
cneoH99_Chr1 262 . TGCAT CGCAA 36434.3 . AB=0;ABP=0;AC=2;AF=1;AN=2;AO=998;CIGAR=1X3M1X;DP=999;DPB=1029.6;DPRA=0;EPP=4.0634;EPPR=0;GTI=0;LEN=5;MEANALT=2;MQM=60;MQMR=0;NS=1;NUMALT=1;ODDS=1430.41;PAIRED=0.980962;PAIREDR=0;PAO=61;PQA=2147;PQR=0;PRO=0;QA=38434;QR=0;RO=0;RPL=508;RPP=3.71527;RPPR=0;RPR=490;RUN=1;SAF=496;SAP=3.08863;SAR=502;SRF=0;SRP=0;SRR=0;TYPE=complex GT:DP:DPR:RO:QR:AO:QA:GL 1/1:999:999,998:0:0:998:38434:-3648.41,-318.791,0
cneoH99_Chr1 308 . G A 37370.8 . AB=0;ABP=0;AC=2;AF=1;AN=2;AO=1068;CIGAR=1X;DP=1070;DPB=1070;DPRA=0;EPP=3.14043;EPPR=7.35324;GTI=0;LEN=1;MEANALT=1;MQM=59.9494;MQMR=60;NS=1;NUMALT=1;ODDS=1471.41;PAIRED=0.978464;PAIREDR=1;PAO=0;PQA=0;PQR=0;PRO=0;QA=41645;QR=34;RO=2;RPL=531;RPP=3.0835;RPPR=7.35324;RPR=537;RUN=1;SAF=547;SAP=4.38475;SAR=521;SRF=0;SRP=7.35324;SRR=2;TYPE=snp GT:DP:DPR:RO:QR:AO:QA:GL 1/1:1070:1070,1068:2:34:1068:41645:-3740.27,-318.872,0
cneoH99_Chr1 365 . T A 38767.3 . AB=0;ABP=0;AC=2;AF=1;AN=2;AO=1085;CIGAR=1X;DP=1085;DPB=1085;DPRA=0;EPP=3.58869;EPPR=0;GTI=0;LEN=1;MEANALT=1;MQM=59.9908;MQMR=0;NS=1;NUMALT=1;ODDS=1508.73;PAIRED=0.980645;PAIREDR=0;PAO=0;PQA=0;PQR=0;PRO=0;QA=43183;QR=0;RO=0;RPL=552;RPP=3.73279;RPPR=0;RPR=533;RUN=1;SAF=566;SAP=7.4313;SAR=519;SRF=0;SRP=0;SRR=0;TYPE=snp GT:DP:DPR:RO:QR:AO:QA:GL 1/1:1085:1085,1085:0:0:1085:43183:-3882.14,-326.618,0
cneoH99_Chr1 388 . C T 14058.8 . AB=0.439531;ABP=38.2007;AC=1;AF=0.5;AN=2;AO=487;CIGAR=1X;DP=1108;DPB=1108;DPRA=0;EPP=3.22878;EPPR=3.06634;GTI=0;LEN=1;MEANALT=2;MQM=60;MQMR=60;NS=1;NUMALT=1;ODDS=3237.15;PAIRED=0.98768;PAIREDR=0.979032;PAO=0;PQA=0;PQR=0;PRO=0;QA=19484;QR=24802;RO=620;RPL=222;RPP=11.2548;RPPR=9.7909;RPR=265;RUN=1;SAF=247;SAP=3.22878;SAR=240;SRF=324;SRP=5.75616;SRR=296;TYPE=snp GT:DP:DPR:RO:QR:AO:QA:GL 0/1:1108:1108,487:620:24802:487:19484:-1418.16,0,-1896.23
cneoH99_Chr1 416 . G A,T 36440.4 . AB=0.445837,0.554163;ABP=31.4727,31.4727;AC=1,1;AF=0.5,0.5;AN=2;AO=498,619;CIGAR=1X,1X;DP=1117;DPB=1117;DPRA=0,0;EPP=3.86494,5.20282;EPPR=0;GTI=0;LEN=1,1;MEANALT=2,2;MQM=59.992,59.9677;MQMR=0;NS=1;NUMALT=2;ODDS=3337.04;PAIRED=0.987952,0.975767;PAIREDR=0;PAO=0,0;PQA=0,0;PQR=0;PRO=0;QA=19988,24471;QR=0;RO=0;RPL=233,316;RPP=7.47534,3.60316;RPPR=0;RPR=265,303;RUN=1,1;SAF=261,321;SAP=5.52188,4.86605;SAR=237,298;SRF=0;SRP=0;SRR=0;TYPE=snp,snp GT:DP:DPR:RO:QR:AO:QA:GL 1/2:1117:1117,498,619:0:0:498,619:19988,24471:-3660.58,-2013.79,-1863.87,-1647.19,0,-1460.85
cneoH99_Chr1 456 . G C 40779.5 . AB=0;ABP=0;AC=2;AF=1;AN=2;AO=1147;CIGAR=1X;DP=1149;DPB=1149;DPRA=0;EPP=7.19232;EPPR=7.35324;GTI=0;LEN=1;MEANALT=1;MQM=60;MQMR=60;NS=1;NUMALT=1;ODDS=1581.58;PAIRED=0.986922;PAIREDR=1;PAO=0;PQA=0;PQR=0;PRO=0;QA=45433;QR=24;RO=2;RPL=582;RPP=3.55743;RPPR=3.0103;RPR=565;RUN=1;SAF=584;SAP=3.84519;SAR=563;SRF=1;SRP=3.0103;SRR=1;TYPE=snp GT:DP:DPR:RO:QR:AO:QA:GL 1/1:1149:1149,1147:2:24:1147:45433:-4082.18,-343.603,0
cneoH99_Chr1 1027 . C T 19031.6 . AB=0.588571;ABP=74.5572;AC=1;AF=0.5;AN=2;AO=618;CIGAR=1X;DP=1050;DPB=1050;DPRA=0;EPP=3.69899;EPPR=8.15749;GTI=0;LEN=1;MEANALT=1;MQM=60;MQMR=59.9676;NS=1;NUMALT=1;ODDS=2807.59;PAIRED=0.990291;PAIREDR=0.986111;PAO=0;PQA=0;PQR=0;PRO=0;QA=24857;QR=17254;RO=432;RPL=299;RPP=4.41578;RPPR=4.6389;RPR=319;RUN=1;SAF=315;SAP=3.51627;SAR=303;SRF=227;SRP=5.44315;SRR=205;TYPE=snp GT:DP:DPR:RO:QR:AO:QA:GL 0/1:1050:1050,618:432:17254:618:24857:-1918.69,0,-1235.28
cneoH99_Chr1 1182 . GAC AAT 19342.3 . AB=0.596459;ABP=89.7255;AC=1;AF=0.5;AN=2;AO=640;CIGAR=1X1M1X;DP=1073;DPB=1081.33;DPRA=0;EPP=3.22745;EPPR=3.85783;GTI=0;LEN=3;MEANALT=1;MQM=59.4688;MQMR=59.9515;NS=1;NUMALT=1;ODDS=2772.33;PAIRED=0.982812;PAIREDR=0.986143;PAO=10;PQA=329;PQR=98;PRO=6;QA=25164;QR=17126;RO=433;RPL=323;RPP=3.13245;RPPR=3.13567;RPR=317;RUN=1;SAF=319;SAP=3.02387;SAR=321;SRF=205;SRP=5.66321;SRR=228;TYPE=complex GT:DP:DPR:RO:QR:AO:QA:GL 0/1:1073:1073,640:433:17126:640:25164:-1951.1,0,-1220.95
cneoH99_Chr1 1443 . C T 20475.9 . AB=0.604464;ABP=109.172;AC=1;AF=0.5;AN=2;AO=677;CIGAR=1X;DP=1120;DPB=1120;DPRA=0;EPP=3.27011;EPPR=3.32472;GTI=0;LEN=1;MEANALT=2;MQM=59.6499;MQMR=59.9208;NS=1;NUMALT=1;ODDS=2850.15;PAIRED=0.977843;PAIREDR=0.984163;PAO=0;PQA=0;PQR=0;PRO=0;QA=26804;QR=17740;RO=442;RPL=297;RPP=25.1067;RPPR=3.18716;RPR=380;RUN=1;SAF=344;SAP=3.39841;SAR=333;SRF=219;SRP=3.08891;SRR=223;TYPE=snp GT:DP:DPR:RO:QR:AO:QA:GL 0/1:1120:1120,677:442:17740:677:26804:-2068.82,0,-1256.53
cneoH99_Chr1 1797 . G T 22115 . AB=0.59851;ABP=104.832;AC=1;AF=0.5;AN=2;AO=723;CIGAR=1X;DP=1208;DPB=1208;DPRA=0;EPP=3.03733;EPPR=7.60449;GTI=0;LEN=1;MEANALT=2;MQM=60;MQMR=60;NS=1;NUMALT=1;ODDS=3128.04;PAIRED=0.970954;PAIREDR=0.987603;PAO=0;PQA=0;PQR=0;PRO=0;QA=28870;QR=19366;RO=484;RPL=329;RPP=15.6997;RPPR=3.29744;RPR=394;RUN=1;SAF=385;SAP=9.64485;SAR=338;SRF=256;SRP=6.52773;SRR=228;TYPE=snp GT:DP:DPR:RO:QR:AO:QA:GL 0/1:1208:1208,723:484:19366:723:28870:-2231.99,0,-1377.62
cneoH99_Chr1 1899 . A T 4195.65 . AB=0.213978;ABP=826.575;AC=1;AF=0.5;AN=2;AO=248;CIGAR=1X;DP=1159;DPB=1159;DPRA=0;EPP=3.88589;EPPR=8.73336;GTI=0;LEN=1;MEANALT=1;MQM=58.2339;MQMR=58.1471;NS=1;NUMALT=1;ODDS=966.083;PAIRED=0.979839;PAIREDR=0.980241;PAO=0;PQA=0;PQR=0;PRO=0;QA=9793;QR=36124;RO=911;RPL=121;RPP=3.32551;RPPR=17.1427;RPR=127;RUN=1;SAF=124;SAP=3.0103;SAR=124;SRF=465;SRP=3.87078;SRR=446;TYPE=snp GT:DP:DPR:RO:QR:AO:QA:GL 0/1:1159:1159,248:911:36124:248:9793:-514.681,0,-2847.69
cneoH99_Chr1 2033 . C T 3953.41 . AB=0.58042;ABP=19.0762;AC=1;AF=0.5;AN=2;AO=166;CIGAR=1X;DP=286;DPB=286;DPRA=0;EPP=38.3818;EPPR=11.7686;GTI=0;LEN=1;MEANALT=1;MQM=39.3735;MQMR=37.3833;NS=1;NUMALT=1;ODDS=578.344;PAIRED=1;PAIREDR=1;PAO=0;PQA=0;PQR=0;PRO=0;QA=6574;QR=4806;RO=120;RPL=121;RPP=78.5671;RPPR=55.7771;RPR=45;RUN=1;SAF=12;SAP=266.779;SAR=154;SRF=16;SRP=143.143;SRR=104;TYPE=snp GT:DP:DPR:RO:QR:AO:QA:GL 0/1:286:286,166:120:4806:166:6574:-446.248,0,-281.31
cneoH99_Chr1 6368 . T A 5.93664e-13 . AB=0;ABP=0;AC=0;AF=0;AN=2;AO=30;CIGAR=1X;DP=118;DPB=118;DPRA=0;EPP=3.0103;EPPR=5.03202;GTI=0;LEN=1;MEANALT=2;MQM=15;MQMR=30.4713;NS=1;NUMALT=1;ODDS=29.6209;PAIRED=0.633333;PAIREDR=0.988506;PAO=0;PQA=0;PQR=0;PRO=0;QA=1178;QR=3517;RO=87;RPL=18;RPP=5.61607;RPPR=14.0174;RPR=12;RUN=1;SAF=13;SAP=4.16842;SAR=17;SRF=43;SRP=3.03526;SRR=44;TYPE=snp GT:DP:DPR:RO:QR:AO:QA:GL 0/0:118:118,30:87:3517:30:1178:0,-0.608226,-135.641
cneoH99_Chr1 10411 . A G 77.3512 . AB=0.358974;ABP=9.74743;AC=1;AF=0.5;AN=2;AO=14;CIGAR=1X;DP=39;DPB=39;DPRA=0;EPP=33.4109;EPPR=7.26639;GTI=0;LEN=1;MEANALT=1;MQM=29.6429;MQMR=59.12;NS=1;NUMALT=1;ODDS=17.8108;PAIRED=0.928571;PAIREDR=1;PAO=0;PQA=0;PQR=0;PRO=0;QA=552;QR=1008;RO=25;RPL=0;RPP=33.4109;RPPR=3.79203;RPR=14;RUN=1;SAF=14;SAP=33.4109;SAR=0;SRF=9;SRP=7.26639;SRR=16;TYPE=snp GT:DP:DPR:RO:QR:AO:QA:GL 0/1:39:39,14:25:1008:14:552:-23.94,0,-79.1647
cneoH99_Chr1 10423 . C T 214.246 . AB=0.414634;ABP=5.60547;AC=1;AF=0.5;AN=2;AO=17;CIGAR=1X;DP=41;DPB=41;DPRA=0;EPP=18.4661;EPPR=4.45795;GTI=0;LEN=1;MEANALT=1;MQM=33.7059;MQMR=59.0833;NS=1;NUMALT=1;ODDS=49.332;PAIRED=0.941176;PAIREDR=1;PAO=0;PQA=0;PQR=0;PRO=0;QA=688;QR=980;RO=24;RPL=0;RPP=39.9253;RPPR=3.37221;RPR=17;RUN=1;SAF=14;SAP=18.4661;SAR=3;SRF=9;SRP=6.26751;SRR=15;TYPE=snp GT:DP:DPR:RO:QR:AO:QA:GL 0/1:41:41,17:24:980:17:688:-34.2737,0,-76.0455
cneoH99_Chr1 10434 . C G 382.039 . AB=0.48;ABP=3.18402;AC=1;AF=0.5;AN=2;AO=24;CIGAR=1X;DP=50;DPB=50;DPRA=0;EPP=8.80089;EPPR=6.01695;GTI=0;LEN=1;MEANALT=1;MQM=35.5833;MQMR=59.1538;NS=1;NUMALT=1;ODDS=87.9677;PAIRED=0.875;PAIREDR=1;PAO=0;PQA=0;PQR=0;PRO=0;QA=971;QR=1058;RO=26;RPL=0;RPP=55.1256;RPPR=6.01695;RPR=24;RUN=1;SAF=16;SAP=8.80089;SAR=8;SRF=10;SRP=6.01695;SRR=16;TYPE=snp GT:DP:DPR:RO:QR:AO:QA:GL 0/1:50:50,24:26:1058:24:971:-51.1117,0,-80.348
cneoH99_Chr1 10471 . C T 570.813 . AB=0.517241;ABP=3.16006;AC=1;AF=0.5;AN=2;AO=30;CIGAR=1X;DP=58;DPB=58;DPRA=0;EPP=17.1973;EPPR=3.0103;GTI=0;LEN=1;MEANALT=1;MQM=34.9333;MQMR=51;NS=1;NUMALT=1;ODDS=131.435;PAIRED=0.9;PAIREDR=1;PAO=0;PQA=0;PQR=0;PRO=0;QA=1209;QR=1126;RO=28;RPL=8;RPP=17.1973;RPPR=3.0103;RPR=22;RUN=1;SAF=16;SAP=3.29983;SAR=14;SRF=12;SRP=4.25114;SRR=16;TYPE=snp GT:DP:DPR:RO:QR:AO:QA:GL 0/1:58:58,30:28:1126:30:1209:-65.473,0,-73.6021
cneoH99_Chr1 10482 . T C 566.878 . AB=0.563636;ABP=4.94488;AC=1;AF=0.5;AN=2;AO=31;CIGAR=1X;DP=55;DPB=55;DPRA=0;EPP=40.0654;EPPR=3.37221;GTI=0;LEN=1;MEANALT=1;MQM=33.9032;MQMR=49.5;NS=1;NUMALT=1;ODDS=120.199;PAIRED=0.903226;PAIREDR=1;PAO=0;PQA=0;PQR=0;PRO=0;QA=1257;QR=942;RO=24;RPL=12;RPP=6.44263;RPPR=3.37221;RPR=19;RUN=1;SAF=16;SAP=3.08035;SAR=15;SRF=10;SRP=4.45795;SRR=14;TYPE=snp GT:DP:DPR:RO:QR:AO:QA:GL 0/1:55:55,31:24:942:31:1257:-66.719,0,-57.9497
cneoH99_Chr1 10536 . C T 527.354 . AB=0.682927;ABP=14.9269;AC=1;AF=0.5;AN=2;AO=28;CIGAR=1X;DP=41;DPB=41;DPRA=0;EPP=3.0103;EPPR=3.17734;GTI=0;LEN=1;MEANALT=1;MQM=34.7857;MQMR=42.3077;NS=1;NUMALT=1;ODDS=37.0965;PAIRED=0.892857;PAIREDR=1;PAO=0;PQA=0;PQR=0;PRO=0;QA=1139;QR=533;RO=13;RPL=26;RPP=47.6806;RPPR=31.2394;RPR=2;RUN=1;SAF=12;SAP=4.25114;SAR=16;SRF=6;SRP=3.17734;SRR=7;TYPE=snp GT:DP:DPR:RO:QR:AO:QA:GL 0/1:41:41,28:13:533:28:1139:-64.4541,0,-25.4782
cneoH99_Chr1 10545 . C T 453.155 . AB=0.631579;ABP=8.7247;AC=1;AF=0.5;AN=2;AO=24;CIGAR=1X;DP=38;DPB=38;DPRA=0;EPP=4.45795;EPPR=3.0103;GTI=0;LEN=1;MEANALT=1;MQM=36;MQMR=42.1429;NS=1;NUMALT=1;ODDS=48.9657;PAIRED=0.875;PAIREDR=1;PAO=0;PQA=0;PQR=0;PRO=0;QA=976;QR=565;RO=14;RPL=22;RPP=39.2015;RPPR=25.3454;RPR=2;RUN=1;SAF=8;SAP=8.80089;SAR=16;SRF=6;SRP=3.63072;SRR=8;TYPE=snp GT:DP:DPR:RO:QR:AO:QA:GL 0/1:38:38,24:14:565:24:976:-55.8434,0,-29.2033
cneoH99_Chr1 12362 . TCCTCAT ATGTCAC 3.47052e-06 . AB=0;ABP=0;AC=0;AF=0;AN=2;AO=3;CIGAR=3X3M1X;DP=13;DPB=13.8571;DPRA=0;EPP=9.52472;EPPR=22.5536;GTI=0;LEN=7;MEANALT=2;MQM=14.6667;MQMR=29.3333;NS=1;NUMALT=1;ODDS=14.0398;PAIRED=1;PAIREDR=1;PAO=0;PQA=0;PQR=36;PRO=1;QA=94;QR=365;RO=9;RPL=0;RPP=9.52472;RPPR=3.25157;RPR=3;RUN=1;SAF=3;SAP=9.52472;SAR=0;SRF=4;SRP=3.25157;SRR=5;TYPE=complex GT:DP:DPR:RO:QR:AO:QA:GL 0/0:13:13,3:9:365:3:94:0,-0.118182,-21.5253
cneoH99_Chr1 12386 . CGA TGT 1.06417 . AB=0.411765;ABP=4.1599;AC=1;AF=0.5;AN=2;AO=7;CIGAR=1X1M1X;DP=17;DPB=17;DPRA=0;EPP=10.7656;EPPR=5.80219;GTI=0;LEN=3;MEANALT=2;MQM=17.4286;MQMR=30;NS=1;NUMALT=1;ODDS=1.28134;PAIRED=1;PAIREDR=1;PAO=0;PQA=0;PQR=0;PRO=0;QA=277;QR=287;RO=7;RPL=1;RPP=10.7656;RPPR=5.80219;RPR=6;RUN=1;SAF=7;SAP=18.2106;SAR=0;SRF=4;SRP=3.32051;SRR=3;TYPE=complex GT:DP:DPR:RO:QR:AO:QA:GL 0/1:17:17,7:7:287:7:277:-6.75383,0,-14.0539
cneoH99_Chr1 12441 . A G 164.083 . AB=0.7;ABP=13.4334;AC=1;AF=0.5;AN=2;AO=21;CIGAR=1X;DP=30;DPB=30;DPRA=0;EPP=15.5221;EPPR=3.25157;GTI=0;LEN=1;MEANALT=1;MQM=22.5238;MQMR=28.7778;NS=1;NUMALT=1;ODDS=15.535;PAIRED=1;PAIREDR=1;PAO=0;PQA=0;PQR=0;PRO=0;QA=804;QR=369;RO=9;RPL=5;RPP=15.5221;RPPR=3.25157;RPR=16;RUN=1;SAF=21;SAP=48.6112;SAR=0;SRF=8;SRP=14.8328;SRR=1;TYPE=snp GT:DP:DPR:RO:QR:AO:QA:GL 0/1:30:30,21:9:369:21:804:-30.6314,0,-13.5949
cneoH99_Chr1 12448 . GAGGT CAGGC 253.019 . AB=0.647059;ABP=9.39698;AC=1;AF=0.5;AN=2;AO=22;CIGAR=1X3M1X;DP=34;DPB=35.4;DPRA=0;EPP=17.2236;EPPR=3.20771;GTI=0;LEN=5;MEANALT=2;MQM=24.4545;MQMR=32.2727;NS=1;NUMALT=1;ODDS=31.1268;PAIRED=1;PAIREDR=1;PAO=2;PQA=73;PQR=33;PRO=1;QA=885;QR=429;RO=11;RPL=4;RPP=22.3561;RPPR=4.78696;RPR=18;RUN=1;SAF=21;SAP=42.4916;SAR=1;SRF=10;SRP=19.0002;SRR=1;TYPE=complex GT:DP:DPR:RO:QR:AO:QA:GL 0/1:34:34,22:11:429:22:885:-39.1899,0,-20.8749

emmannaemeka commented 6 years ago

@tw164
CHROM POS ID REF ALT QUAL FILTER INFO FORMAT unknown
cneoH99_Chr1 99 . T C 50426.3 . AB=0;ABP=0;AC=2;AF=1;AN=2;AO=1415;CIGAR=1X;DP=1415;DPB=1415;DPRA=0;EPP=6.6949;EPPR=0;GTI=0;LEN=1;MEANALT=1;MQM=59.983;MQMR=0;NS=1;NUMALT=1;ODDS=1966.21;PAIRED=0.928622;PAIREDR=0;PAO=0;PQA=0;PQR=0;PRO=0;QA=56243;QR=0;RO=0;RPL=619;RPP=51.0881;RPPR=0;RPR=796;RUN=1;SAF=781;SAP=36.1717;SAR=634;SRF=0;SRP=0;SRR=0;TYPE=snp GT:DP:DPR:RO:QR:AO:QA:GL 1/1:1081:1081,1081:0:0:1081:43044:-3869.66,-325.413,0
cneoH99_Chr1 161 . C G 52460.3 . AB=0;ABP=0;AC=2;AF=1;AN=2;AO=1487;CIGAR=1X;DP=1487;DPB=1487;DPRA=0;EPP=21.0027;EPPR=0;GTI=0;LEN=1;MEANALT=1;MQM=60;MQMR=0;NS=1;NUMALT=1;ODDS=2066.02;PAIRED=0.958978;PAIREDR=0;PAO=0;PQA=0;PQR=0;PRO=0;QA=58459;QR=0;RO=0;RPL=795;RPP=18.5027;RPPR=0;RPR=692;RUN=1;SAF=787;SAP=14.0633;SAR=700;SRF=0;SRP=0;SRR=0;TYPE=snp GT:DP:DPR:RO:QR:AO:QA:GL 1/1:1126:1126,1125:0:0:1125:44520:-4002.33,-338.659,0
cneoH99_Chr1 168 . T C 53275.2 . AB=0;ABP=0;AC=2;AF=1;AN=2;AO=1494;CIGAR=1X;DP=1496;DPB=1496;DPRA=0;EPP=38.3818;EPPR=7.35324;GTI=0;LEN=1;MEANALT=1;MQM=60;MQMR=60;NS=1;NUMALT=1;ODDS=2059.11;PAIRED=0.959839;PAIREDR=1;PAO=0;PQA=0;PQR=0;PRO=0;QA=59417;QR=44;RO=2;RPL=811;RPP=26.8238;RPPR=7.35324;RPR=683;RUN=1;SAF=780;SAP=9.34158;SAR=714;SRF=0;SRP=7.35324;SRR=2;TYPE=snp GT:DP:DPR:RO:QR:AO:QA:GL 1/1:1084:1084,1083:0:0:1083:43051:-3869.9,-326.015,0
cneoH99_Chr1 222 . G A 24631.6 . AB=0.595788;ABP=120.323;AC=1;AF=0.5;AN=2;AO=877;CIGAR=1X;DP=1472;DPB=1472;DPRA=0;EPP=5.38976;EPPR=3.83144;GTI=0;LEN=1;MEANALT=1;MQM=59.9852;MQMR=59.9866;NS=1;NUMALT=1;ODDS=3791.54;PAIRED=0.973774;PAIREDR=0.971429;PAO=0;PQA=0;PQR=0;PRO=0;QA=32548;QR=23470;RO=595;RPL=438;RPP=3.01278;RPPR=3.18913;RPR=439;RUN=1;SAF=435;SAP=3.13163;SAR=442;SRF=281;SRP=6.98464;SRR=314;TYPE=snp GT:DP:DPR:RO:QR:AO:QA:GL 0/1:1069:1069,629:438:17295:629:23902:-1827.99,0,-1233.71
cneoH99_Chr1 262 . TGCAT CGCAA 48506.1 . AB=0;ABP=0;AC=2;AF=1;AN=2;AO=1334;CIGAR=1X3M1X;DP=1335;DPB=1373;DPRA=0;EPP=3.5377;EPPR=0;GTI=0;LEN=5;MEANALT=2;MQM=60;MQMR=0;NS=1;NUMALT=1;ODDS=1908.68;PAIRED=0.976012;PAIREDR=0;PAO=79;PQA=2757;PQR=0;PRO=0;QA=51256;QR=0;RO=0;RPL=667;RPP=3.0103;RPPR=0;RPR=667;RUN=1;SAF=637;SAP=8.87035;SAR=697;SRF=0;SRP=0;SRR=0;TYPE=complex GT:DP:DPR:RO:QR:AO:QA:GL 1/1:999:999,998:0:0:998:38434:-3648.41,-318.791,0
cneoH99_Chr1 308 . G A 50557.7 . AB=0;ABP=0;AC=2;AF=1;AN=2;AO=1445;CIGAR=1X;DP=1447;DPB=1447;DPRA=0;EPP=4.1058;EPPR=5.18177;GTI=0;LEN=1;MEANALT=2;MQM=59.9896;MQMR=60;NS=1;NUMALT=1;ODDS=1998.22;PAIRED=0.978547;PAIREDR=1;PAO=0;PQA=0;PQR=0;PRO=0;QA=56314;QR=27;RO=1;RPL=740;RPP=4.85117;RPPR=5.18177;RPR=705;RUN=1;SAF=724;SAP=3.02382;SAR=721;SRF=1;SRP=5.18177;SRR=0;TYPE=snp GT:DP:DPR:RO:QR:AO:QA:GL 1/1:1070:1070,1068:2:34:1068:41645:-3740.27,-318.872,0
cneoH99_Chr1 365 . T A 52245 . AB=0;ABP=0;AC=2;AF=1;AN=2;AO=1456;CIGAR=1X;DP=1458;DPB=1458;DPRA=0;EPP=3.60686;EPPR=0;GTI=0;LEN=1;MEANALT=2;MQM=59.9794;MQMR=0;NS=1;NUMALT=1;ODDS=2023.05;PAIRED=0.980769;PAIREDR=0;PAO=0;PQA=0;PQR=0;PRO=0;QA=58182;QR=0;RO=0;RPL=715;RPP=4.01848;RPPR=0;RPR=741;RUN=1;SAF=759;SAP=8.74323;SAR=697;SRF=0;SRP=0;SRR=0;TYPE=snp GT:DP:DPR:RO:QR:AO:QA:GL 1/1:1085:1085,1085:0:0:1085:43183:-3882.14,-326.618,0
cneoH99_Chr1 388 . C T 16783.2 . AB=0.390992;ABP=161.131;AC=1;AF=0.5;AN=2;AO=599;CIGAR=1X;DP=1532;DPB=1532;DPRA=0;EPP=3.44894;EPPR=4.96765;GTI=0;LEN=1;MEANALT=1;MQM=60;MQMR=60;NS=1;NUMALT=1;ODDS=3864.49;PAIRED=0.986644;PAIREDR=0.982851;PAO=0;PQA=0;PQR=0;PRO=0;QA=24072;QR=37240;RO=933;RPL=280;RPP=8.52417;RPPR=15.4131;RPR=319;RUN=1;SAF=305;SAP=3.44894;SAR=294;SRF=499;SRP=12.8436;SRR=434;TYPE=snp GT:DP:DPR:RO:QR:AO:QA:GL 0/1:1108:1108,487:620:24802:487:19484:-1418.16,0,-1896.23
cneoH99_Chr1 416 . G A,T 49899 . AB=0.374432,0.624919;ABP=214.055,211.879;AC=1,1;AF=0.5,0.5;AN=2;AO=577,963;CIGAR=1X,1X;DP=1541;DPB=1541;DPRA=0,0;EPP=4.66995,3.06667;EPPR=5.18177;GTI=0;LEN=1,1;MEANALT=2,2;MQM=60,59.9792;MQMR=60;NS=1;NUMALT=2;ODDS=3625;PAIRED=0.989601,0.983385;PAIREDR=1;PAO=0,0;PQA=0,0;PQR=0;PRO=0;QA=22998,38075;QR=12;RO=1;RPL=292,496;RPP=3.19471,4.90667;RPPR=5.18177;RPR=285,467;RUN=1,1;SAF=297,526;SAP=4.09792,20.8714;SAR=280,437;SRF=0;SRP=5.18177;SRR=1;TYPE=snp,snp GT:DP:DPR:RO:QR:AO:QA:GL 1/2:1117:1117,498,619:0:0:498,619:19988,24471:-3660.58,-2013.79,-1863.87,-1647.19,0,-1460.85
cneoH99_Chr1 456 . G C 54969.4 . AB=0;ABP=0;AC=2;AF=1;AN=2;AO=1542;CIGAR=1X;DP=1544;DPB=1544;DPRA=0;EPP=8.77836;EPPR=0;GTI=0;LEN=1;MEANALT=3;MQM=59.9994;MQMR=0;NS=1;NUMALT=1;ODDS=2142.27;PAIRED=0.978599;PAIREDR=0;PAO=0;PQA=0;PQR=0;PRO=0;QA=61222;QR=0;RO=0;RPL=785;RPP=4.11434;RPPR=0;RPR=757;RUN=1;SAF=808;SAP=10.7217;SAR=734;SRF=0;SRP=0;SRR=0;TYPE=snp GT:DP:DPR:RO:QR:AO:QA:GL 1/1:1149:1149,1147:2:24:1147:45433:-4082.18,-343.603,0
cneoH99_Chr1 1027 . C T 25058.3 . AB=0.549704;ABP=35.6053;AC=1;AF=0.5;AN=2;AO=835;CIGAR=1X;DP=1519;DPB=1519;DPRA=0;EPP=11.4595;EPPR=6.26116;GTI=0;LEN=1;MEANALT=1;MQM=60;MQMR=60;NS=1;NUMALT=1;ODDS=4568.61;PAIRED=0.978443;PAIREDR=0.980994;PAO=0;PQA=0;PQR=0;PRO=0;QA=33097;QR=27293;RO=684;RPL=416;RPP=3.03371;RPPR=6.68022;RPR=419;RUN=1;SAF=416;SAP=3.03371;SAR=419;SRF=355;SRP=5.15638;SRR=329;TYPE=snp GT:DP:DPR:RO:QR:AO:QA:GL 0/1:1050:1050,618:432:17254:618:24857:-1918.69,0,-1235.28
cneoH99_Chr1 1182 . GAC AAT 22800.7 . AB=0.533333;ABP=16.6182;AC=1;AF=0.5;AN=2;AO=752;CIGAR=1X1M1X;DP=1410;DPB=1434;DPRA=0;EPP=9.66332;EPPR=4.47231;GTI=0;LEN=3;MEANALT=3;MQM=59.6782;MQMR=59.9878;NS=1;NUMALT=1;ODDS=4339.67;PAIRED=0.970745;PAIREDR=0.984733;PAO=30;PQA=1042;PQR=346;PRO=15;QA=29305;QR=25592;RO=655;RPL=371;RPP=3.29906;RPPR=6.19623;RPR=381;RUN=1;SAF=379;SAP=3.11425;SAR=373;SRF=334;SRP=3.57057;SRR=321;TYPE=complex GT:DP:DPR:RO:QR:AO:QA:GL 0/1:1073:1073,640:433:17126:640:25164:-1951.1,0,-1220.95
cneoH99_Chr1 1443 . C T 25570.5 . AB=0.572003;ABP=70.2416;AC=1;AF=0.5;AN=2;AO=854;CIGAR=1X;DP=1493;DPB=1493;DPRA=0;EPP=5.00378;EPPR=3.28556;GTI=0;LEN=1;MEANALT=1;MQM=59.6417;MQMR=60;NS=1;NUMALT=1;ODDS=4186.99;PAIRED=0.982436;PAIREDR=0.976526;PAO=0;PQA=0;PQR=0;PRO=0;QA=33746;QR=25391;RO=639;RPL=369;RPP=37.225;RPPR=3.0137;RPR=485;RUN=1;SAF=436;SAP=3.83414;SAR=418;SRF=314;SRP=3.42149;SRR=325;TYPE=snp GT:DP:DPR:RO:QR:AO:QA:GL 0/1:1120:1120,677:442:17740:677:26804:-2068.82,0,-1256.53
cneoH99_Chr1 1797 . G T 24735.6 . AB=0.542857;ABP=27.0206;AC=1;AF=0.5;AN=2;AO=817;CIGAR=1X;DP=1505;DPB=1505;DPRA=0;EPP=4.94788;EPPR=3.08932;GTI=0;LEN=1;MEANALT=2;MQM=60;MQMR=60;NS=1;NUMALT=1;ODDS=4571.08;PAIRED=0.982864;PAIREDR=0.975255;PAO=0;PQA=0;PQR=0;PRO=0;QA=32692;QR=27252;RO=687;RPL=383;RPP=9.9234;RPPR=14.7717;RPR=434;RUN=1;SAF=436;SAP=11.0503;SAR=381;SRF=352;SRP=3.92377;SRR=335;TYPE=snp GT:DP:DPR:RO:QR:AO:QA:GL 0/1:1208:1208,723:484:19366:723:28870:-2231.99,0,-1377.62
cneoH99_Chr1 1899 . A T 4195.65 . AB=0.213978;ABP=826.575;AC=1;AF=0.5;AN=2;AO=248;CIGAR=1X;DP=1159;DPB=1159;DPRA=0;EPP=3.88589;EPPR=8.73336;GTI=0;LEN=1;MEANALT=1;MQM=58.2339;MQMR=58.1471;NS=1;NUMALT=1;ODDS=966.083;PAIRED=0.979839;PAIREDR=0.980241;PAO=0;PQA=0;PQR=0;PRO=0;QA=9793;QR=36124;RO=911;RPL=121;RPP=3.32551;RPPR=17.1427;RPR=127;RUN=1;SAF=124;SAP=3.0103;SAR=124;SRF=465;SRP=3.87078;SRR=446;TYPE=snp GT:DP:DPR:RO:QR:AO:QA:GL 0/1:1159:1159,248:911:36124:248:9793:-514.681,0,-2847.69
cneoH99_Chr1 2033 . C T 4293.25 . AB=0.538244;ABP=7.49473;AC=1;AF=0.5;AN=2;AO=190;CIGAR=1X;DP=353;DPB=353;DPRA=0;EPP=19.5135;EPPR=11.3365;GTI=0;LEN=1;MEANALT=1;MQM=37.9474;MQMR=38.7914;NS=1;NUMALT=1;ODDS=847.931;PAIRED=1;PAIREDR=1;PAO=0;PQA=0;PQR=0;PRO=0;QA=7544;QR=6463;RO=163;RPL=126;RPP=46.9426;RPPR=52.5812;RPR=64;RUN=1;SAF=12;SAP=317.942;SAR=178;SRF=18;SRP=217.879;SRR=145;TYPE=snp GT:DP:DPR:RO:QR:AO:QA:GL 0/1:286:286,166:120:4806:166:6574:-446.248,0,-281.31
cneoH99_Chr1 6368 . T A 5.93664e-13 . AB=0;ABP=0;AC=0;AF=0;AN=2;AO=30;CIGAR=1X;DP=118;DPB=118;DPRA=0;EPP=3.0103;EPPR=5.03202;GTI=0;LEN=1;MEANALT=2;MQM=15;MQMR=30.4713;NS=1;NUMALT=1;ODDS=29.6209;PAIRED=0.633333;PAIREDR=0.988506;PAO=0;PQA=0;PQR=0;PRO=0;QA=1178;QR=3517;RO=87;RPL=18;RPP=5.61607;RPPR=14.0174;RPR=12;RUN=1;SAF=13;SAP=4.16842;SAR=17;SRF=43;SRP=3.03526;SRR=44;TYPE=snp GT:DP:DPR:RO:QR:AO:QA:GL 0/0:118:118,30:87:3517:30:1178:0,-0.608226,-135.641
cneoH99_Chr1 10411 . A G 77.3512 . AB=0.358974;ABP=9.74743;AC=1;AF=0.5;AN=2;AO=14;CIGAR=1X;DP=39;DPB=39;DPRA=0;EPP=33.4109;EPPR=7.26639;GTI=0;LEN=1;MEANALT=1;MQM=29.6429;MQMR=59.12;NS=1;NUMALT=1;ODDS=17.8108;PAIRED=0.928571;PAIREDR=1;PAO=0;PQA=0;PQR=0;PRO=0;QA=552;QR=1008;RO=25;RPL=0;RPP=33.4109;RPPR=3.79203;RPR=14;RUN=1;SAF=14;SAP=33.4109;SAR=0;SRF=9;SRP=7.26639;SRR=16;TYPE=snp GT:DP:DPR:RO:QR:AO:QA:GL 0/1:39:39,14:25:1008:14:552:-23.94,0,-79.1647
cneoH99_Chr1 10423 . C T 214.246 . AB=0.414634;ABP=5.60547;AC=1;AF=0.5;AN=2;AO=17;CIGAR=1X;DP=41;DPB=41;DPRA=0;EPP=18.4661;EPPR=4.45795;GTI=0;LEN=1;MEANALT=1;MQM=33.7059;MQMR=59.0833;NS=1;NUMALT=1;ODDS=49.332;PAIRED=0.941176;PAIREDR=1;PAO=0;PQA=0;PQR=0;PRO=0;QA=688;QR=980;RO=24;RPL=0;RPP=39.9253;RPPR=3.37221;RPR=17;RUN=1;SAF=14;SAP=18.4661;SAR=3;SRF=9;SRP=6.26751;SRR=15;TYPE=snp GT:DP:DPR:RO:QR:AO:QA:GL 0/1:41:41,17:24:980:17:688:-34.2737,0,-76.0455
cneoH99_Chr1 10434 . C G 382.039 . AB=0.48;ABP=3.18402;AC=1;AF=0.5;AN=2;AO=24;CIGAR=1X;DP=50;DPB=50;DPRA=0;EPP=8.80089;EPPR=6.01695;GTI=0;LEN=1;MEANALT=1;MQM=35.5833;MQMR=59.1538;NS=1;NUMALT=1;ODDS=87.9677;PAIRED=0.875;PAIREDR=1;PAO=0;PQA=0;PQR=0;PRO=0;QA=971;QR=1058;RO=26;RPL=0;RPP=55.1256;RPPR=6.01695;RPR=24;RUN=1;SAF=16;SAP=8.80089;SAR=8;SRF=10;SRP=6.01695;SRR=16;TYPE=snp GT:DP:DPR:RO:QR:AO:QA:GL 0/1:50:50,24:26:1058:24:971:-51.1117,0,-80.348
cneoH99_Chr1 10471 . C T 218.71 . AB=0.203593;ABP=130.451;AC=1;AF=0.5;AN=2;AO=34;CIGAR=1X;DP=167;DPB=167;DPRA=0;EPP=7.09778;EPPR=4.98585;GTI=0;LEN=1;MEANALT=1;MQM=33.2353;MQMR=55.3158;NS=1;NUMALT=1;ODDS=50.3597;PAIRED=1;PAIREDR=0.977444;PAO=0;PQA=0;PQR=0;PRO=0;QA=1382;QR=5251;RO=133;RPL=11;RPP=12.2071;RPPR=14.9126;RPR=23;RUN=1;SAF=20;SAP=5.30951;SAR=14;SRF=69;SRP=3.41847;SRR=64;TYPE=snp GT:DP:DPR:RO:QR:AO:QA:GL 0/1:58:58,30:28:1126:30:1209:-65.473,0,-73.6021
cneoH99_Chr1 10482 . T C 277.38 . AB=0.221519;ABP=109.44;AC=1;AF=0.5;AN=2;AO=35;CIGAR=1X;DP=158;DPB=158;DPRA=0;EPP=20.9405;EPPR=6.98251;GTI=0;LEN=1;MEANALT=1;MQM=32.7143;MQMR=54.6585;NS=1;NUMALT=1;ODDS=63.8691;PAIRED=1;PAIREDR=0.97561;PAO=0;PQA=0;PQR=0;PRO=0;QA=1431;QR=4853;RO=123;RPL=15;RPP=4.56135;RPPR=15.8802;RPR=20;RUN=1;SAF=20;SAP=4.56135;SAR=15;SRF=65;SRP=3.87536;SRR=58;TYPE=snp GT:DP:DPR:RO:QR:AO:QA:GL 0/1:55:55,31:24:942:31:1257:-66.719,0,-57.9497
cneoH99_Chr1 10536 . C T 439.911 . AB=0.309091;ABP=37.8328;AC=1;AF=0.5;AN=2;AO=34;CIGAR=1X;DP=110;DPB=110;DPRA=0;EPP=7.09778;EPPR=22.325;GTI=0;LEN=1;MEANALT=1;MQM=32.6471;MQMR=51.3816;NS=1;NUMALT=1;ODDS=101.293;PAIRED=1;PAIREDR=0.986842;PAO=0;PQA=0;PQR=0;PRO=0;QA=1390;QR=3050;RO=76;RPL=30;RPP=46.1843;RPPR=36.0395;RPR=4;RUN=1;SAF=17;SAP=3.0103;SAR=17;SRF=30;SRP=10.3247;SRR=46;TYPE=snp GT:DP:DPR:RO:QR:AO:QA:GL 0/1:41:41,28:13:533:28:1139:-64.4541,0,-25.4782
cneoH99_Chr1 10545 . C T 360.202 . AB=0.28;ABP=45.05;AC=1;AF=0.5;AN=2;AO=28;CIGAR=1X;DP=100;DPB=100;DPRA=0;EPP=5.80219;EPPR=17.6074;GTI=0;LEN=1;MEANALT=1;MQM=34.3214;MQMR=50.0694;NS=1;NUMALT=1;ODDS=82.9395;PAIRED=1;PAIREDR=0.986111;PAO=0;PQA=0;PQR=0;PRO=0;QA=1139;QR=2882;RO=72;RPL=22;RPP=22.8638;RPPR=26.6552;RPR=6;RUN=1;SAF=11;SAP=5.80219;SAR=17;SRF=25;SRP=17.6074;SRR=47;TYPE=snp GT:DP:DPR:RO:QR:AO:QA:GL 0/1:38:38,24:14:565:24:976:-55.8434,0,-29.2033
cneoH99_Chr1 12362 . T A 0.00134313 . AB=0.275862;ABP=15.6647;AC=1;AF=0.5;AN=2;AO=8;CIGAR=1X;DP=29;DPB=29;DPRA=0;EPP=3.0103;EPPR=20.4855;GTI=0;LEN=1;MEANALT=1;MQM=18.25;MQMR=31;NS=1;NUMALT=1;ODDS=8.08115;PAIRED=1;PAIREDR=1;PAO=0;PQA=0;PQR=0;PRO=0;QA=319;QR=774;RO=21;RPL=4;RPP=3.0103;RPPR=3.94093;RPR=4;RUN=1;SAF=8;SAP=20.3821;SAR=0;SRF=10;SRP=3.1137;SRR=11;TYPE=snp GT:DP:DPR:RO:QR:AO:QA:GL 0/1:29:29,8:21:774:8:319:-4.12212,0,-46.0364
cneoH99_Chr1 12362 . TCCTCAT ATGTCAC 3.47052e-06 . AB=0;ABP=0;AC=0;AF=0;AN=2;AO=3;CIGAR=3X3M1X;DP=13;DPB=13.8571;DPRA=0;EPP=9.52472;EPPR=22.5536;GTI=0;LEN=7;MEANALT=2;MQM=14.6667;MQMR=29.3333;NS=1;NUMALT=1;ODDS=14.0398;PAIRED=1;PAIREDR=1;PAO=0;PQA=0;PQR=36;PRO=1;QA=94;QR=365;RO=9;RPL=0;RPP=9.52472;RPPR=3.25157;RPR=3;RUN=1;SAF=3;SAP=9.52472;SAR=0;SRF=4;SRP=3.25157;SRR=5;TYPE=complex GT:DP:DPR:RO:QR:AO:QA:GL 0/0:13:13,3:9:365:3:94:0,-0.118182,-21.5253
cneoH99_Chr1 12386 . CGA TGT 3.21407e-12 . AB=0;ABP=0;AC=0;AF=0;AN=2;AO=8;CIGAR=1X1M1X;DP=36;DPB=36.3333;DPRA=0;EPP=4.09604;EPPR=12.7417;GTI=0;LEN=3;MEANALT=2;MQM=10.875;MQMR=32.5185;NS=1;NUMALT=1;ODDS=27.9352;PAIRED=1;PAIREDR=1;PAO=0;PQA=0;PQR=32;PRO=1;QA=308;QR=1047;RO=27;RPL=5;RPP=4.09604;RPPR=6.95112;RPR=3;RUN=1;SAF=8;SAP=20.3821;SAR=0;SRF=17;SRP=6.95112;SRR=10;TYPE=complex GT:DP:DPR:RO:QR:AO:QA:GL 0/1:17:17,7:7:287:7:277:-6.75383,0,-14.0539
cneoH99_Chr1 12441 . A G

tw164 commented 6 years ago

Is this your pooled sample? It looks fine for preparing Multipool input, as it has the RO/AO tag set used by FreeBayes to indicate the number of reference and alternate allele observations.

However, there is one missing piece: genotype data for the founders/parents — required for assigning alleles to a specific parent. Given data for two parents of the F1 cross (say, PARENT1 and PARENT2) that are stored in a VCF file (say, parents.vcf), the command to prepare these for Multipool might be as follows:

python2 mp_prep.py -p unknown -f PARENT1,PARENT2 -r cneoH99_Chr1 -o allele_counts.txt input.vcf parents.vcf

…where unknown is the name of the pooled sample contained in input.vcf, PARENT1 and PARENT2 are the names of the parent samples contained in parents.vcf, cneoH99_Chr1 is the chromosome of interest, and allele_counts.txt is the output allele counts file.
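As a rough illustration of the RO/AO tag set mentioned above, the following Python sketch reads the two counts from the sample column of a FreeBayes record. It is not taken from mp_prep.py; the function name `allele_counts` is hypothetical, and it assumes a biallelic site with a single sample column.

```python
def allele_counts(format_field, sample_field):
    """Return (RO, AO) read counts from one sample column of a FreeBayes VCF record."""
    # Pair each FORMAT key with the value at the same position in the sample column.
    values = dict(zip(format_field.split(":"), sample_field.split(":")))
    ref_count = int(values["RO"])
    # AO holds one count per alternate allele; a biallelic site has exactly one.
    alt_counts = [int(x) for x in values["AO"].split(",")]
    return ref_count, alt_counts[0]


# FORMAT and sample columns of the SNP record at position 12362 in the excerpt above.
fmt = "GT:DP:DPR:RO:QR:AO:QA:GL"
sample = "0/1:29:29,8:21:774:8:319:-4.12212,0,-46.0364"
print(allele_counts(fmt, sample))  # (21, 8)
```

Here 21 reads support the reference allele and 8 support the alternate allele, matching the RO and AO values in the record.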

emmannaemeka commented 6 years ago

Please, some clarifications:

  1. Should parents.vcf be a combination of Parent1 and Parent2?
  2. Should input.vcf be a combination of the high and low pools?
  3. Is unknown the same as input.vcf?

Thanks for your patience

Nnadi Nnaemeka Emmanuel Department of Microbiology, Faculty of Natural and Applied Science, Plateau State University, Bokkos, Plateau State, Nigeria. Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications


emmannaemeka commented 6 years ago

I ran this command line:

python2 mp_prep.py -p /Users/emmannaemeka/Desktop/QTLcryptococcus/Pool_combine.txt -f /Users/emmannaemeka/Desktop/QTLcryptococcus/Parent1.txt,/Users/emmannaemeka/Desktop/QTLcryptococcus/Parent2.txt -r cneoH99_Chr1 -o allele_counts.txt /Users/emmannaemeka/Desktop/QTLcryptococcus/Pool_combine.txt /Users/emmannaemeka/Desktop/QTLcryptococcus/Parents.vcf

This was the result

Python version: 2.7.13
Multipool version: 0.10.2
PySAM version: 0.13
Input variant file(s): /Users/emmannaemeka/Desktop/QTLcryptococcus/Parents.vcf:/Users/emmannaemeka/Desktop/QTLcryptococcus/Pool_combine.txt
Pool sample: /Users/emmannaemeka/Desktop/QTLcryptococcus/Pool_combine.txt
Founder sample(s): /Users/emmannaemeka/Desktop/QTLcryptococcus/Parent1.txt,/Users/emmannaemeka/Desktop/QTLcryptococcus/Parent2.txt
Region(s): cneoH99_Chr1
Output allele depth file: allele_counts.txt
Checking for pool and founder samples in input variant data...
Sample '/Users/emmannaemeka/Desktop/QTLcryptococcus/Parent1.txt' found in input file(s):
Sample '/Users/emmannaemeka/Desktop/QTLcryptococcus/Parent2.txt' found in input file(s):
Sample '/Users/emmannaemeka/Desktop/QTLcryptococcus/Pool_combine.txt' found in input file(s):
Getting founder allele info...
Found 0 variants in region 'cneoH99_Chr1' of founders '/Users/emmannaemeka/Desktop/QTLcryptococcus/Parent1.txt' and '/Users/emmannaemeka/Desktop/QTLcryptococcus/Parent2.txt'...
Of these, 0 are SNPs...
Of these, 0 have homozygous genotypes...
Of these, 0 are segregating with respect to founders '/Users/emmannaemeka/Desktop/QTLcryptococcus/Parent1.txt' and '/Users/emmannaemeka/Desktop/QTLcryptococcus/Parent2.txt'...
Traceback (most recent call last):
File "mp_prep.py", line 490, in sample_file_info, quiet=quiet)
File "mp_prep.py", line 361, in get_founder_allele_info "for region '{}'".format(region))
RuntimeError: founder alleles not found for region 'cneoH99_Chr1'

tw164 commented 6 years ago

In response to your questions:

  1. Actually, mp_prep.py will accept any number of VCF files, and will extract genotype data only for the specified pool sample and its two parent samples. So you could pass one VCF file containing the pool and parent samples, one VCF containing the pooled sample and one containing both parent samples, or even a separate VCF file for each sample. All of these would be valid input.
  2. The VCF file input to mp_prep.py can contain data for both high and low pools, but you would need to set parameter -p (the pool sample name) to the name of either the high pool or the low pool. Each call to the mp_prep.py script produces a single allele count file containing data for a single pool, which can then be input to the main Multipool script (mp_inference.py). If you have both a high and a low pool, you will need to run mp_prep.py once for each pool, then input both allele count files simultaneously to mp_inference.py.
  3. The text string in the rightmost column of the top row of your VCF excerpt, above the sample genotype data, is the sample name. In this example it is unknown, but it could just as easily be High_Pool, for example.
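To make point 2 concrete, here is a small Python sketch that assembles one mp_prep.py command per pool. It only builds and prints the commands, using the -p, -f, -r and -o options shown earlier in this thread. The sample name Low_Pool, the file name pools.vcf, and the output file naming scheme are hypothetical placeholders, so substitute the names found in your own data.

```python
def prep_command(pool, founders, region, vcf_files, script="mp_prep.py"):
    """Build the argument list for one mp_prep.py run (one pool per run)."""
    # Hypothetical naming scheme: one output file per pool sample.
    output = "allele_counts_{}.txt".format(pool)
    return (["python2", script,
             "-p", pool,
             "-f", ",".join(founders),
             "-r", region,
             "-o", output]
            + list(vcf_files))


# One run per pool, each naming a single pool sample with -p.
for pool in ("High_Pool", "Low_Pool"):
    cmd = prep_command(pool, ["PARENT1", "PARENT2"], "cneoH99_Chr1",
                       ["pools.vcf", "parents.vcf"])
    print(" ".join(cmd))
```

Each printed command names exactly one pool sample, which is the point of the sketch: a comma-separated list of pools passed to -p would not match any single sample name.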

In response to the error, parameters -p and -f should be the names of the pool sample and founder samples, respectively. Please have a look at the help output for this script, which can be viewed with the following command:

python2 mp_prep.py -h

Before running this prep script, you will need to check the VCF data to find the names of the pool samples and the founder samples. These must then be specified when you run the script, so that it knows which samples are which.

emmannaemeka commented 6 years ago

Hi Matt

I think I have a problem. For example, this is the VCF for one of the pools (my low bulk), but I can't find the name of the pool (the sample name) in it. The same is true of all my VCF files. When I run this command line:

python2 mp_prep.py -p lowbulk,highbulk -f KN99a,EN28 -r cneoH99_Chr11 -o allele_counts.txt /Users/emmannaemeka/Desktop/QTLcryptococcus/Pool_combine.vcf /Users/emmannaemeka/Desktop/QTLcryptococcus/Parents_combined.vcf

It does not work: the sample names (lowbulk, highbulk, KN99a, EN28) can't be found in the VCF files. How can I solve this?

Below is an example of the VCF file for the low bulk:

##fileformat=VCFv4.1
##fileDate=20180108
##source=freeBayes v1.0.2-29-g41c1313
##reference=localref.fa
##contig=
##contig=
##contig=
##contig=
##contig=
##contig=
##contig=
##contig=
##contig=
##contig=
##contig=
##contig=
##contig=
##contig=
##phasing=none
##commandline="freebayes --region cneoH99_Chr9:0..1186808 --bam b_0.bam --fasta-reference localref.fa --vcf ./vcf_output/part_cneoH99_Chr9:0..1186808.vcf"
##filter="DP > 10"
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of samples with data">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total read depth at the locus">
##INFO=<ID=DPB,Number=1,Type=Float,Description="Total read depth per bp at the locus; bases in reads overlapping / bases in haplotype">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Total number of alternate alleles in called genotypes">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
##INFO=<ID=AF,Number=A,Type=Float,Description="Estimated allele frequency in the range (0,1]">
##INFO=<ID=RO,Number=1,Type=Integer,Description="Reference allele observation count, with partial observations recorded fractionally">
##INFO=<ID=AO,Number=A,Type=Integer,Description="Alternate allele observations, with partial observations recorded fractionally">
##INFO=<ID=PRO,Number=1,Type=Float,Description="Reference allele observation count, with partial observations recorded fractionally">
##INFO=<ID=PAO,Number=A,Type=Float,Description="Alternate allele observations, with partial observations recorded fractionally">
##INFO=<ID=QR,Number=1,Type=Integer,Description="Reference allele quality sum in phred">
##INFO=<ID=QA,Number=A,Type=Integer,Description="Alternate allele quality sum in phred">
##INFO=<ID=PQR,Number=1,Type=Float,Description="Reference allele quality sum in phred for partial observations">
##INFO=<ID=PQA,Number=A,Type=Float,Description="Alternate allele quality sum in phred for partial observations">
##INFO=<ID=SRF,Number=1,Type=Integer,Description="Number of reference observations on the forward strand">
##INFO=<ID=SRR,Number=1,Type=Integer,Description="Number of reference observations on the reverse strand">
##INFO=<ID=SAF,Number=A,Type=Integer,Description="Number of alternate observations on the forward strand">
##INFO=<ID=SAR,Number=A,Type=Integer,Description="Number of alternate observations on the reverse strand">
##INFO=<ID=SRP,Number=1,Type=Float,Description="Strand balance probability for the reference allele: Phred-scaled upper-bounds estimate of the probability of observing the deviation between SRF and SRR given E(SRF/SRR) ~ 0.5, derived using Hoeffding's inequality">
##INFO=<ID=SAP,Number=A,Type=Float,Description="Strand balance probability for the alternate allele: Phred-scaled upper-bounds estimate of the probability of observing the deviation between SAF and SAR given E(SAF/SAR) ~ 0.5, derived using Hoeffding's inequality">
##INFO=<ID=AB,Number=A,Type=Float,Description="Allele balance at heterozygous sites: a number between 0 and 1 representing the ratio of reads showing the reference allele to all reads, considering only reads from individuals called as heterozygous">
##INFO=<ID=ABP,Number=A,Type=Float,Description="Allele balance probability at heterozygous sites: Phred-scaled upper-bounds estimate of the probability of observing the deviation between ABR and ABA given E(ABR/ABA) ~ 0.5, derived using Hoeffding's inequality">
##INFO=<ID=RUN,Number=A,Type=Integer,Description="Run length: the number of consecutive repeats of the alternate allele in the reference genome">
##INFO=<ID=RPP,Number=A,Type=Float,Description="Read Placement Probability: Phred-scaled upper-bounds estimate of the probability of observing the deviation between RPL and RPR given E(RPL/RPR) ~ 0.5, derived using Hoeffding's inequality">
##INFO=<ID=RPPR,Number=1,Type=Float,Description="Read Placement Probability for reference observations: Phred-scaled upper-bounds estimate of the probability of observing the deviation between RPL and RPR given E(RPL/RPR) ~ 0.5, derived using Hoeffding's inequality">
##INFO=<ID=RPL,Number=A,Type=Float,Description="Reads Placed Left: number of reads supporting the alternate balanced to the left (5') of the alternate allele">
##INFO=<ID=RPR,Number=A,Type=Float,Description="Reads Placed Right: number of reads supporting the alternate balanced to the right (3') of the alternate allele">
##INFO=<ID=EPP,Number=A,Type=Float,Description="End Placement Probability: Phred-scaled upper-bounds estimate of the probability of observing the deviation between EL and ER given E(EL/ER) ~ 0.5, derived using Hoeffding's inequality">
##INFO=<ID=EPPR,Number=1,Type=Float,Description="End Placement Probability for reference observations: Phred-scaled upper-bounds estimate of the probability of observing the deviation between EL and ER given E(EL/ER) ~ 0.5, derived using Hoeffding's inequality">
##INFO=<ID=DPRA,Number=A,Type=Float,Description="Alternate allele depth ratio. Ratio between depth in samples with each called alternate allele and those without.">
##INFO=<ID=ODDS,Number=1,Type=Float,Description="The log odds ratio of the best genotype combination to the second-best.">
##INFO=<ID=GTI,Number=1,Type=Integer,Description="Number of genotyping iterations required to reach convergence or bailout.">
##INFO=<ID=TYPE,Number=A,Type=String,Description="The type of allele, either snp, mnp, ins, del, or complex.">
##INFO=<ID=CIGAR,Number=A,Type=String,Description="The extended CIGAR representation of each alternate allele, with the exception that '=' is replaced by 'M' to ease VCF parsing. Note that INDEL alleles do not have the first matched base (which is provided by default, per the spec) referred to by the CIGAR.">
##INFO=<ID=NUMALT,Number=1,Type=Integer,Description="Number of unique non-reference alleles in called genotypes at this position.">
##INFO=<ID=MEANALT,Number=A,Type=Float,Description="Mean number of unique non-reference allele observations per sample with the corresponding alternate alleles.">
##INFO=
##INFO=<ID=MQM,Number=A,Type=Float,Description="Mean mapping quality of observed alternate alleles">
##INFO=<ID=MQMR,Number=1,Type=Float,Description="Mean mapping quality of observed reference alleles">
##INFO=<ID=PAIRED,Number=A,Type=Float,Description="Proportion of observed alternate alleles which are supported by properly paired read fragments">
##INFO=<ID=PAIREDR,Number=1,Type=Float,Description="Proportion of observed reference alleles which are supported by properly paired read fragments">
##INFO=<ID=MIN_DP,Number=1,Type=Integer,Description="Minimum depth in gVCF output block.">
##INFO=<ID=END,Number=1,Type=Integer,Description="Last position (inclusive) in gVCF output record.">
##FORMAT=
##FORMAT=<ID=GQ,Number=1,Type=Float,Description="Genotype Quality, the Phred-scaled marginal (or unconditional) probability of the called genotype">
##FORMAT=<ID=GL,Number=G,Type=Float,Description="Genotype Likelihood, log10-scaled likelihoods of the data given the called genotype for each possible genotype generated from the reference and alternate alleles given the sample ploidy">
##FORMAT=
##FORMAT=<ID=DPR,Number=A,Type=Integer,Description="Number of observation for each allele">
##FORMAT=<ID=RO,Number=1,Type=Integer,Description="Reference allele observation count">
##FORMAT=<ID=QR,Number=1,Type=Integer,Description="Sum of quality of the reference observations">
##FORMAT=<ID=AO,Number=A,Type=Integer,Description="Alternate allele observation count">
##FORMAT=<ID=QA,Number=A,Type=Integer,Description="Sum of quality of the alternate observations">
##FORMAT=<ID=MIN_DP,Number=1,Type=Integer,Description="Minimum depth in gVCF output block.">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT unknown

cneoH99_Chr1 99 . T C 50426.3 . AB=0;ABP=0;AC=2;AF=1;AN=2;AO=1415;CIGAR=1X;DP=1415;DPB=1415;DPRA=0;EPP=6.6949;EPPR=0;GTI=0;LEN=1;MEANALT=1;MQM=59.983;MQMR=0;NS=1;NUMALT=1;ODDS=1966.21;PAIRED=0.928622;PAIREDR=0;PAO=0;PQA=0;PQR=0;PRO=0;QA=56243;QR=0;RO=0;RPL=619;RPP=51.0881;RPPR=0;RPR=796;RUN=1;SAF=781;SAP=36.1717;SAR=634;SRF=0;SRP=0;SRR=0;TYPE=snp GT:DP:DPR:RO:QR:AO:QA:GL 1/1:1415:1415,1415:0:0:1415:56243:-5056.15,-425.957,0

cneoH99_Chr1 161 . C G 52460.3 . AB=0;ABP=0;AC=2;AF=1;AN=2;AO=1487;CIGAR=1X;DP=1487;DPB=1487;DPRA=0;EPP=21.0027;EPPR=0;GTI=0;LEN=1;MEANALT=1;MQM=60;MQMR=0;NS=1;NUMALT=1;ODDS=2066.02;PAIRED=0.958978;PAIREDR=0;PAO=0;PQA=0;PQR=0;PRO=0;QA=58459;QR=0;RO=0;RPL=795;RPP=18.5027;RPPR=0;RPR=692;RUN=1;SAF=787;SAP=14.0633;SAR=700;SRF=0;SRP=0;SRR=0;TYPE=snp GT:DP:DPR:RO:QR:AO:QA:GL 1/1:1487:1487,1487:0:0:1487:58459:-5255.54,-447.632,0

cneoH99_Chr1 168 . T C 53275.2 .


emmannaemeka commented 6 years ago

@tw164 Please, how can I resolve this?

Input variant file(s): /Users/emmannaemeka/Desktop/QTLT/Parents.vcf:/Users/emmannaemeka/Desktop/QTLT/Pool.vcf
Pool sample: Trimmomatic_on_Low_bulk_R
Founder sample(s): Trimmomatic_on_KN99a_R,Trimmomatic_on_EN28
Region(s): cneoH99_Chr1
Output allele depth file: allele_counts.txt
Checking for pool and founder samples in input variant data...
Sample 'Trimmomatic_on_KN99a_R' found in input file(s): /Users/emmannaemeka/Desktop/QTLT/Pool.vcf
Sample 'Trimmomatic_on_EN28' found in input file(s): /Users/emmannaemeka/Desktop/QTLT/Pool.vcf
Sample 'Trimmomatic_on_Low_bulk_R' found in input file(s): /Users/emmannaemeka/Desktop/QTLT/Parents.vcf
Getting founder allele info...
[W::vcf_parse] Contig 'cneoH99_Chr1' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] Contig 'cneoH99_Chr9' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] Contig 'cneoH99_Chr10' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] Contig 'cneoH99_Chr11' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] Contig 'cneoH99_Chr12' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] Contig 'cneoH99_Chr13' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] Contig 'cneoH99_Chr2' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] Contig 'cneoH99_Chr3' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] Contig 'cneoH99_Chr4' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] Contig 'cneoH99_Chr5' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] Contig 'cneoH99_Chr6' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] Contig 'cneoH99_Chr7' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] Contig 'cneoH99_Chr8' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] Contig 'cneoH99_Chr1' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] Contig 'cneoH99_Chr9' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] Contig 'cneoH99_Chr10' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] Contig 'cneoH99_Chr11' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] Contig 'cneoH99_Chr12' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] Contig 'cneoH99_Chr13' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] Contig 'cneoH99_Chr2' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] Contig 'cneoH99_Chr3' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] Contig 'cneoH99_Chr4' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] Contig 'cneoH99_Chr5' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] Contig 'cneoH99_Chr6' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] Contig 'cneoH99_Chr7' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] Contig 'cneoH99_Chr8' is not defined in the header. (Quick workaround: index the file with tabix.)
Found 9 variants in region 'cneoH99_Chr1' of founders 'Trimmomatic_on_KN99a_R' and 'Trimmomatic_on_EN28'...
Of these, 8 are SNPs...
Of these, 0 have homozygous genotypes...
Of these, 0 are segregating with respect to founders 'Trimmomatic_on_KN99a_R' and 'Trimmomatic_on_EN28'...
Traceback (most recent call last):
File "mp_prep.py", line 490, in sample_file_info, quiet=quiet)
File "mp_prep.py", line 361, in get_founder_allele_info "for region '{}'".format(region))
RuntimeError: founder alleles not found for region 'cneoH99_Chr1'

tw164 commented 6 years ago

@emmannaemeka All sample names are in the header line of the VCF file. (See here for an introduction to the VCF format, and here for the VCF format specification.)

You can view the header line with the following bash command:

grep "^#[^#]" file.vcf

If you have bcftools installed, you can view a list of samples as follows:

bcftools query --list-samples file.vcf
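If neither tool is convenient, the same header line can be read with a few lines of Python. This is only a sketch: the function name `sample_names` is hypothetical, and it assumes an uncompressed, tab-delimited VCF whose header line begins with #CHROM, as the VCF format requires.

```python
def sample_names(vcf_lines):
    """Return the sample names from the #CHROM header line of a VCF."""
    for line in vcf_lines:
        if line.startswith("#CHROM"):
            # The first nine columns are fixed; the remainder are sample names.
            return line.rstrip("\r\n").split("\t")[9:]
        if not line.startswith("#"):
            break  # reached the data records without finding a header line
    return []


# Toy input with the same column layout as the header line in this thread.
example = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tunknown\n",
]
print(sample_names(example))  # ['unknown']
```

For a real file, pass an open file handle instead of the toy list, for example `with open("file.vcf") as handle: print(sample_names(handle))`. The names it prints are the values to give to -p and -f.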

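Without knowing more about your data, I can only guess at the reason, but the RuntimeError appears to be due to a lack of usable SNP data for the reference sequence "cneoH99_Chr1": your log reports 8 SNPs in the founders, none of which has homozygous genotypes, so no founder alleles could be assigned.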
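For reference, the kind of site the prep script can use (a SNP at which both founders are homozygous, but for different alleles) can be approximated by the check below. This is a sketch of the filter as described in this thread, not the code in mp_prep.py. The function name `is_informative` is hypothetical, and it assumes diploid genotype calls at a biallelic site.

```python
HOMOZYGOUS = {"0/0", "1/1"}
BASES = {"A", "C", "G", "T"}


def is_informative(ref, alt, founder1_gt, founder2_gt):
    """True if the site is a SNP and the founders are homozygous for different alleles."""
    # Treat phased (0|0) and unphased (0/0) genotypes alike.
    gt1 = founder1_gt.replace("|", "/")
    gt2 = founder2_gt.replace("|", "/")
    is_snp = ref in BASES and alt in BASES
    both_homozygous = gt1 in HOMOZYGOUS and gt2 in HOMOZYGOUS
    return is_snp and both_homozygous and gt1 != gt2


print(is_informative("T", "C", "0/0", "1/1"))      # True
print(is_informative("T", "A", "0/1", "0/0"))      # False: one founder is heterozygous
print(is_informative("CGA", "TGT", "0/0", "1/1"))  # False: not a SNP
```

If every founder SNP on a chromosome fails a check like this, there are no alleles to assign to a parent, which would be consistent with the "0 have homozygous genotypes" line in the log.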

emmannaemeka commented 6 years ago

I have found a way around the problem, and it worked. I have attached the plot of one of the chromosomes. Can you assist in interpreting it?

[Attached plot: chr5]

tw164 commented 6 years ago

@emmannaemeka This output is a little different from that of the original version of Multipool in that the LOD curve is shown in black, and individual data points are not displayed. However, in most respects the output is broadly similar to the original Multipool, and so the Multipool wiki provides the best general guidance to interpreting Multipool output.

For more specific assistance, you would need to provide more information about the input data and parameters used. For example, what species are the samples taken from? What is the phenotype being tested? What reference genome (or genomes) did you use when calling SNPs? How many SNPs did you find in each sample, particularly on the chromosome shown in the plot above? What were the commands used for running mp_prep.py on each pool? And what parameter settings did you use when running mp_inference.py?
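As one way of answering the SNP-count question, the sketch below tallies SNP records per chromosome. The function name `snps_per_chromosome` is hypothetical, and it assumes an uncompressed, tab-delimited VCF. It counts a record as a SNP when REF and every ALT allele are single bases, which may differ from how your variant caller labels variant types.

```python
from collections import Counter

BASES = {"A", "C", "G", "T"}


def snps_per_chromosome(vcf_lines):
    """Count records per chromosome whose REF and ALT alleles are all single bases."""
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.rstrip("\r\n").split("\t")
        chrom, ref, alts = fields[0], fields[3], fields[4].split(",")
        if ref in BASES and all(alt in BASES for alt in alts):
            counts[chrom] += 1
    return counts


# Toy records based on positions in this thread, truncated after the FILTER column.
records = [
    "cneoH99_Chr1\t99\t.\tT\tC\t50426.3\t.\n",
    "cneoH99_Chr1\t161\t.\tC\tG\t52460.3\t.\n",
    "cneoH99_Chr1\t12386\t.\tCGA\tTGT\t3.21407e-12\t.\n",
]
print(snps_per_chromosome(records)["cneoH99_Chr1"])  # 2
```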

emmannaemeka commented 6 years ago

Thank you for the explanation.

Should I provide the information here or send it as a private email? My email is eennadi@gmail.com

Thanks for your assistance
