Open dhwani2410 opened 5 years ago
Please check your VCF encoding; it should be UTF-8.
When I use the example data, I still get the same error:
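One way to test the encoding is to decode the raw bytes and report where decoding first fails. The following is only a minimal sketch: `first_invalid_utf8_offset` is a hypothetical helper, not part of HAHap, and it reads the whole file into memory, which may be impractical for very large VCFs.

```python
def first_invalid_utf8_offset(path):
    """Return the byte offset of the first invalid UTF-8 sequence, or None.

    Hypothetical helper for illustration; not part of HAHap.
    Reads the whole file into memory.
    """
    with open(path, "rb") as handle:
        data = handle.read()
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the offset of the first byte that could not be decoded.
        return err.start
    return None
```

A gzip-compressed file begins with the bytes 0x1f 0x8b, so for such a file this returns 1, the same position a "can't decode byte 0x8b in position 1" error points at.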
./bin/HAHap phase data/HG002.hs37d5.2x250.bam HG002_heter.vcf.gz out_sample.txt
=== Start HAHap phasing ===
Parameters: Minimum mapping quality = 0
Parameters: Threshold of low coverage = Median
Parameters: Minimum junction number = 4
Parameters: Likelihood of P1 and P2 = 0.49
=== Read Heterozygous Data ===
Traceback (most recent call last):
File "./bin/HAHap", line 9, in <module>
Hi, please unzip the vcf.gz first (the input must be a plain-text VCF file). I uploaded the .gz file only to save storage space.
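For anyone scripting around this, a compressed input can be detected from its first two bytes before it is read as text. This is a rough sketch, assuming the file is either plain UTF-8 text or gzip; `open_vcf_text` and `GZIP_MAGIC` are hypothetical names, and HAHap itself expects the already-unzipped file.

```python
import gzip

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def open_vcf_text(path):
    """Open a VCF as text, transparently decompressing gzip input.

    Hypothetical helper for illustration; not part of HAHap.
    """
    with open(path, "rb") as probe:
        is_gzip = probe.read(2) == GZIP_MAGIC
    if is_gzip:
        # "rt" makes gzip.open return decoded text lines instead of bytes.
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")
```

On the command line, `gunzip -c HG002_heter.vcf.gz > HG002_heter.vcf` produces the plain-text file to pass to HAHap.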
@ifishlin It worked with the sample file after I unzipped the VCF. I also checked that my own VCF file is UTF-8 encoded. I have uploaded it as a tab-delimited text file because the VCF extension is not supported here.
Could you please have a look at this file and let me know what the source of the error might be?
Remove the double quotes (") in the file, e.g.:
(1) "1/1:7,911:918:99:26237,2450,0" => 1/1:7,911:918:99:26237,2450,0
(2) "##FILTER=<ID=LowQual,Description=""Low quality"">" => ##FILTER=<ID=LowQual,Description=""Low quality"">
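Quoting like this is typical of spreadsheet or CSV-style export, where a field containing a comma or a quote is wrapped in double quotes and inner quotes are doubled. Assuming that is what happened here, the quoting can be undone with Python's `csv` module. This is a sketch only: `strip_spreadsheet_quotes` is a hypothetical helper, and unlike example (2) above it also collapses the doubled inner quotes, since VCF header descriptions use a single pair of double quotes.

```python
import csv


def strip_spreadsheet_quotes(line):
    """Undo CSV-style quoting on one tab-delimited line.

    Hypothetical helper for illustration; assumes the quotes were added
    by a spreadsheet-style export, which the thread does not confirm.
    """
    text = line.rstrip("\r\n")
    # Parse the line as a single tab-delimited CSV record; the reader
    # removes wrapping quotes and turns "" inside them back into ".
    rows = list(csv.reader([text], delimiter="\t"))
    fields = rows[0] if rows else []
    return "\t".join(fields)
```

Lines that were never quoted pass through unchanged, so the function can be applied to every line of the file.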
The comma that appeared in the file may be due to the conversion of the VCF to a txt file. I am sending the first few lines of the VCF file for exact details:
6 31321429 rs2596499 T A 26223.03 . AC=2;AF=1.00;AN=2;BaseQRankSum=1.442;DB;DP=918;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=59.99;MQRankSum=-0.075;QD=28.57;ReadPosRankSum=-0.221;SOR=1.958 GT:AD:DP:GQ:PL 1/1:7,911:918:99:26237,2450,0
6 31321524 rs2844584 G A 35485.60 . AC=1;AF=0.500;AN=2;BaseQRankSum=10.780;DB;DP=2609;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=59.89;MQRankSum=0.093;QD=13.69;ReadPosRankSum=1.542;SOR=0.704 GT:AD:DP:GQ:PL 0/1:1122,1471:2593:99:35493,0,22929
6 31321578 rs7762909 A G 103935.03 . AC=2;AF=1.00;AN=2;BaseQRankSum=-3.195;DB;DP=3989;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;MQRankSum=0.000;QD=26.16;ReadPosRankSum=-0.267;SOR=0.958 GT:AD:DP:GQ:PL 1/1:10,3963:3973:99:103949,11666,0
6 31321807 rs2770 G A 276230.03 . AC=2;AF=1.00;AN=2;BaseQRankSum=0.079;DB;DP=6786;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=59.98;MQRankSum=-5.602;QD=31.11;ReadPosRankSum=0.679;SOR=0.637 GT:AD:DP:GQ:PL 1/1:3,6775:6778:99:276244,20337,0
6 31321856 rs2768 A G 175572.03 . AC=2;AF=1.00;AN=2;BaseQRankSum=-0.992;DB;DP=6327;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=59.96;MQRankSum=1.265;QD=27.80;ReadPosRankSum=3.991;SOR=0.986 GT:AD:DP:GQ:PL 1/1:13,6302:6315:99:175586,18623,0
6 31321882 rs2769 G A 86240.60 . AC=1;AF=0.500;AN=2;BaseQRankSum=19.850;DB;DP=6343;ExcessHet=3.0103;FS=0.530;MLEAC=1;MLEAF=0.500;MQ=59.88;MQRankSum=-5.425;QD=13.63;ReadPosRankSum=4.167;SOR=0.644 GT:AD:DP:GQ:PL 0/1:2811,3517:6328:99:86248,0,56712
6 31321906 rs1093 A G 71375.60 . AC=1;AF=0.500;AN=2;BaseQRankSum=-22.761;DB;DP=6061;ExcessHet=3.0103;FS=1.767;MLEAC=1;MLEAF=0.500;MQ=59.76;MQRankSum=-1.855;QD=11.79;ReadPosRankSum=2.937;SOR=0.836 GT:AD:DP:GQ:PL 0/1:2702,3350:6052:99:71383,0,87556
6 31321915 rs1055890 A G 208398.05 . AC=2;AF=1.00;AN=2;BaseQRankSum=-0.680;DB;DP=5817;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=59.74;MQRankSum=0.514;QD=30.50;ReadPosRankSum=2.560;SOR=0.439 GT:AD:DP:GQ:PL 1/1:9,5808:5817:99:263663,17466,0
6 31321916 rs1055849 A G 205947.03 . AC=2;AF=1.00;AN=2;BaseQRankSum=-1.288;DB;DP=5824;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=59.72;MQRankSum=1.344;QD=27.63;ReadPosRankSum=2.409;SOR=0.499 GT:AD:DP:GQ:PL 1/1:11,5807:5818:99:205961,17142,0
6 31321925 rs140769830 T TG 50810.64 . AC=1;AF=0.500;AN=2;BaseQRankSum=1.994;DB;DP=5672;ExcessHet=3.0103;FS=0.528;MLEAC=1;MLEAF=0.500;MQ=59.68;MQRankSum=0.658;QD=9.02;ReadPosRankSum=-0.256;SOR=0.617 GT:AD:DP:GQ:PL 0/1:3170,2460:5630:99:50818,0,78559
6 31322121 rs2428496 C T 227959.03 . AC=2;AF=1.00;AN=2;BaseQRankSum=1.090;DB;DP=5464;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;MQRankSum=0.534;QD=32.30;ReadPosRankSum=-0.656;SOR=1.833 GT:AD:DP:GQ:PL 1/1:2,5456:5458:99:227973,16424,0
6 31322129 rs17192932 C G 49114.60 . AC=1;AF=0.500;AN=2;BaseQRankSum=-1.907;DB;DP=5363;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.000;QD=9.23;ReadPosRankSum=-2.843;SOR=0.671 GT:AD:DP:GQ:PL 0/1:2919,2402:5321:99:49122,0,64970
6 31322175 rs2428495 C T 206930.03 . AC=2;AF=1.00;AN=2;BaseQRankSum=1.195;DB;DP=4582;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=59.99;MQRankSum=-5.558;QD=26.57;ReadPosRankSum=1.369;SOR=2.667 GT:AD:DP:GQ:PL 1/1:2,4579:4581:99:206944,13925,0
6 31322197 rs2428494 T A 179442.03 . AC=2;AF=1.00;AN=2;BaseQRankSum=1.948;DB;DP=4191;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=59.99;MQRankSum=0.797;QD=27.12;ReadPosRankSum=0.940;SOR=1.228 GT:AD:DP:GQ:PL 1/1:2,4183:4185:99:179456,12551,0
6 31322220 . C T 163738.06 . AC=2;AF=1.00;AN=2;BaseQRankSum=1.640;DP=3744;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=59.99;MQRankSum=-0.044;QD=29.14;ReadPosRankSum=2.033;SOR=2.737 GT:AD:DP:GQ:PL 1/1:3,3649:3652:99:163752,11718,0
6 31322367 rs3819299 T G 32668.60 . AC=1;AF=0.500;AN=2;BaseQRankSum=-18.180;DB;DP=4089;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=59.98;MQRankSum=1.866;QD=8.01;ReadPosRankSum=0.049;SOR=0.670 GT:AD:DP:GQ:PL 0/1:2370,1708:4078:99:32676,0,57498
6 31322395 rs17199328 A G 51664.60 . AC=1;AF=0.500;AN=2;BaseQRankSum=-16.507;DB;DP=3895;ExcessHet=3.0103;FS=1.834;MLEAC=1;MLEAF=0.500;MQ=59.86;MQRankSum=-2.171;QD=13.31;ReadPosRankSum=-0.806;SOR=0.868 GT:AD:DP:GQ:PL 0/1:1583,2298:3881:99:51672,0,34154
=== Start HAHap phasing ===
Parameters: Minimum mapping quality = 0
Parameters: Threshold of low coverage = Median
Parameters: Minimum junction number = 4
Parameters: Likelihood of P1 and P2 = 0.49
=== Read Heterozygous Data ===
Traceback (most recent call last):
File "./bin/HAHap", line 9, in <module>
main()
File "/home/dhwani/Documents/softwares/HAHap/HAHap/main.py", line 73, in main
module.main(args)
File "/home/dhwani/Documents/softwares/HAHap/HAHap/phase.py", line 56, in main
var_chrom_dict = split_vcf_by_chrom(args.variant_file)
File "/home/dhwani/Documents/softwares/HAHap/HAHap/vcf.py", line 42, in split_vcf_by_chrom
for line in variants_vcf:
File "/home/dhwani/miniconda3/lib/python3.6/codecs.py", line 321, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte