marionamujal closed this issue 3 years ago
Yes, these lines are ignored since variant tools does not understand chromosome 65535. You can grep the vcf file and post the offending lines so that we can see if we should try to handle them.
Thank you for the reply. This is what the lines look like:
2020-09-28 15:48:53,163: WARNING: hg19: Failed to get reference sequence: unrecognized chromosome id: 65535
2020-09-28 15:48:53,164: WARNING: hg19: Failed to get reference sequence: unrecognized chromosome id: 65535
2020-09-28 15:48:53,164: WARNING: hg19: Failed to get reference sequence: unrecognized chromosome id: 65535
2020-09-28 15:48:53,164: WARNING: hg19: Failed to get reference sequence: unrecognized chromosome id: 65535
2020-09-28 15:48:53,164: WARNING: hg19: Failed to get reference sequence: unrecognized chromosome id: 65535
2020-09-28 15:48:53,164: WARNING: hg19: Failed to get reference sequence: unrecognized chromosome id: 65535
2020-09-28 15:48:53,165: WARNING: hg19: Failed to get reference sequence: unrecognized chromosome id: 65535
2020-09-28 15:48:53,165: WARNING: hg19: Failed to get reference sequence: unrecognized chromosome id: 65535
2020-09-28 15:48:53,165: WARNING: hg19: Failed to get reference sequence: unrecognized chromosome id: 65535
2020-09-28 15:48:53,165: WARNING: hg19: Failed to get reference sequence: unrecognized chromosome id: 65535
2020-09-28 15:48:53,165: WARNING: hg19: Failed to get reference sequence: unrecognized chromosome id: 65535
2020-09-28 15:48:53,165: WARNING: hg19: Failed to get reference sequence: unrecognized chromosome id: 65535
2020-09-28 15:48:53,166: WARNING: hg19: Failed to get reference sequence: unrecognized chromosome id: 65535
2020-09-28 15:48:53,166: WARNING: hg19: Failed to get reference sequence: unrecognized chromosome id: 65535
2020-09-28 15:48:53,166: WARNING: hg19: Failed to get reference sequence: unrecognized chromosome id: 65535
I meant the lines that contain 65535 in your VCF file, something like the output of
grep 65535 input.vcf | head -n 10
The command above doesn't yield any useful information. I don't seem to have that in my VCF file:
1 16962159 rs1765535 C A 2431.89 PASS AC=66;AF=0.541;AN=122;BaseQRankSum=0.727;ClippingRankSum=0.736;DB;DP=184;ExcessHet=0;FS=0;GQ_MEAN=6;InbreedingCoeff=0.4377;MLEAC=78;MLEAF=0.639;MQ=44.86;MQRankSum=0.736;NCC=0;QD=29.3;ReadPosRankSum=0.736;SOR=0.846;VQSLOD=2.73;VariantType=SNP;culprit=InbreedingCoeff;set=variant GT:AB:AD:DP:FT:GQ:PL ./.:.:1,0:1:PASS:.:. ./.:.:0,2:2:GQ;LowDP:6:73,6,0 ./.:.:2,0:2:PASS:.:. ./.:.:2,0:2:GQ;LowDP:6:0,6,49 ./.:.:0,0:0:PASS:.:. ./.:.:3,0:3:GQ;LowDP:0:0,0,14 ./.:.:0,0:0:PASS:.:. ./.:.:1,0:1:GQ;LowDP:3:0,3,29 ./.:.:0,0:0:PASS:.:../.:.:0,0:0:PASS:.:. ./.:.:0,0:0:PASS:.:. ./.:.:1,0:1:PASS:.:. ./.:.:0,3:3:GQ;LowDP:9:106,9,0 ./.:.:0,0:0:PASS:.:. ./.:.:0,0:0:PASS:.:. ./.:.:0,0:0:PASS:.:. ./.:.:1,0:1:PASS:.:. ./.:.:0,0:0:PASS:.:. ./.:.:0,0:0:PASS:.:. ./.:.:1,0:1:PASS:.:. ./.:.:0,2:2:GQ;LowDP:6:72,6,0 ./.:.:0,0:0:PASS:.:. ./.:.:0,2:2:GQ;LowDP:6:49,6,0 ./.:.:0,0:0:PASS:.:. ./.:.:0,2:2:GQ;LowDP:6:71,6,0 ./.:.:0,0:0:PASS:.:. ./.:.:1,0:1:GQ;LowDP:3:0,3,32 ./.:.:0,0:0:PASS:.:. ./.:.:0,0:0:PASS:.:. ./.:.:1,0:1:PASS:.:. ./.:.:0,2:2:GQ;LowDP:6:72,6,0 ./.:.:1,0:1:PASS:.:. ./.:.:0,0:0:PASS:.:. ./.:.:0,0:0:PASS:.:. ./.:.:0,0:0:PASS:.:. ./.:.:1,0:1:PASS:.:. ./.:.:0,0:0:PASS:.:. ./.:.:0,0:0:PASS:.:. ./.:.:0,0:0:PASS:.:. ./.:.:0,0:0:PASS:.:. ./.:.:0,2:2:GQ;LowDP:6:72,6,0 ./.:.:0,0:0:PASS:.:. ./.:.:0,0:0:PASS:.:. ./.:.:1,0:1:PASS:.:. ./.:.:0,0:0:PASS:.:. ./.:.:0,0:0:PASS:.:. ./.:.:0,0:0:PASS:.:. ./.:.:0,0:0:PASS:.:. ./.:.:1,0:1:PASS:.:. ./.:.:0,0:0:PASS:.:. ./.:.:1,0:1:PASS:.:. ./.:.:1,0:1:PASS:.:. ./.:.:0,0:0:PASS:.:..
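Note that the record above matches only because its ID, rs1765535, happens to contain the digits 65535; the chromosome column itself is 1. A substring grep cannot tell the two apart, so it may be more informative to list the distinct values in the first (CHROM) column. The following is a minimal sketch of that idea, not part of variant tools; the function name `count_contigs` and the sample lines are hypothetical, and it assumes an uncompressed VCF with whitespace-separated columns.

```python
from collections import Counter


def count_contigs(lines):
    """Count records per value of the first (CHROM) column.

    Header lines (starting with '#') and blank lines are skipped.
    """
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        # CHROM is the first column and cannot contain whitespace.
        chrom = line.split(None, 1)[0]
        counts[chrom] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical in-memory sample; replace with open("input.vcf") for a real file.
    sample = [
        "##fileformat=VCFv4.2\n",
        "#CHROM\tPOS\tID\tREF\tALT\n",
        "1\t16962159\trs1765535\tC\tA\n",
        "GL000192.1\t408422\t.\tA\tG\n",
    ]
    for chrom, n in sorted(count_contigs(sample).items()):
        print(chrom, n)
```

Any contig name in that listing that is not a standard chromosome is a candidate for the records being skipped.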
What is the last record in your VCF file? (tail -1 input.vcf)
The last 9 records look like this:
GL000192.1 408422 . A G 32.56 PASS AC=2;AF=0.027;AN=74;DP=42;ExcessHet=0.0298;FS=0;GQ_MEAN=6;InbreedingCoeff=-0.0701;MLEAC=1;ML>
GL000192.1 472457 . C T 43.94 PASS AC=4;AF=0.044;AN=90;DP=51;ExcessHet=0.0009;FS=0;GQ_MEAN=6;InbreedingCoeff=-0.0025;MLEAC=4;ML>
GL000192.1 481470 . A T 45.94 PASS AC=2;AF=0.029;AN=70;DP=44;ExcessHet=0.0316;FS=0;GQ_MEAN=9;InbreedingCoeff=-0.0406;MLEAC=2;ML>
GL000192.1 483054 . G A 51.62 PASS AC=2;AF=0.022;AN=90;DP=56;ExcessHet=0.0245;FS=0;GQ_MEAN=6;InbreedingCoeff=-0.0874;MLEAC=1;ML>
GL000192.1 493318 . C T 1079.58 PASS AC=5;AF=0.046;AN=108;BaseQRankSum=-0.198;ClippingRankSum=0.406;DP=1924;ExcessHet=0.0232;FS=8>
GL000192.1 523656 . T C 45.86 PASS AC=4;AF=0.065;AN=62;DP=44;ExcessHet=0.0018;FS=0;GQ_MEAN=6;InbreedingCoeff=0.0858;MLEAC=3;MLE>
GL000192.1 545092 . C T 55.7 PASS AC=5;AF=0.02;AN=250;BaseQRankSum=0;ClippingRankSum=0;DP=199;ExcessHet=0.0005;FS=0;GQ_MEAN=6;>
GL000192.1 545107 . T C 835.47 PASS AC=32;AF=0.155;AN=206;BaseQRankSum=-0.736;ClippingRankSum=0.736;DP=214;ExcessHet=0;FS=1.81;G>
GL000192.1 545110 . A T 42.51 PASS AC=1;AF=0.003846;AN=260;BaseQRankSum=-0.736;ClippingRankSum=-0.736;DP=206;ExcessHet=3.0103;F>
And when the import finishes, these warnings are printed:
/opt/conda/lib/python3.7/site-packages/tables/path.py:157: NaturalNameWarning: object name is not a valid Python identifier: 'chrGL000193.1'; it does not match the pattern ``^[a-zA-Z_][a-zA-Z0-9_]*$``; you will not be able to use natural naming to access this object; using ``getattr()`` will still work, though
check_attribute_name(name)
/opt/conda/lib/python3.7/site-packages/tables/path.py:157: NaturalNameWarning: object name is not a valid Python identifier: 'chrGL000194.1'; it does not match the pattern ``^[a-zA-Z_][a-zA-Z0-9_]*$``; you will not be able to use natural naming to access this object; using ``getattr()`` will still work, though
check_attribute_name(name)
/opt/conda/lib/python3.7/site-packages/tables/path.py:157: NaturalNameWarning: object name is not a valid Python identifier: 'chrGL000225.1'; it does not match the pattern ``^[a-zA-Z_][a-zA-Z0-9_]*$``; you will not be able to use natural naming to access this object; using ``getattr()`` will still work, though
check_attribute_name(name)
/opt/conda/lib/python3.7/site-packages/tables/path.py:157: NaturalNameWarning: object name is not a valid Python identifier: 'chrGL000192.1'; it does not match the pattern ``^[a-zA-Z_][a-zA-Z0-9_]*$``; you will not be able to use natural naming to access this object; using ``getattr()`` will still work, though
check_attribute_name(name)
Yes, none of these contigs are supported by the reference genome currently in use, mostly because no annotation is available for them.
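If the records on these contigs are not needed downstream, one option is to drop them before importing, which may also quiet the warnings. The sketch below is only an illustration under stated assumptions, not a variant tools feature: the names `PRIMARY`, `is_primary`, and `filter_primary` are hypothetical, and "primary" is assumed to mean chromosomes 1-22, X, Y, and M/MT, with or without a `chr` prefix.

```python
# Assumed set of primary chromosome names (without the optional "chr" prefix).
PRIMARY = {str(i) for i in range(1, 23)} | {"X", "Y", "M", "MT"}


def is_primary(chrom):
    """Return True if chrom names a primary chromosome, e.g. '1' or 'chrX'."""
    name = chrom[3:] if chrom.startswith("chr") else chrom
    return name in PRIMARY


def filter_primary(lines):
    """Yield header lines unchanged and only those records on primary chromosomes."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.split(None, 1)
        # Blank lines have no fields and are dropped.
        if fields and is_primary(fields[0]):
            yield line


if __name__ == "__main__":
    # Hypothetical in-memory sample; for a real file, iterate over open("input.vcf")
    # and write the yielded lines to a new file.
    sample = [
        "#CHROM\tPOS\tID\tREF\tALT\n",
        "1\t16962159\trs1765535\tC\tA\n",
        "GL000192.1\t408422\t.\tA\tG\n",
    ]
    print("".join(filter_primary(sample)), end="")
```

Header lines, including any `##contig` entries for the dropped contigs, are passed through as they are; only data records are filtered.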
Thank you for your time, this is very helpful!
Hi, I get the above warning when importing a VCF file. I saw the same issue raised previously, though I didn't quite see a solution for it. Do I just have to ignore it, since the VCF file is imported anyway?
Thank you.