vatlab / varianttools

Software tool for the manipulation, annotation, selection, and analysis of variants in the context of next-gen sequencing analysis
https://vatlab.github.io/vat-docs/
GNU General Public License v3.0

WARNING: hg19: Failed to get reference sequence: unrecognized chromosome id: 65535 #149

Closed: marionamujal closed this issue 3 years ago

marionamujal commented 4 years ago

Hi, I get the above warning when importing a VCF file. I saw the same issue raised previously, though I didn't quite see a solution for it. Do I just have to ignore it, since the VCF file is imported anyway?

Thank you.

BoPeng commented 4 years ago

Yes, these lines are ignored because variant tools does not understand chromosome 65535. You can grep the VCF file and post the offending lines so that we can see whether we should try to handle them.
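One way to collect such offending lines is to scan the CHROM column for names outside the main chromosomes, assuming (not confirmed at this point in the thread) that the warning is triggered by contig names the hg19 reference does not recognize. A minimal Python sketch follows; the function names and the set of "recognized" names (1 to 22, X, Y, M/MT, with or without a chr prefix) are hypothetical and not the list variant tools actually uses.

```python
# Hypothetical set of contig names assumed to be recognized for hg19:
# autosomes 1-22, X, Y and the mitochondrial genome (M or MT).
PRIMARY = {str(i) for i in range(1, 23)} | {"X", "Y", "M", "MT"}


def is_primary(chrom):
    """Return True if chrom names a primary chromosome (assumed set above)."""
    # Accept both "1" and "chr1" style names.
    name = chrom[3:] if chrom.lower().startswith("chr") else chrom
    return name in PRIMARY


def offending_records(lines, limit=10):
    """Return up to `limit` data lines whose CHROM column is not primary."""
    found = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip ## meta lines, the #CHROM header and blank lines
        chrom = line.split("\t", 1)[0]  # CHROM is the first tab-separated field
        if not is_primary(chrom):
            found.append(line.rstrip("\n"))
            if len(found) >= limit:
                break
    return found
```

For example, opening input.vcf and passing the file object to offending_records returns up to ten such lines, similar in spirit to piping grep into head.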

marionamujal commented 4 years ago

Thank you for the reply. This is what the lines look like:
2020-09-28 15:48:53,163: WARNING: hg19: Failed to get reference sequence: unrecognized chromosome id: 65535
2020-09-28 15:48:53,164: WARNING: hg19: Failed to get reference sequence: unrecognized chromosome id: 65535
2020-09-28 15:48:53,164: WARNING: hg19: Failed to get reference sequence: unrecognized chromosome id: 65535
2020-09-28 15:48:53,164: WARNING: hg19: Failed to get reference sequence: unrecognized chromosome id: 65535
2020-09-28 15:48:53,164: WARNING: hg19: Failed to get reference sequence: unrecognized chromosome id: 65535
2020-09-28 15:48:53,164: WARNING: hg19: Failed to get reference sequence: unrecognized chromosome id: 65535
2020-09-28 15:48:53,165: WARNING: hg19: Failed to get reference sequence: unrecognized chromosome id: 65535
2020-09-28 15:48:53,165: WARNING: hg19: Failed to get reference sequence: unrecognized chromosome id: 65535
2020-09-28 15:48:53,165: WARNING: hg19: Failed to get reference sequence: unrecognized chromosome id: 65535
2020-09-28 15:48:53,165: WARNING: hg19: Failed to get reference sequence: unrecognized chromosome id: 65535
2020-09-28 15:48:53,165: WARNING: hg19: Failed to get reference sequence: unrecognized chromosome id: 65535
2020-09-28 15:48:53,165: WARNING: hg19: Failed to get reference sequence: unrecognized chromosome id: 65535
2020-09-28 15:48:53,166: WARNING: hg19: Failed to get reference sequence: unrecognized chromosome id: 65535
2020-09-28 15:48:53,166: WARNING: hg19: Failed to get reference sequence: unrecognized chromosome id: 65535
2020-09-28 15:48:53,166: WARNING: hg19: Failed to get reference sequence: unrecognized chromosome id: 65535

BoPeng commented 4 years ago

I meant the lines that contain 65535 in your VCF file, something like the output of

grep 65535 input.vcf | head -n 10
marionamujal commented 4 years ago

The command above doesn't yield any useful information. I don't seem to have that in my VCF file:

1       16962159        rs1765535       C       A       2431.89 PASS    AC=66;AF=0.541;AN=122;BaseQRankSum=0.727;ClippingRankSum=0.736;DB;DP=184;ExcessHet=0;FS=0;GQ_MEAN=6;InbreedingCoeff=0.4377;MLEAC=78;MLEAF=0.639;MQ=44.86;MQRankSum=0.736;NCC=0;QD=29.3;ReadPosRankSum=0.736;SOR=0.846;VQSLOD=2.73;VariantType=SNP;culprit=InbreedingCoeff;set=variant     GT:AB:AD:DP:FT:GQ:PL    ./.:.:1,0:1:PASS:.:.    ./.:.:0,2:2:GQ;LowDP:6:73,6,0   ./.:.:2,0:2:PASS:.:.    ./.:.:2,0:2:GQ;LowDP:6:0,6,49        ./.:.:0,0:0:PASS:.:.    ./.:.:3,0:3:GQ;LowDP:0:0,0,14   ./.:.:0,0:0:PASS:.:.    ./.:.:1,0:1:GQ;LowDP:3:0,3,29   ./.:.:0,0:0:PASS:.:../.:.:0,0:0:PASS:.:.     ./.:.:0,0:0:PASS:.:.    ./.:.:1,0:1:PASS:.:.    ./.:.:0,3:3:GQ;LowDP:9:106,9,0  ./.:.:0,0:0:PASS:.:.    ./.:.:0,0:0:PASS:.:.    ./.:.:0,0:0:PASS:.:. ./.:.:1,0:1:PASS:.:.    ./.:.:0,0:0:PASS:.:.    ./.:.:0,0:0:PASS:.:.    ./.:.:1,0:1:PASS:.:.    ./.:.:0,2:2:GQ;LowDP:6:72,6,0   ./.:.:0,0:0:PASS:.:. ./.:.:0,2:2:GQ;LowDP:6:49,6,0   ./.:.:0,0:0:PASS:.:.    ./.:.:0,2:2:GQ;LowDP:6:71,6,0   ./.:.:0,0:0:PASS:.:.    ./.:.:1,0:1:GQ;LowDP:3:0,3,32   ./.:.:0,0:0:PASS:.:. ./.:.:0,0:0:PASS:.:.    ./.:.:1,0:1:PASS:.:.    ./.:.:0,2:2:GQ;LowDP:6:72,6,0   ./.:.:1,0:1:PASS:.:.    ./.:.:0,0:0:PASS:.:.    ./.:.:0,0:0:PASS:.:. ./.:.:0,0:0:PASS:.:.    ./.:.:1,0:1:PASS:.:.    ./.:.:0,0:0:PASS:.:.    ./.:.:0,0:0:PASS:.:.    ./.:.:0,0:0:PASS:.:.    ./.:.:0,0:0:PASS:.:.    ./.:.:0,2:2:GQ;LowDP:6:72,6,0        ./.:.:0,0:0:PASS:.:.    ./.:.:0,0:0:PASS:.:.    ./.:.:1,0:1:PASS:.:.    ./.:.:0,0:0:PASS:.:.    ./.:.:0,0:0:PASS:.:.    ./.:.:0,0:0:PASS:.:. ./.:.:0,0:0:PASS:.:.    ./.:.:1,0:1:PASS:.:.    ./.:.:0,0:0:PASS:.:.    ./.:.:1,0:1:PASS:.:.    ./.:.:1,0:1:PASS:.:.    ./.:.:0,0:0:PASS:.:..
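grep matches its pattern anywhere on a line, so the record above was most likely returned only because its ID, rs1765535, contains the digits 65535; its CHROM is 1. The number 65535 probably does not appear in the CHROM column at all: it equals 0xFFFF, the largest 16-bit unsigned value, which suggests (an assumption, not confirmed in this thread) a placeholder id for a contig name the reference does not know. The sketch below contrasts a grep-like substring match with an exact match on the CHROM column; both helper names are hypothetical.

```python
def grep_like(lines, needle):
    """Mimic `grep needle`: match the text anywhere on the line."""
    return [line for line in lines if needle in line]


def chrom_equals(lines, value):
    """Match only data lines whose first (CHROM) column equals `value`."""
    return [
        line
        for line in lines
        if not line.startswith("#") and line.split("\t", 1)[0] == value
    ]


# Abbreviated copy of the first seven columns of the record quoted above.
record = "1\t16962159\trs1765535\tC\tA\t2431.89\tPASS"
print(len(grep_like([record], "65535")))     # 1: the ID rs1765535 contains "65535"
print(len(chrom_equals([record], "65535")))  # 0: CHROM is "1"
```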
BoPeng commented 4 years ago

What is the last record in your VCF file (tail -1 input.vcf)?
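tail -1 shows only the final record. A per-contig tally lists every CHROM value in the order it appears, with the number of records on each, which makes unplaced contigs at the end of a file easy to spot. A minimal sketch, with a hypothetical helper name:

```python
from collections import Counter


def contig_counts(lines):
    """Count data records per CHROM value, in order of first appearance."""
    counts = Counter()  # dict subclass, so insertion order is preserved (3.7+)
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        counts[line.split("\t", 1)[0]] += 1
    return counts
```

Iterating over contig_counts(...).items() for an opened input.vcf prints each contig name next to its record count.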

marionamujal commented 4 years ago

The last 9 records look like this:

GL000192.1      408422  .       A       G       32.56   PASS    AC=2;AF=0.027;AN=74;DP=42;ExcessHet=0.0298;FS=0;GQ_MEAN=6;InbreedingCoeff=-0.0701;MLEAC=1;ML>
GL000192.1      472457  .       C       T       43.94   PASS    AC=4;AF=0.044;AN=90;DP=51;ExcessHet=0.0009;FS=0;GQ_MEAN=6;InbreedingCoeff=-0.0025;MLEAC=4;ML>
GL000192.1      481470  .       A       T       45.94   PASS    AC=2;AF=0.029;AN=70;DP=44;ExcessHet=0.0316;FS=0;GQ_MEAN=9;InbreedingCoeff=-0.0406;MLEAC=2;ML>
GL000192.1      483054  .       G       A       51.62   PASS    AC=2;AF=0.022;AN=90;DP=56;ExcessHet=0.0245;FS=0;GQ_MEAN=6;InbreedingCoeff=-0.0874;MLEAC=1;ML>
GL000192.1      493318  .       C       T       1079.58 PASS    AC=5;AF=0.046;AN=108;BaseQRankSum=-0.198;ClippingRankSum=0.406;DP=1924;ExcessHet=0.0232;FS=8>
GL000192.1      523656  .       T       C       45.86   PASS    AC=4;AF=0.065;AN=62;DP=44;ExcessHet=0.0018;FS=0;GQ_MEAN=6;InbreedingCoeff=0.0858;MLEAC=3;MLE>
GL000192.1      545092  .       C       T       55.7    PASS    AC=5;AF=0.02;AN=250;BaseQRankSum=0;ClippingRankSum=0;DP=199;ExcessHet=0.0005;FS=0;GQ_MEAN=6;>
GL000192.1      545107  .       T       C       835.47  PASS    AC=32;AF=0.155;AN=206;BaseQRankSum=-0.736;ClippingRankSum=0.736;DP=214;ExcessHet=0;FS=1.81;G>
GL000192.1      545110  .       A       T       42.51   PASS    AC=1;AF=0.003846;AN=260;BaseQRankSum=-0.736;ClippingRankSum=-0.736;DP=206;ExcessHet=3.0103;F>
marionamujal commented 4 years ago

And when the import finishes, this is the output:


/opt/conda/lib/python3.7/site-packages/tables/path.py:157: NaturalNameWarning: object name is not a valid Python identifier: 'chrGL000193.1'; it does not match the pattern ``^[a-zA-Z_][a-zA-Z0-9_]*$``; you will not be able to use natural naming to access this object; using ``getattr()`` will still work, though
  check_attribute_name(name)
/opt/conda/lib/python3.7/site-packages/tables/path.py:157: NaturalNameWarning: object name is not a valid Python identifier: 'chrGL000194.1'; it does not match the pattern ``^[a-zA-Z_][a-zA-Z0-9_]*$``; you will not be able to use natural naming to access this object; using ``getattr()`` will still work, though
  check_attribute_name(name)
/opt/conda/lib/python3.7/site-packages/tables/path.py:157: NaturalNameWarning: object name is not a valid Python identifier: 'chrGL000225.1'; it does not match the pattern ``^[a-zA-Z_][a-zA-Z0-9_]*$``; you will not be able to use natural naming to access this object; using ``getattr()`` will still work, though
  check_attribute_name(name)
/opt/conda/lib/python3.7/site-packages/tables/path.py:157: NaturalNameWarning: object name is not a valid Python identifier: 'chrGL000192.1'; it does not match the pattern ``^[a-zA-Z_][a-zA-Z0-9_]*$``; you will not be able to use natural naming to access this object; using ``getattr()`` will still work, though
  check_attribute_name(name)
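These NaturalNameWarning messages come from PyTables (the tables package). The object names appear to be built by prefixing contig names with chr (inferred from the warning text), and names such as chrGL000193.1 contain a period, which the quoted pattern does not allow. As the warning itself says, the objects remain accessible through getattr(), so the messages look cosmetic. The sketch below applies the same pattern with Python's re module to show which names trigger it; is_natural_name is a hypothetical helper.

```python
import re

# Pattern quoted in the PyTables NaturalNameWarning above.
NATURAL_NAME = re.compile(r"^[a-zA-Z_][a-zA-Z0-9_]*$")


def is_natural_name(name):
    """True if `name` is a valid Python identifier per the quoted pattern."""
    return NATURAL_NAME.match(name) is not None


for name in ["chr1", "chrX", "chrGL000192.1"]:
    print(name, is_natural_name(name))  # the "." in chrGL000192.1 fails
```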
BoPeng commented 4 years ago

Yes, none of these contigs are supported by the reference genome currently in use, mostly because there is no annotation for them.
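The thread does not prescribe a workaround. Because records on these contigs are skipped anyway, one option is to drop them before importing, which might avoid the warnings. A minimal sketch, assuming the same hypothetical set of primary chromosome names as above (not the list variant tools uses); ##contig header lines for the dropped contigs are left untouched.

```python
# Hypothetical set of contig names to keep: 1-22, X, Y, M/MT (chr prefix optional).
PRIMARY = {str(i) for i in range(1, 23)} | {"X", "Y", "M", "MT"}


def keep_primary(lines):
    """Yield header lines and data lines on primary chromosomes only."""
    for line in lines:
        if line.startswith("#"):
            yield line  # keep ## meta lines and the #CHROM header unchanged
            continue
        chrom = line.split("\t", 1)[0]
        name = chrom[3:] if chrom.lower().startswith("chr") else chrom
        if name in PRIMARY:
            yield line
```

To use it, open input.vcf for reading, write the lines yielded by keep_primary to a new file (for example a hypothetical input.primary.vcf), and import that file instead.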

marionamujal commented 4 years ago

Thank you for your time, this is very helpful!