Open brankovicmarija opened 5 years ago
Hi brankovicmarija,
Would you please remove the --format vcf
, just try
vtools import ././SCN198.selected.vcf --build hg19 --var_info DP,qual,filter,id --geno_info DP_geno,AD_geno,qual
Since your file is in vcf format which is the default format.
Hi! I tried that but it returns this message: [:10])>0: TypeError: '>' not supported between instances of 'str' and 'int' Also tried this command grep ">0" eff.scn198.vcf and nothing found
Have you tried reinitiate the project from the beginning?
vtools init myproject -f
Then, import vcf again.
@brankovicmarija Could you send us a minimal example to demonstrate the problem, such as the first 1000 lines of your vcf file and exact commands that you have used?
@jma7 I tried but it is the same error :[:10])>0: TypeError: '>' not supported between instances of 'str' and 'int'
@BoPeng Here is the command: vtools import ./SCN198.selected.vcf --build hg19 --var_info DP,qual,filter,id --geno_info DP_geno,AD_geno,qual And here is the vcf: ##fileformat=VCFv4.2
chr10 64863 . A C 37.28 my_snp_filter AC=2;AF=1.00;AN=2;DP=2;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=27.00;QD=18.64;SOR=0.693 GT:AD:DP:GQ:PL 1/1:0,2:2:6:49,6,0 chr10 64863 . A C 37.28 PASS AC=2;AF=1.00;AN=2;DP=2;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=27.00;QD=18.64;SOR=0.693 GT:AD:DP:GQ:PL 1/1:0,2:2:6:49,6,0 chr10 68091 . T C 37.28 PASS AC=2;AF=1.00;AN=2;DP=2;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=60.00;QD=18.64;SOR=0.693 GT:AD:DP:GQ:PL 1/1:0,2:2:6:49,6,0 chr10 68091 . T C 37.28 PASS AC=2;AF=1.00;AN=2;DP=2;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=60.00;QD=18.64;SOR=0.693 GT:AD:DP:GQ:PL 1/1:0,2:2:6:49,6,0 chr10 74448 . G T 37.28 PASS AC=2;AF=1.00;AN=2;DP=2;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=47.00;QD=18.64;SOR=0.693 GT:AD:DP:GQ:PL 1/1:0,2:2:6:49,6,0 chr10 74448 . G T 37.28 PASS AC=2;AF=1.00;AN=2;DP=2;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=47.00;QD=18.64;SOR=0.693 GT:AD:DP:GQ:PL 1/1:0,2:2:6:49,6,0 chr10 93603 . C T 143.10 my_snp_filter AC=2;AF=1.00;AN=2;DP=4;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=37.18;QD=25.36;SOR=1.609 GT:AD:DP:GQ:PL 1/1:0,4:4:12:157,12,0 chr10 93603 . C T 143.10 PASS AC=2;AF=1.00;AN=2;DP=4;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=37.18;QD=25.36;SOR=1.609 GT:AD:DP:GQ:PL 1/1:0,4:4:12:157,12,0 chr10 93816 . C T 362.60 PASS AC=1;AF=0.500;AN=2;BaseQRankSum=-1.030;DP=21;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=58.85;MQRankSum=-0.686;QD=17.27;ReadPosRankSum=-0.943;SOR=0.458 GT:AD:DP:GQ:PL 0/1:8,13:21:99:370,0,242 chr10 93816 . C T 362.60 PASS AC=1;AF=0.500;AN=2;BaseQRankSum=-1.030;DP=21;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=58.85;MQRankSum=-0.686;QD=17.27;ReadPosRankSum=-0.943;SOR=0.458 GT:AD:DP:GQ:PL 0/1:8,13:21:99:370,0,242 chr10 93945 . G A 278.98 PASS AC=2;AF=1.00;AN=2;DP=7;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=54.81;QD=28.73;SOR=4.174 GT:AD:DP:GQ:PL 1/1:0,7:7:21:293,21,0 chr10 93945 . G A 278.98 PASS AC=2;AF=1.00;AN=2;DP=7;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=54.81;QD=28.73;SOR=4.174 GT:AD:DP:GQ:PL 1/1:0,7:7:21:293,21,0 chr10 94083 . C T 33.44 PASS AC=2;AF=1.00;AN=2;DP=1;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=40.00;QD=33.44;SOR=1.609 GT:AD:DP:GQ:PL 1/1:0,1:1:3:43,3,0 chr10 94083 . C T 33.44 PASS AC=2;AF=1.00;AN=2;DP=1;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=40.00;QD=33.44;SOR=1.609 GT:AD:DP:GQ:PL 1/1:0,1:1:3:43,3,0 chr10 95226 . A T 67.28 PASS AC=2;AF=1.00;AN=2;DP=2;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=60.00;QD=33.64;SOR=0.693 GT:AD:DP:GQ:PL 1/1:0,2:2:6:79,6,0
Hi,
The --var_info
option in the vtools import
command line was meant to import fields in the INFO
column from the vcf file, currently DP,GQ,GD,HQ,AD,PL,MQ,NS
tags are supported.
So you might want to adjust your command line to include the tags you are interested in, for example
vtools import SCN198.selected.vcf --build hg19 --var_info DP AC AF --geno_info DP_geno,AD_geno
We will improve our error reporting to make this more clear, Thanks!
Thanks for the reply.
We reset the project and tried your command. There were many non-breaking key errors during the execution e.g.:
Traceback (most recent call last):
File "/home/korisnik/anaconda3/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
File "/home/korisnik/anaconda3/lib/python3.7/site-packages/variant_tools/importer_allele_hdf5.py", line 1132, in run
variant_id = self.variantIndex[tuple((chr, ref, alt))][pos][0]
KeyError: 110874502
Looking at the VCF file there were always locations one position smaller. I also remind you of my first post and the problem with warning "unrecognized chromosome id: 65535".
At the end, we show fields:
variant.chr (char) Chromosome name (VARCHAR)
variant.pos (int) Position (INT, 1-based)
variant.ref (char) Reference allele (VARCHAR, - for missing allele of an
insertion)
variant.alt (char) Alternative allele (VARCHAR, - for missing allele of an
deletion)
variant.DP (int)
variant.AC (int)
variant.AF (float)
variant.num (int) Created from stat "#(alt)" with type INT on Apr19
variant.hom (int) Created from stat "#(hom)" with type INT on Apr19
variant.het (int) Created from stat "#(het)" with type INT on Apr19
variant.other (int) Created from stat "#(other)" with type INT on Apr19
variant.total (int) Created from stat "#(GT)" with type INT on Apr19
Any clue why DP_geno was not generated?
Thanks!
vtools show fields
will only show the fields in the variant table. DP_geno is for each sample which is saved in *.h5 file. I am not sure what is causing the one position smaller problem you mentioned. It seems that there are duplication of variants in your VCF file, but the sample file was imported fine at my end. Would you please send me the portion of the VCF file having the key error problem, maybe plus minus 100 lines around position 110874502
?
Hello! I am using version 3.0.2 trying to add DP to the available fields according to your tutorial with the following command:
vtools import ././SCN198.selected.vcf --build hg19 --var_info DP,qual,filter,id --geno_info DP_geno,AD_geno,qual --format vcf
First I get an unusual warning:
WARNING: hg19: Failed to get reference sequence: unrecognized chromosome id: 65535
There is no such sequence in my file: $grep 65535 SCN198.selected.vcfchr12 82265535 . A G 35.44 PASS AC=2;AF=1.00;AN=2;DP=1;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=60.00;QD=32.33;SOR=1.609GT:AD:DP:GQ:PL 1/1:0,1:1:3:45,3,0
Then, I get an error:
Even though the process fails, the sample is still added to the list:
When I try to remove the samples to try importing again, it fails:
Is there a systematic error here I am not seeing? How can I remove the sample and add DP to my fields?