The VCF format specification allows a variant to list only a subset of the INFO fields declared in the header. In my VCF files, SNPs are optionally assigned RSID= in the last INFO field.
##INFO=<ID=RSID,Number=0,Type=String,Description="rs id number, extracted from">
Most variants list an RSID (first line), but others do not (second line):
chr10 10654 chr10:10654:TGCAGAGAAGAACGCA:T TGCAGAGAAGAACGCA T . PASS AF=0.00035;MAF=0.00035;R2=0.3532;IMPUTED;R2_HAT=0.399014;RSID=rs1211140797
chr10 10709 chr10:10709:G:A G A . PASS AF=0.01458;MAF=0.01458;R2=0.3914;IMPUTED;R2_HAT=0.37429
When I attempt to call seqVCF2GDS on these data, I get the following error:
file format: VCFv4.1
genome reference: <unknown>
the number of sets of chromosomes (ploidy): 2
the number of samples: 2,013
genotype storage: bit2
compression method: LZ4_RA
# of samples: 2013
scenario: imputation
annotation/format/DS: packedreal16
annotation/format/GP: packedreal16
ID Number Type
9 RSID 0 String
9 rs id number, extracted from
Source Version
9 <NA> <NA>
Error in seqVCF2GDS(vcf_fn, tmpf, verbose = TRUE, parallel = ntasks, scenario = gtype, :
The length should be >0.
Execution halted
The expected behavior would be to set missing INFO fields to the missing value for their type, as bcftools does.
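To make that expectation concrete, here is a minimal Python sketch. It is not bcftools output and not SeqArray code. The helper name `parse_info` is hypothetical, the list of declared IDs is inferred from the two example records rather than read from the full header, and treating IMPUTED as a Flag is an assumption. Values are kept as strings for simplicity.

```python
def parse_info(info_column, declared_ids, flag_ids=frozenset()):
    """Parse one VCF INFO column so that every declared ID has an entry.

    Hypothetical helper for illustration only. Fields absent from the
    record are set to None (missing); absent Flag fields are set to False.
    """
    present = {}
    if info_column != ".":  # "." means the record carries no INFO at all
        for entry in info_column.split(";"):
            key, sep, value = entry.partition("=")
            # An entry without "=" is a Flag that is set on this record.
            present[key] = value if sep else True

    result = {}
    for field_id in declared_ids:
        if field_id in flag_ids:
            result[field_id] = present.get(field_id, False)
        else:
            result[field_id] = present.get(field_id)  # None when absent
    return result


# Assumption: these IDs are taken from the example records, and IMPUTED
# is assumed to be declared as a Flag in the header.
DECLARED = ["AF", "MAF", "R2", "IMPUTED", "R2_HAT", "RSID"]
FLAGS = {"IMPUTED"}

with_rsid = parse_info(
    "AF=0.00035;MAF=0.00035;R2=0.3532;IMPUTED;R2_HAT=0.399014;RSID=rs1211140797",
    DECLARED,
    FLAGS,
)
without_rsid = parse_info(
    "AF=0.01458;MAF=0.01458;R2=0.3914;IMPUTED;R2_HAT=0.37429",
    DECLARED,
    FLAGS,
)

print(with_rsid["RSID"])
print(without_rsid["RSID"])
```

In this sketch the second record simply gets a missing RSID instead of causing an error, which is the behavior I would expect from the conversion.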