zhengxwen / SeqArray

Data management of large-scale whole-genome sequence variant calls (Development version only)
http://www.bioconductor.org/packages/SeqArray

error when running seqVCF2GDS #81

Open maxineliu opened 2 years ago

maxineliu commented 2 years ago

I have a VCF file that consists of structural variant records. I'm trying to convert the VCF to GDS, but the conversion fails.

The command I used is: seqVCF2GDS("/Users/path/12bufo.vcf.gz", "12bufo.gds", storage.option="ZIP_RA", parallel=8L)

The error message is:

Fri Aug 19 02:45:43 2022
Variant Call Format (VCF) Import:
    file(s):
        12bufo.vcf.gz (1.0G)
    file format: VCFv4.2
    genome reference: <unknown>
    the number of sets of chromosomes (ploidy): 2
    the number of samples: 12
    genotype storage: bit2
    compression method: ZIP_RA
    # of samples: 12
    calculating the total number of variants ...
    # of variants: 3,036,302
    >>> writing to 8 files: <<<
        12bufo_tmp01_1570f70c78cda  [1 .. 379,544]
        12bufo_tmp02_1570f3cf0760e  [379,545 .. 759,088]
        12bufo_tmp03_1570f4ed6b063  [759,089 .. 1,138,632]
        12bufo_tmp04_1570f70d65c04  [1,138,633 .. 1,518,176]
        12bufo_tmp05_1570f933447c   [1,518,177 .. 1,897,720]
        12bufo_tmp06_1570f4d52d9c   [1,897,721 .. 2,277,264]
        12bufo_tmp07_1570f47a55f3e  [2,277,265 .. 2,656,808]
        12bufo_tmp08_1570f3a100431  [2,656,809 .. 3,036,302]
Error in .DynamicForkCall(njobs, njobs, .fun = function(.jobidx, ...) { : 
  Error in seqVCF2GDS(vcf.fn, ptmpfn[i], header = oldheader, storage.option = storage.option,  : 
  Invalid float conversion 'None,0,8,8,8'
FILE: /Users/maxineliu/work/bufo/vcf_analysis.dir/12bufo.vcf.gz
LINE: 794, COLUMN: 8, PRECISE;SVTYPE=BND;SUPPORT=3;COVERAGE=None,0,8,8,8

So I modified my command: seqVCF2GDS("/Users/path/12bufo.vcf.gz", "12bufo.gds", storage.option=seqStorageOption(float.mode="float64"), parallel=8L)

The output changed, but the conversion still fails:

Fri Aug 19 02:53:04 2022
Variant Call Format (VCF) Import:
    file(s):
        12bufo.vcf.gz (1.0G)
    file format: VCFv4.2
    genome reference: <unknown>
    the number of sets of chromosomes (ploidy): 2
    the number of samples: 12
    genotype storage: bit2
    compression method: customized
    # of samples: 12
    calculating the total number of variants ...
    # of variants: 3,036,302
    >>> writing to 8 files: <<<
        12bufo_tmp01_1570f54acdfc   [1 .. 379,544]
        12bufo_tmp02_1570f72195e1a  [379,545 .. 759,088]
        12bufo_tmp03_1570f5f71377b  [759,089 .. 1,138,632]
        12bufo_tmp04_1570f1f19d31   [1,138,633 .. 1,518,176]
        12bufo_tmp05_1570f1d86fcf6  [1,518,177 .. 1,897,720]
        12bufo_tmp06_1570f949899f   [1,897,721 .. 2,277,264]
        12bufo_tmp07_1570f3aea2a7c  [2,277,265 .. 2,656,808]
        12bufo_tmp08_1570f638b511b  [2,656,809 .. 3,036,302]
Error in .DynamicForkCall(njobs, njobs, .fun = function(.jobidx, ...) { : 
  Error in match.arg(scenario) : 
  'arg' should be one of “general”, “imputation”

What should I do to avoid this kind of error?

Thanks in advance, Maxine

zhengxwen commented 2 years ago

seqVCF2GDS cannot recognize None as a number in COVERAGE=None,0,8,8,8. Replace None with . (the VCF missing-value marker) in the VCF file!
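One possible way to make that replacement is to rewrite the file before importing it. The sketch below is not part of SeqArray: the function names (fix_info, fix_line, fix_vcf) are made up for illustration, and it assumes that only the COVERAGE key in the INFO column (column 8) contains None. It streams the gzipped VCF line by line, so a 1.0G file does not need to fit in memory.

```python
import gzip


def fix_info(info: str, key: str = "COVERAGE") -> str:
    """Replace 'None' items in one INFO key with '.', the VCF missing value."""
    entries = []
    for entry in info.split(";"):
        name, sep, value = entry.partition("=")
        # Only touch the requested key; flags such as PRECISE have no '='.
        if sep and name == key:
            value = ",".join("." if v == "None" else v for v in value.split(","))
        entries.append(name + sep + value)
    return ";".join(entries)


def fix_line(line: str, key: str = "COVERAGE") -> str:
    """Fix the INFO column (column 8) of a data line; header lines pass through."""
    if line.startswith("#"):
        return line
    cols = line.rstrip("\n").split("\t")
    if len(cols) >= 8:
        cols[7] = fix_info(cols[7], key)
    return "\t".join(cols) + "\n"


def fix_vcf(src: str, dst: str, key: str = "COVERAGE") -> None:
    """Stream a gzipped VCF from src to dst, fixing each line on the way."""
    with gzip.open(src, "rt") as fin, gzip.open(dst, "wt") as fout:
        for line in fin:
            fout.write(fix_line(line, key))
```

For example, fix_vcf("12bufo.vcf.gz", "12bufo.fixed.vcf.gz") would write a corrected copy (the output name is hypothetical), which can then be passed to seqVCF2GDS in place of the original. Note that the output is plain gzip rather than BGZF, so it would need re-compressing with bgzip if another tool requires a tabix index.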