zhengxwen / SeqArray

Data management of large-scale whole-genome sequence variant calls (Development version only)
http://www.bioconductor.org/packages/SeqArray

Create GDS file from imputed data using dosage || No variable 'annotation/format/DS' in the FORMAT field. #71

Issue status: Open (opened by complexgenome 3 years ago)

complexgenome commented 3 years ago

Hi @zhengxw-ab

I want to create a GDS file from a VCF of imputed data while keeping the dosage information intact. I use the following command:

seqVCF2GDS("CHR22.recode.vcf.gz","check.gds",verbose=TRUE,genotype.var.name="annotation/format/DS",scenario=c("imputation"))

I get the following messages:

Wed Aug  4 13:14:47 2021
Variant Call Format (VCF) Import:
    file(s):
        CHR22_CHGWAS_rsq80_MAC10.recode.vcf.gz (1.8G)
    file format: VCFv4.1
    the number of sets of chromosomes (ploidy): 2
    the number of samples: 12,508
    genotype storage: bit2
    compression method: LZMA_RA
    # of samples: 12508
    scenario: imputation
        annotation/format/DS: packedreal16
        annotation/format/GP: packedreal16
No variable 'annotation/format/DS' in the FORMAT field.
Output:
    check.gds
Parsing 'CHR22_CHGWAS_rsq80_MAC10.recode.vcf.gz':

It reports that there is no variable 'annotation/format/DS' in the FORMAT field.

How do I make the seqVCF2GDS function pick up the dosage values?

Thanks,

zhengxwen commented 3 years ago

You misused "genotype.var.name" (it names the VCF FORMAT field that holds the genotypes, "GT" by default); dosages are always stored in 'annotation/format/DS'. Remove the argument genotype.var.name="annotation/format/DS".

complexgenome commented 3 years ago

@zhengxwen thank you for your reply.

Would the following command be enough to tell seqVCF2GDS to use the dosage values?

SeqArray::seqVCF2GDS(vcf.fn, output_file, verbose=TRUE, scenario=c("imputation"))
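Going by the maintainer's reply (drop genotype.var.name; DS always ends up in 'annotation/format/DS'), the following R snippet is a minimal sketch for checking that the dosages survive the conversion. It makes several assumptions: the tiny GT:DS:GP VCF is a hypothetical stand-in for the real CHR22 file, the header check relies on seqVCF_Header() returning a `format` table with an `ID` column, and the node lookup uses gdsfmt's index.gdsn(). Replace the temporary file names with your own paths.

```r
library(gdsfmt)    # index.gdsn() for inspecting GDS nodes
library(SeqArray)  # seqVCF_Header(), seqVCF2GDS(), seqOpen(), seqClose()

# Hypothetical toy VCF standing in for the imputed file:
# 2 samples, 2 variants, FORMAT fields GT:DS:GP
vcf.fn <- tempfile(fileext = ".vcf")
writeLines(c(
  "##fileformat=VCFv4.1",
  "##FORMAT=<ID=GT,Number=1,Type=String,Description=\"Genotype\">",
  "##FORMAT=<ID=DS,Number=1,Type=Float,Description=\"Estimated alternate allele dosage\">",
  "##FORMAT=<ID=GP,Number=G,Type=Float,Description=\"Genotype probabilities\">",
  paste("#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO",
        "FORMAT", "S1", "S2", sep = "\t"),
  paste("22", "16050075", "rs1", "A", "G", ".", "PASS", ".",
        "GT:DS:GP", "0|0:0.1:0.9,0.1,0", "0|1:0.9:0.1,0.9,0", sep = "\t"),
  paste("22", "16050115", "rs2", "G", "A", ".", "PASS", ".",
        "GT:DS:GP", "1|1:1.95:0,0.05,0.95", "0|0:0.02:0.98,0.02,0", sep = "\t")
), vcf.fn)

# 1. The VCF header lists the plain FORMAT ID "DS", not "annotation/format/DS"
hdr <- seqVCF_Header(vcf.fn)
stopifnot("DS" %in% hdr$format$ID)

# 2. Convert WITHOUT genotype.var.name: GT stays the genotype field,
#    and DS is imported as the FORMAT annotation 'annotation/format/DS'
gds.fn <- tempfile(fileext = ".gds")
seqVCF2GDS(vcf.fn, gds.fn, scenario = "imputation", verbose = FALSE)

# 3. Confirm the dosage node exists in the resulting GDS file
f <- seqOpen(gds.fn)
has_ds <- !is.null(index.gdsn(f, "annotation/format/DS", silent = TRUE))
seqClose(f)
print(has_ds)
```

The header check before conversion explains the original error message: genotype.var.name is matched against FORMAT IDs such as "GT" or "DS", so the GDS path 'annotation/format/DS' is never found there.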