zhengxwen / SeqArray

Data management of large-scale whole-genome sequence variant calls (Development version only)
http://www.bioconductor.org/packages/SeqArray
44 stars 12 forks source link

how to give ploidy a non missing value in a dosage data set #86

Open nickychapman opened 1 year ago

nickychapman commented 1 year ago

Background: I've used imputedDosageFile to get my Beagle output in a format I can use with SeqArray. Then I used seqSNP2GDS to make a file for use with SeqVarTools. This object only includes dosage data, no genotypes. I am running into problems using GENESIS because it doesn't like that ploidy is NA. How can I tweak the object so ploidy is non missing? I think I probably need to put some dummy data in the genotype node but I can't figure that out.

Here is what my object looks like.

genodata Object of class "SeqVarGDSClass" File::panel_01_chr22.dose.array.gds (267.7K)

  • [ ] |--+ description [ ] |--+ sample.id { Str8 628 LZMA_ra(22.3%), 1.2K } |--+ variant.id { Int32 429 LZMA_ra(24.4%), 425B } |--+ position { Int32 429 LZMA_ra(79.4%), 1.3K } |--+ chromosome { Str8 429 LZMA_ra(7.30%), 101B } |--+ allele { Str8 429 LZMA_ra(22.0%), 385B } |--+ genotype [ ] |--+ phase [ ] |--+ annotation [ ] | |--+ id { Str8 429 LZMA_ra(24.0%), 393B } | |--+ qual { Float32 429 LZMA_ra(5.71%), 105B } | |--+ filter { Int32,factor 429 LZMA_ra(5.71%), 105B } | |--+ info [ ] | --+ format [ ] | --+ DS [ ] | --+ data { Float64 628x429 LZMA_ra(12.4%), 260.1K } * --+ sample.annotation [ ]
smgogarten commented 1 year ago

Currently SeqVarTools determines ploidy of a GDS file using the first dimension of the genotype node: https://github.com/smgogarten/SeqVarTools/blob/devel/R/AllUtilities.R#L20

Is there an alternate way to get the ploidy if that value is missing?

nickychapman commented 1 year ago

Here's some background to this predicament, in case that is where the problem lies:

I've been using imputedDosageFile (from GWASTools) and then seqSNP2GDS (from SeqArray) to go from a Beagle output file of dosages to a SeqVarGDS object.

In an older version of R (4.1.2), using GWASTools 1.40.0 and seqArray 1.34.0 this works fine and in the resulting object the ploidy is 2.

In a newer version of R (4.3.1), using GWASTools 1.46.6 and SeqArray 1.40.1, the resulting object has ploidy NA. This trips me up when I go to use assocTestSingle in GENESIS.

For now I am going to use the old version of R and proceed as I was before.

zhengxwen commented 1 year ago

Currently SeqVarTools determines ploidy of a GDS file using the first dimension of the genotype node: https://github.com/smgogarten/SeqVarTools/blob/devel/R/AllUtilities.R#L20

Is there an alternate way to get the ploidy if that value is missing?

In seqAlleleFreq(), if ploidy is NA (no genotype node, but a numeric dosage node), it will be set to be

ploidy <- getOption("seqarray.ploidy", 2L)

If the environment variable seqarray.ploidy is not defined, then just use 2 as a default value.

This might help.