zhengxwen / SeqArray

Data management of large-scale whole-genome sequence variant calls (Development version only)
http://www.bioconductor.org/packages/SeqArray
43 stars 12 forks source link

how to give ploidy a non missing value in a dosage data set #86

Open nickychapman opened 10 months ago

nickychapman commented 10 months ago

Background: I've used imputedDosageFile to get my Beagle output in a format I can use with SeqArray. Then I used seqSNP2GDS to make a file for use with SeqVarTools. This object only includes dosage data, no genotypes. I am running into problems using GENESIS because it doesn't like that ploidy is NA. How can I tweak the object so ploidy is non missing? I think I probably need to put some dummy data in the genotype node but I can't figure that out.

Here is what my object looks like.

genodata Object of class "SeqVarGDSClass" File::panel_01_chr22.dose.array.gds (267.7K)

  • [ ] |--+ description [ ] |--+ sample.id { Str8 628 LZMA_ra(22.3%), 1.2K } |--+ variant.id { Int32 429 LZMA_ra(24.4%), 425B } |--+ position { Int32 429 LZMA_ra(79.4%), 1.3K } |--+ chromosome { Str8 429 LZMA_ra(7.30%), 101B } |--+ allele { Str8 429 LZMA_ra(22.0%), 385B } |--+ genotype [ ] |--+ phase [ ] |--+ annotation [ ] | |--+ id { Str8 429 LZMA_ra(24.0%), 393B } | |--+ qual { Float32 429 LZMA_ra(5.71%), 105B } | |--+ filter { Int32,factor 429 LZMA_ra(5.71%), 105B } | |--+ info [ ] | --+ format [ ] | --+ DS [ ] | --+ data { Float64 628x429 LZMA_ra(12.4%), 260.1K } * --+ sample.annotation [ ]
smgogarten commented 10 months ago

Currently SeqVarTools determines ploidy of a GDS file using the first dimension of the genotype node: https://github.com/smgogarten/SeqVarTools/blob/devel/R/AllUtilities.R#L20

Is there an alternate way to get the ploidy if that value is missing?

nickychapman commented 10 months ago

Here's some background to this predicament, in case that is where the problem lies:

I've been using imputedDosageFile (from GWASTools) and then seqSNP2GDS (from SeqArray) to go from a Beagle output file of dosages to a SeqVarGDS object.

In an older version of R (4.1.2), using GWASTools 1.40.0 and seqArray 1.34.0 this works fine and in the resulting object the ploidy is 2.

In a newer version of R (4.3.1), using GWASTools 1.46.6 and SeqArray 1.40.1, the resulting object has ploidy NA. This trips me up when I go to use assocTestSingle in GENESIS.

For now I am going to use the old version of R and proceed as I was before.

zhengxwen commented 10 months ago

Currently SeqVarTools determines ploidy of a GDS file using the first dimension of the genotype node: https://github.com/smgogarten/SeqVarTools/blob/devel/R/AllUtilities.R#L20

Is there an alternate way to get the ploidy if that value is missing?

In seqAlleleFreq(), if ploidy is NA (no genotype node, but a numeric dosage node), it will be set to be

ploidy <- getOption("seqarray.ploidy", 2L)

If the environment variable seqarray.ploidy is not defined, then just use 2 as a default value.

This might help.