privefl / bigsnpr

R package for the analysis of massive SNP arrays.
https://privefl.github.io/bigsnpr/

How to code genotypes 0.01, 0.2, etc. (NB these are not dosages but something else) as missing? #505

Closed. RPorneso closed this issue 2 weeks ago.

RPorneso commented 2 weeks ago

Hi Florian, I managed to read the .bed file into R using snp_attach(). I noticed that it contains a "weird" set of genotypes. The people who did the imputation and QC decided to keep only hard calls, but when I read the data in, some individuals have values such as 0.01, 0.1, etc., and nothing between 1 and 2, which tells me something went wrong. Below is the code they used to create the .bed file (it was sent to me by the person who did the QC; their code is not published online, so I have to trust that this is what they actually ran). Unfortunately, I don't have access to the .bgen file, so I can't recreate the .bed file with dosage genotypes.

$PLINK2 --bgen $PREFIX.imputed.bgen --sample $PREFIX.imputed.chunk1.sample --oxford-single-chr ${SLURM_ARRAY_TASK_ID} --make-bed --out $PREFIX.imputed --memory 8000 --hard-call-threshold 0.3
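For context, here is a small sketch of what --hard-call-threshold 0.3 does to a biallelic dosage, based on my reading of the PLINK 2 documentation. The helper hard_call() is hypothetical and is not part of PLINK or bigsnpr. A dosage is saved as the nearest integer call when it lies no further than the threshold from it, and as missing otherwise, so a .bed written this way can only hold 0, 1, 2, or missing.

```r
# Hypothetical helper illustrating the hard-call rule for a biallelic dosage
# (assumption: distance to the nearest call is |dosage - round(dosage)|).
hard_call <- function(dosage, threshold = 0.1) {
  nearest <- round(dosage)                    # closest of 0, 1, 2
  ifelse(abs(dosage - nearest) <= threshold,  # close enough to a hard call?
         nearest,                             # keep the integer call
         NA)                                  # otherwise set to missing
}

hard_call(c(0.05, 0.25, 0.45, 1.35, 1.8), threshold = 0.3)
# returns 0 0 NA NA 2
```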

As a last resort, I want to code genotypes between 0 and 1 as missing. How can I do this with bigsnpr? I cannot use snp_fastImpute() or snp_fastImputeSimple() because plink$genotypes, although it is an FBM.code256, contains no NAs.
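If the values really did need recoding, one possible approach is to change the decoding table of the FBM.code256 rather than the stored bytes. This is a minimal sketch, assuming the odd values come from the object's code256 table (for example, an .rds/.bk built from dosage data rather than from a .bed). It uses the example .bed shipped with bigsnpr as a stand-in for the real file, and treating every value strictly between 0 and 1 as missing simply mirrors the request above.

```r
library(bigsnpr)

# Stand-in data: the example .bed shipped with bigsnpr (replace with your own .rds)
bedfile <- system.file("extdata", "example.bed", package = "bigsnpr")
obj <- snp_attach(snp_readBed(bedfile, backingfile = tempfile()))
G <- obj$genotypes

# code256 maps each of the 256 possible raw bytes to a decoded value
code <- G$code256

# Treat any decoded value strictly between 0 and 1 as missing
code[!is.na(code) & code > 0 & code < 1] <- NA

# $copy() returns a new FBM.code256 on the same backing file,
# decoded with the modified table
G2 <- G$copy(code = code)
```

As I understand it, $copy() reuses the existing .bk file, so nothing on disk is rewritten; only the way the bytes are decoded changes.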

Thank you.

privefl commented 2 weeks ago

What you have is a bed file, right? Then you read it with snp_readBed(). If you did that, you should have only 0s, 1s, 2s, and NAs.
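A minimal sketch of that check, using the example .bed shipped with bigsnpr in place of the real file (swap in your own path). Writing the backing files to a temporary location avoids a clash with an existing .bk file, which I believe snp_readBed() refuses to overwrite.

```r
library(bigsnpr)

# Example .bed shipped with bigsnpr; replace with the path to your own .bed
bedfile <- system.file("extdata", "example.bed", package = "bigsnpr")

# snp_readBed() creates the .rds/.bk backing files and returns the .rds path
rds <- snp_readBed(bedfile, backingfile = tempfile())
obj <- snp_attach(rds)
G <- obj$genotypes

# Tabulate decoded genotype values for the first few variants;
# a hard-called .bed should only give 0, 1, 2 (and possibly NA)
vals <- table(G[, 1:min(100, ncol(G))], useNA = "ifany")
print(vals)
```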

RPorneso commented 2 weeks ago

I think it's working. I ran snp_readBed() on chr22 and I see only 0, 1, and 2! I will read in another chromosome just to be sure. The issue I found was in chromosome 1, in a different .rds/.bk file on our cluster. I am going to run snp_readBed() on that chromosome and confirm!

RPorneso commented 2 weeks ago

Yep, it's working. Thanks Florian for the quick response!