Closed AAvalos82 closed 5 years ago
What are "monomorphic heterozygotes"? Are you saying one REF and one ALT in a genotype? In SNPRelate, monomorphic SNPs refer to the SNPs with MAF=0.
Pardon the confusion, I just realized the oxymoron in "monomorphic heterozygote".
I see now what the issue is. As defined, snpgdsLDpruning() will observe MAF=0, excluding SNPs that have a genotype call of "0" or "2" across the entire sample set. However I have instances where the SNP is heterozygote across all the samples. Thus the site is polymorphic with a MAF=0.5, but the genotype is homogeneous across the entire set, and thus non-informative.
I retract the issue as these instances fall outside the scope of snpgdsLDpruning() utility, and will just have to be mindful that this is an exemption when using the tool.
Use SeqArray::seqSNP2GDS
to convert your SNPRelate dataset
require(SeqArray)
# x is your new converted seqarray dataset
variants <- SeqArray::seqGetData(gdsfile = x, var.name = "variant.id")
mono <- SeqArray::seqBlockApply(
gdsfile = x,
var.name = "$dosage",
FUN = function(x) apply(
X = x, MARGIN = 2 ,
FUN = function(x) length(unique(x[!is.na(x)])) == 1
),
margin = "by.variant",
as.is = "unlist",
parallel = 2,
.progress = FALSE)
variants.mono <- variants[mono]
Why not use seqApply()
:
library(SeqArray)
x <- seqOpen(seqExampleFileName("gds"))
variants <- seqGetData(gdsfile = x, var.name = "variant.id")
mono <- seqApply(x, "$dosage", function(g) all(g==1L, na.rm=T), parallel=2)
variants.mono <- variants[mono]
even better ! thanks
When I run the function above with SeqArray I receive this error.
Error in .DynamicClusterCall(cl, length(cl), .fun = function(.proc_idx, :
One of the nodes produced an error: Can not open file. The process cannot access the file because it is being used by another process.
I don't receive this error if I use parallel = 1 I am using a Windows 10 PC, with R 3.5.1 in Rstudio 1.1.447
Noticed this issue downstream of a GWAS analysis using the GENESIS package, but the remove.monosnp option of snpgdsLDpruning() seems to be ignoring heterozygote calls. Post LD prunning I am not encountering any monomorphic SNPS that are homozygote REF or ALT alleles but still ~500 monomorphic heterozygotes remain.