Closed dylkot closed 2 years ago
Thank you for your interest in GMMAT! Please try adding is.dosage = TRUE
in glmm.score
when you read the dosage from a GDS file.
Best, Han
I think that fixes it for the VCF file approach. It runs smoothly and returns results that are subtly different from the hard thresholded results and with signs flipped. The text file approach just runs really slowly and hasn't returned anything after an hour or two which doesn't really make sense since it is only about 150 variants (it completed in about 10 seconds for the VCF version). The first few columns/lines of the file are like below (and it is tab delimited):
1:723918[b37]G,A G A 2.0 2.0 2 1.0 1 2.0 2.0 2 2.0 2.0 2.0 2.0 2.0 2.0 1.0 2 2.0
1:724169[b37]A,G A G 2.0 2.0 2 2.0 2 2.0 2.0 2 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2 2.0
1:725286[b37]G,A G A 1.0 2.0 2 2.0 2 2.0 2.0 2 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2 1.0
1:729940[b37]C,G C G 1.0 2.0 1 2.0 2 2.0 2.0 1 1.0 2.0 2.0 2.0 2.0 2.0 1.0 2 1.15
1:730232[b37]C,T C T 2.0 2.0 2 2.0 2 2.0 2.0 2 1.0 2.0 2.0 2.0 2.0 2.0 1.0 2 2.0
Does that seem right to you?
Apologies for an unrelated add-on question, but it isn't really a github issue worth opening an issue for. But do you still recommend using principle components to correct for population stratification like you did in the GMMAT paper? I've seen a few other mixed linear models analyses where they don't include PCs assuming that that should be captured in the random effects, as long as you use the full set of variants to calculate the relatedness matrix rather than just a pruned subset.
Thanks so much!
You might need to skip the first three columns using infile.ncol.skip = 3
. See ?glmm.score
for details.
Regarding your second question, see Tucker, Price and Berger 2014. If there is strong confounding by population structure, I would always include top PCs as fixed-effects covariates (it probably does not matter if confounding is weak).
Best, Han
Got it, thank you!! Apologies, that wasn't very clear form section 4.3 here but I see it clearly in ?glmm.score
Got it, thank you!! Apologies, that wasn't very clear form section 4.3 here but I see it clearly in
?glmm.score
Also see an example of using glmm.score
with a plain text genotype file in Section 5.2.1 of the user manual.
Best, Han
Thank you, yes I totally missed that!
Best, Dylan
Hi there, thanks for this great tool! I am trying to use dosage information in my analysis. If I create a text file of the dosages, it seems I can't use parallelization (getting an error message saying parallelization isn't implemented for text files). Further, I am getting segfaults when I run the analysis:
gives
This is for 100 variants X 2000 samples. Alternatively if I try to use a VCF file like so:
it completes but doesn't seem to be using the dosage information. Any idea if it is possible to use the dosage information from the VCF file and if there is an obvious problem with the text file? Or any other work arounds to use dosage information, ideally with parallelization?
Thanks!