timflutre opened this issue 6 years ago
@timflutre Yes, I think this is possible, although I would have to redo most of the derivations, so it isn't trivial. Is this motivated by a specific project you are working on?
Basically everybody around me is doing such two-stage analyses, so the literature is quite large. Moreover, here is a recent preprint investigating this. We can easily understand why it is a common scenario: fitting a mixed model with "sophisticated" error covariances in addition to a large (huge) number of genetic markers as predictors is impossible with current software, especially when one wants to perform variable selection on the genetic markers, hence the analysis being done in two steps.
I thought it would be as simple as WLS (weighted least squares) compared to simple LS, given that the weights are known. But I trust you to assess the impact on varbvs much better than me!
@timflutre Thanks for the background. It isn't conceptually or technically difficult to do, but would involve reworking a substantial portion of the code. I will keep this in mind and return to it when I have a chance.
Ok, no worries, thanks!
@timflutre If the weights are w_1, ..., w_n, I think you can premultiply X (markers) and Y (BLUPs) by diag(sqrt(w_1), ..., sqrt(w_n)) and apply varbvs to the transformed data (based on properties of GLS).
@aksarkar Thanks for this suggestion. So perhaps it is simpler to implement than I thought. Do you happen to know if the likelihood is correct after this transformation? I haven't worked through the math, but it seems like there may be an additional factor that needs to be included in the likelihood after the transformation.
The assumption in GLS is y ~ N(X beta, Sigma). Letting LL* = Sigma, L^-1 y ~ N(L^-1 X beta, I). So the likelihood should be correct.
The assumption in WLS is that Sigma is diag(w_1, ..., w_n), so L is diag(sqrt(w_1), ..., sqrt(w_n)). So my suggestion above is not quite correct: premultiply by L^-1, i.e., take the reciprocal of the diagonal entries.
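The diagonal case is easy to check numerically: the closed-form WLS estimate on (X, y) with residual variances w_i should coincide with plain OLS after rescaling each row by 1/sqrt(w_i). A minimal sketch in plain Python (not varbvs; the single-predictor toy data are made up for illustration):

```python
import math

# Heteroskedastic toy data: y_i = beta * x_i + e_i with Var(e_i) = w_i.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.3, 2.8, 4.4]
w = [0.5, 1.0, 2.0, 4.0]   # residual variances (diagonal of Sigma)

# Closed-form WLS: beta = sum(x_i y_i / w_i) / sum(x_i^2 / w_i).
beta_wls = (sum(xi * yi / wi for xi, yi, wi in zip(x, y, w))
            / sum(xi * xi / wi for xi, wi in zip(x, w)))

# Whitening: premultiply by L^-1 = diag(1/sqrt(w_i)), then ordinary OLS.
xt = [xi / math.sqrt(wi) for xi, wi in zip(x, w)]
yt = [yi / math.sqrt(wi) for yi, wi in zip(y, w)]
beta_ols = sum(a * b for a, b in zip(xt, yt)) / sum(a * a for a in xt)

# The two estimates agree (up to floating-point rounding).
assert abs(beta_wls - beta_ols) < 1e-12
```

Note the rescaling uses 1/sqrt(w_i), not sqrt(w_i), matching the correction above.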
@aksarkar Do you need this option as well?
@timflutre @aksarkar varbvs version 2.5-2 now has a weights option (for family = "gaussian" only), as well as a resid.vcov option to allow for residuals with a specified covariance matrix (e.g., a kinship matrix).
Thanks @pcarbo !
However, when I launch varbvs with weights, it looks like the summary function doesn't report the proportion of variance explained anymore. Is this on purpose?
@timflutre Yes, I haven't yet updated the PVE calculations to allow for weighted samples. Do you need the PVE estimates?
That would be great! I would expect the PVE to increase when weights are used. It is also interesting to look at the PVE to get a sense of whether or not the available SNPs are numerous enough and well distributed to tag all LD blocks. In my case, I am not sure of that because I only have 10k SNPs and the pairwise LD quickly becomes small over short distances. I am currently in the process of genotyping more SNPs by sequencing. So I would expect the PVE to increase when using, say, 60k SNPs compared to only 10k.
@timflutre Okay, I've reopened this issue and will work out the PVE calculations for weighted samples.
Hi Peter,
In practice, it is frequent to perform a two-stage analysis:
1. get one BLUP per individual via a somewhat sophisticated mixed model (e.g. to account for spatial and/or temporal covariance in errors);
2. use them in a regression on marker genotypes to find QTLs.
However, some individuals may have more records than others (due to missing data or to a specific experimental design). This is reflected in the variances of the BLUPs. One then often wants to use these variances in stage 2 as weights (see the option in stats::lm and lme4::lmer, for instance; it is also available in GEMMA). Could a weights option be implemented in varbvs?
Thanks in advance, Tim