stephenslab / susieR

R package for "sum of single effects" regression.
https://stephenslab.github.io/susieR

Inclusion of "standard" covariates? #82

Closed auton1 closed 4 years ago

auton1 commented 5 years ago

Hi SuSiE,

This looks to be a very exciting package, and I'm keen to try it out in the context of GWAS fine-mapping. Thank you for making your work available.

I'm interested if there is any way to include a set of "standard" covariates in the GWAS context. Currently, if I understand correctly, SuSiE allows one to fit a model like $y=Xb+e$, and will identify non-zero effects and estimate credible set(s) across all variables in X. However, in the GWAS context, one may want to also include a set of covariates (such as age, sex, PCs, etc). Is there any way to do this in SuSiE? I could, for example, just include these covariates in the X matrix, but then I will be "wasting" my budget of non-zero effects when looking for credible sets.

Thanks again, and I hope my question makes sense.

Adam

gaow commented 5 years ago

@auton1 for a quantitative trait, you can "remove" covariates beforehand by fitting a regression on the covariates only, then using the residuals from that regression as input to susieR. For example,

y = residuals(lm(y~Z, na.action=na.exclude))

where Z is the covariate matrix.

auton1 commented 5 years ago

Thanks. That makes sense. I also saw a comment that you're considering expanding SuSiE to handle logistic regression, where this approach wouldn't be an option. However, I guess I'll punt the question until logistic regression becomes an option :-)

Again, thanks.

stephens999 commented 5 years ago

Yes, just regress your fixed covariates out of both Y and genotypes and run Susie on the residuals.

This is a common question so maybe we should automate this ...

stephens999 commented 5 years ago

Ahh, I did not see the previous response...

My solution is slightly different because it involves regressing out of the genotypes too, not just y. I think this is more justified, although it may not make much difference in practice...
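
A minimal sketch of this suggestion, using simulated data. The variable names (Z, X, y, X_res, y_res) and the simulation settings are made up for illustration; the final susie() call is left commented out and assumes susieR is installed.

```r
set.seed(1)
n <- 200; p <- 10

# Hypothetical data: Z = covariates, X = genotypes, y = quantitative trait.
Z <- cbind(age = rnorm(n, 50, 10), sex = rbinom(n, 1, 0.5))
X <- matrix(rbinom(n * p, 2, 0.3), n, p)
y <- drop(X[, 3] * 0.5 + Z %*% c(0.02, 0.4) + rnorm(n))

# lm() accepts a matrix response, so every column of X is regressed
# on the covariates (plus an intercept) in a single call.
X_res <- residuals(lm(X ~ Z))
y_res <- residuals(lm(y ~ Z))

# The residuals are then the inputs to fine-mapping, e.g.
# fit <- susieR::susie(X_res, y_res)   # assumes susieR is installed
```

Both sets of residuals are orthogonal to the covariates and to the intercept, which is what "regressing out" means here.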

pcarbo commented 5 years ago

@auton1 @gaow varbvs, which can be considered a predecessor to susieR, allows for additional covariates (in the "Z" input argument). The code for handling these covariates is quite straightforward, although it does introduce some subtleties in terms of interpreting the outputs:

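library(Matrix)  # provides forceSymmetric()

# Regress the covariates Z (n x m, including an intercept column) out
# of both X (n x p) and y (length n).
remove.covariate.effects <- function (X, Z, y) {
  # Solve the normal equations (Z'Z) S = Z'y and (Z'Z) S = Z'X.
  A   <- forceSymmetric(crossprod(Z))
  SZy <- as.vector(solve(A,c(y %*% Z)))
  SZX <- as.matrix(solve(A,t(Z) %*% X))

  # This should give the same result as centering the columns of X
  # and subtracting the mean from y when we have only one
  # covariate, the intercept.
  y <- y - c(Z %*% SZy)
  X <- X - Z %*% SZX

  return(list(X = X,y = y,SZy = SZy,SZX = SZX))
}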

Note that in my case I included the intercept as one of the columns of "Z". (And when there are no covariates, Z is just a column vector of ones.) Hope that is helpful.
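
The intercept-only case mentioned in the code comment can be checked numerically. The sketch below is a base-R restatement of the same projection (so it does not need the Matrix package); remove_covariates and the simulated inputs are hypothetical names used only for this check, not part of varbvs or susieR.

```r
# Base-R version of the same projection: subtract Z (Z'Z)^{-1} Z' applied
# to X and to y.
remove_covariates <- function(X, Z, y) {
  A <- crossprod(Z)
  list(X = X - Z %*% solve(A, crossprod(Z, X)),
       y = y - drop(Z %*% solve(A, crossprod(Z, y))))
}

set.seed(1)
n <- 50
X <- matrix(rnorm(n * 4), n, 4)
y <- rnorm(n)
Z <- matrix(1, n, 1)   # intercept only, no other covariates

out <- remove_covariates(X, Z, y)

# With only an intercept, the result should match plain centering.
same_X <- all.equal(out$X, scale(X, center = TRUE, scale = FALSE),
                    check.attributes = FALSE)
same_y <- all.equal(out$y, y - mean(y))
```

Both same_X and same_y should come out TRUE, up to floating-point tolerance.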

gaow commented 5 years ago

Thanks @pcarbo and @stephens999 for pointing out the subtle yet relevant difference. I'll adapt the code to a vignette (https://stephenslab.github.io/susieR/articles/finemapping.html) for a demonstration and point out the caveats.

maguileraf commented 10 months ago

@gaow for a binary trait, how is it recommended to "remove" covariates?

gaow commented 10 months ago

@maguileraf for binary traits you may either treat them as quantitative traits (particularly when the 0 vs 1 responses are balanced) and apply the above suggestions, or apply logistic regression to compute summary statistics that account for covariates, then use susie_rss() on those summary statistics.
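
A rough sketch of the second option, on simulated data. Everything here (variable names, effect sizes, the use of per-variant glm() fits rather than a dedicated GWAS tool) is an assumption for illustration only.

```r
set.seed(1)
n <- 500; p <- 8

# Hypothetical covariates, genotypes and binary trait.
Z <- cbind(age = rnorm(n), sex = rbinom(n, 1, 0.5))
X <- matrix(rbinom(n * p, 2, 0.3), n, p)
eta <- -1 + 0.6 * X[, 2] + 0.3 * Z[, "age"]
y <- rbinom(n, 1, plogis(eta))

# One logistic regression per variant, adjusting for the covariates;
# keep the Wald z-score of the genotype term (second coefficient).
z <- vapply(seq_len(p), function(j) {
  fit <- glm(y ~ X[, j] + Z, family = binomial())
  coef(summary(fit))[2, "z value"]
}, numeric(1))

# In-sample LD (correlation) matrix from the same genotypes.
R <- cor(X)
```

The z-scores and LD matrix would then be passed to susie_rss(), along the lines of susie_rss(z = z, R = R, n = n) in recent susieR versions; check the argument names against the version you have installed.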

maguileraf commented 10 months ago

@gaow thank you for your prompt response. I tried the second option with REGENIE, and for some reason my LD matrix is not concordant with the REGENIE results, even though I computed it from the same data I used for REGENIE. So now I am trying susie() instead to see if it works. My 0 vs 1 responses are not balanced, though.

Jesson-mark commented 3 weeks ago

Hi @gaow, I have a similar question regarding covariates in fine-mapping. Since covariate adjustment is performed on both the genotypes and the phenotype passed to the susie function, is it also necessary to adjust for covariates when using susie_rss with summary statistics?

I noticed that in fine-mapping with susie, the LD should be computed from the genotype residuals after regressing out covariates. However, since fine-mapping with susie_rss requires both summary statistics and LD estimated from a reference panel, do I also need to adjust the genotypes for covariates when estimating LD?

If so, this may be impractical, because the reference genotype data may not be the data used to compute the summary statistics, and may not have the same covariates. It may also be impractical when the summary statistics come from a meta-analysis, where different cohorts may have different covariates.

Thanks! Best regards,

Jie Wang

gaow commented 3 weeks ago

Hi @Jesson-mark, in practice when we create a reference LD panel we don't adjust for covariates (as you said, this is impractical). We find that in practice the results generally make sense (enrichment, candidate loci, etc.), although we have not really assessed this formally.
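
For anyone who does have individual-level genotypes and covariates, one informal way to see how much adjustment changes the LD matrix is to compute it both ways and compare. This is only a sketch on simulated data where the (hypothetical) covariates are independent of the genotypes; it is not an assessment of the practice described above.

```r
set.seed(1)
n <- 2000; p <- 6

# Hypothetical covariates and genotypes, simulated independently.
Z <- matrix(rnorm(n * 3), n, 3)
X <- matrix(rbinom(n * p, 2, 0.3), n, p)

# Induce LD between variants 1 and 2 by copying most of column 1.
X[, 2] <- ifelse(runif(n) < 0.8, X[, 1], X[, 2])

R_raw <- cor(X)                        # LD from unadjusted genotypes
R_adj <- cor(residuals(lm(X ~ Z)))     # LD after regressing out Z
max_diff <- max(abs(R_raw - R_adj))    # largest entrywise discrepancy
```

In this setup max_diff should be small because Z carries no information about X. When covariates are correlated with genotypes (for example, PCs capturing population structure), the two matrices can differ more, so the comparison is worth running on real data rather than assumed.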

Jesson-mark commented 3 weeks ago

Thanks for your quick reply! That really helps!

I also have another related question about covariates. I noticed that some GWAS papers regress covariates out of the phenotype only and then run the GWAS on the residuals against the unadjusted genotypes, as you mentioned above. However, some eQTL papers (like FastQTL) regress covariates out of both the genotypes and the phenotypes (gene expression), as @stephens999 suggested.

As @stephens999 noted, the two procedures may not differ much in practice.

My statistics background is weak, so I cannot work out the differences between these two procedures. From a statistical point of view (in terms of the regression formula), which one do you think is more formally justified, and why?

Best regards, Jie Wang

gaow commented 3 weeks ago

However, I found that in some eQTL papers (like FastQTL), they regress covariates out of both genotypes and phenotypes (gene expressions)

This is numerically equivalent (for the genotype effect estimate) to fitting y~genotype+covariates. It will not be the same as regressing the phenotype residuals on the unadjusted genotypes. I don't have a reference that I can immediately point to, but you can grab any regression example dataset, code up both procedures and see the numerical identity / difference for yourself; or work out the algebra to show the connection.
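
One way to code this up, using the built-in mtcars dataset as a stand-in. Treating wt as the "genotype", mpg as the phenotype, and hp and disp as the covariates is an arbitrary choice made only for the demonstration.

```r
g <- mtcars$wt                               # stand-in "genotype"
y <- mtcars$mpg                              # stand-in phenotype
Z <- as.matrix(mtcars[, c("hp", "disp")])    # stand-in covariates

# (a) Joint model: y ~ genotype + covariates.
b_full <- coef(lm(y ~ g + Z))["g"]

# (b) Regress covariates out of both y and g, then regress residual on residual.
g_res <- residuals(lm(g ~ Z))
y_res <- residuals(lm(y ~ Z))
b_both <- coef(lm(y_res ~ g_res))["g_res"]

# (c) Regress covariates out of y only, then regress on the raw genotype.
b_yonly <- coef(lm(y_res ~ g))["g"]

# Proportion of genotype variance explained by the covariates.
r2_gZ <- summary(lm(g ~ Z))$r.squared
```

Here b_full and b_both agree to numerical precision (the Frisch-Waugh-Lovell result), while b_yonly equals b_both multiplied by 1 - r2_gZ, so it is shrunk toward zero whenever the genotype is correlated with the covariates. When that correlation is close to zero, as is often the case for a single variant, the two procedures give nearly the same estimate.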

Jesson-mark commented 3 weeks ago

Thanks for your suggestions! I'll give it a try!