Intercept, prediction function and fitted values

@william-denault It is important to output the intercept for several reasons, but one is that it is often helpful to be able to use the model to predict Y (e.g., to compare the observed Y with the predicted Y). This also provides a simple "sanity check" to make sure that the model fitting is working. You will notice that susieR has two related features: (i) it returns the "fitted" values (in fit$fitted) and (ii) it has a "predict" method. Here's an example of what I mean:

# library(susieR)
set.seed(1)
n <- 400
p <- 1000
beta <- rep(0,p)
beta[1:4] <- 1
X <- matrix(rnorm(n*p),nrow = n,ncol = p)
X <- scale(X,center = TRUE,scale = TRUE)
y <- drop(X %*% beta + rnorm(n))
fit <- susie(X,y,L = 10)
ypred <- predict(fit,X)
plot(y,fit$fitted,pch = 20,xlab = "true",ylab = "estimated")
abline(a = 0,b = 1,lty = "dotted",col = "magenta")
plot(y,ypred,pch = 20,xlab = "true",ylab = "estimated")
abline(a = 0,b = 1,lty = "dotted",col = "magenta")

(In this case, fit$fitted and ypred give the same result.)

I suggest implementing (i) and (ii) for fsusieR, which should be straightforward once you have the intercept.

Note that the intercept is not simply the the mean of Y, but the mean after removing the effects of the SNPs. See line 460 of susie.R for how this is implemented in susieR. The general formula looks something like this:

$$ (Z^TZ)^{-1} Z\bar{r}, \quad \bar{r} = y - X\bar{b} $$

For an intercept, Z is simply a column of ones.

Hi @pcarbo ,

It is less straightforward in this context. The main problem about fitted values/prediction is that to run fsusie, the Y matrix is expected to have 2^J columns, and when it is not the case, we actually remap the data using an interpolation scheme so that there are 2^J columns (see intro vignette section #### Handling function of any length and unevenly space data) . So the actual output of fsusie is a list of the fitted functions of length 2^J

So, the "real" predictions are actually made on this grid. How should we proceed? Should we make different "predict" functions, one that uses the estimate and the other one that outputs an "interpolated version" of the prediction on the original grid?

stephenslab / fsusieR

Intercept, prediction function and fitted values #11