paul-buerkner / brms

brms R package for Bayesian generalized multivariate non-linear multilevel models using Stan
https://paul-buerkner.github.io/brms/
GNU General Public License v2.0

Linear Functionals of a Gaussian Process #1484

Open apeterson91 opened 1 year ago

apeterson91 commented 1 year ago

Hey Paul,

I'm wondering if there are any plans to implement linear functionals of Gaussian processes, as is possible with mgcv::s() terms? If you don't have any plans to work on this, I might take a stab at it and would welcome any feedback you have on how that design might be implemented. I'm especially interested in making this work for the gp approximation.

paul-buerkner commented 1 year ago

This is not supported yet, but it could be an interesting addition. I don't have any plans to work on this myself at the moment, so feel free to work on it. For a start, how does the corresponding s() syntax look?

apeterson91 commented 1 year ago

From the mgcv docs:

The mechanism by which this is achieved is to supply matrices of covariate values to the model smooth terms specified by s or te terms in the model formula. Each column of the covariate matrix gives rise to a corresponding column of predictions from the smooth.

And the corresponding example from the docs:

library(mgcv)

n <- 400
sig <- 2
x <- runif(n, 0, .9)
f2 <- function(x) 0.2 * x^11 * (10 * (1 - x))^6 + 10 * (10 * x)^3 * (1 - x)^10
x1 <- x + .1

f <- f2(x) + f2(x1)  ## response is sum of f at two adjacent x values
y <- f + rnorm(n) * sig

X <- matrix(c(x, x1), n, 2)  ## matrix covariate contains both x values
b <- gam(y ~ s(X))

The one "bad" thing about this approach is that it doesn't allow for a ragged data structure – it assumes there is the same number of points to sum over for each observation. I'm wondering if we could use an id grouping variable in the gp() call together with an additional new argument, say sum = TRUE, to sum up all evaluations of the function within that id group.

E.g. if the data structure is

id   y    x    z
 1   0    0.2  1
 1   0.5  0.3  0
 2   0.2  0.4  0

And the formula was

brm(y ~ z + gp(x, by='id', sum = TRUE), data = data)

That would correspond to

$$ E[y_1] = \alpha + \beta \times z + f(0.2) + f(0.3) $$

$$ E[y_2] = \alpha + f(0.4) $$

$$ f \sim GP(0, \kappa(\cdot, \cdot)). $$

Maybe that's a bit clumsy, but that's my first thought at how things might work.

paul-buerkner commented 1 year ago

I don't think the id approach would be so great because it breaks the principle of each row in the data having a likelihood contribution (suddenly two rows are supposed to belong to the same y). Another option would be to allow for NAs in the matrix X above, where NAs would just be ignored. That way, we would allow for some kind of ragged structure.
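
To illustrate the idea (just a minimal sketch in plain R, not brms code): each row of the matrix X would hold the evaluation points for one observation, padded with NA where an observation has fewer points. Using the values from your toy data above:

X <- rbind(c(0.2, 0.3),  ## observation 1: contributes f(0.2) + f(0.3)
           c(0.4, NA))   ## observation 2: contributes f(0.4) only; the NA is ignored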

apeterson91 commented 1 year ago

Makes sense.

Do you want the matrix to be provided as an argument to the brm() function? For example, mgcv has X in the global environment, and that's where the function ends up referencing it, I believe.

paul-buerkner commented 1 year ago

I think X can just be a matrix column in the data.frame passed to the data argument.
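
For illustration (plain R, reusing y, n, and X from the mgcv example above), such a matrix column can be created like this:

dat <- data.frame(y = y)
dat$X <- X  ## X becomes an n x 2 matrix column of dat
## equivalently: dat <- data.frame(y = y, X = I(X))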

apeterson91 commented 1 year ago

Hey Paul,

I'm currently weighing a design choice that I'd like to run by you. From our previous back-and-forth:

Another option would be to allow for NAs in the matrix X above, where NAs would just be ignored.

This would mean that we'd need to introduce new logic into your na_omit() and vars_keep_na() functions: pass the object argument (the data from the brm() call) from na_omit() through to vars_keep_na(), and evaluate the Gaussian process predictors to see which are matrices with a specific pattern of missingness, in order to decide whether or not to keep them.

My main concern here is computational cost, as evaluating the matrices twice – once to figure out where the NAs are and a second time to actually structure them for Stan – is less than ideal for larger matrices. It is also a larger change to the codebase that I'd rather avoid if possible.

In contrast, I'm wondering if you'd have any problem with requiring that sparse matrices be used. This would have the advantage of not touching any of the missing-data functions, and it handles the data structure more intuitively.

Let me know your thoughts.

paul-buerkner commented 1 year ago

I am not so concerned about computational efficiency here. Usually, all the computations that happen inside Stan will dominate whatever we do in R as pre/post-processing.

That said, we can of course use sparse matrices (via the Matrix package). However, do you want to distinguish NAs from actual zero data?

apeterson91 commented 1 year ago

Yes, I'm thinking we establish the convention for this functionality that the NAs be the sparse (unstored) entries and that 0s still be entered as normal.

e.g.

i <- c(1:10, c(1, 3, 8, 4, 2, 7, 6, 9, 1, 4, 10))
j <- c(rep(1, 10), c(2, 5, 3, 2, 4, 2, 4, 5, 2, 7, 3))
x <- rpois(21, 2)
M <- Matrix::sparseMatrix(i, j, x = x)

10 x 7 sparse Matrix of class "dgCMatrix"

 [1,] 0 5 . . . . .
 [2,] 5 . . 2 . . .
 [3,] 1 . . . 3 . .
 [4,] 0 2 . . . . 1
 [5,] 2 . . . . . .
 [6,] 2 . . 2 . . .
 [7,] 1 4 . . . . .
 [8,] 4 . 1 . . . .
 [9,] 4 . . . 1 . .
[10,] 2 . 2 . . . .
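
And roughly how I'd unpack that convention (a plain-R sketch, not the eventual brms implementation): the stored entries of M, including explicit 0s, are the points at which f is evaluated for each row, while the unstored "." entries play the role of the NAs.

Mt  <- as(M, "TsparseMatrix")   ## triplet form: row indices i, column indices j, values x
pts <- split(Mt@x, Mt@i + 1L)   ## ragged list of evaluation points per observation (row)
lengths(pts)                    ## how many function evaluations each row sums over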

Does that make sense?

paul-buerkner commented 1 year ago

Yes, that makes sense. However, I would not require the user to pass sparse matrices; I would simply allow them to work in addition to regular matrices with NAs.

apeterson91 commented 1 year ago

If we allow regular matrices with NAs, then we still have the problem stated previously regarding the larger changes to the codebase. Could we instead distinguish between the two cases supported by the package:

  1. Dense input with no NAs – regular matrix type.
  2. Sparse input with (implicit) NAs – sparse matrix type.

My concern with supporting dense input with NAs is, as stated previously, that I'd have to introduce larger changes into your NA-handling functions than I'd otherwise prefer.
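
Roughly, the check I have in mind would look something like this (just a sketch; the function name is a placeholder, not existing brms code):

validate_gp_matrix <- function(X) {
  if (methods::is(X, "sparseMatrix")) {
    "sparse"  ## ragged case: unstored entries mark absent evaluation points
  } else if (is.matrix(X) && !anyNA(X)) {
    "dense"   ## rectangular case: same number of points for every observation
  } else {
    stop("X must be a dense matrix without NAs or a sparse Matrix")
  }
}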

paul-buerkner commented 1 year ago

I see. Yes, that makes sense. You can start with the sparse matrix version.

In which cases would we have a fully dense matrix? If all observations use the same number of input points, right?

apeterson91 commented 1 year ago

Yes exactly. This is the only case considered by the mgcv package, for example.

paul-buerkner commented 1 year ago

Okay, let's do it this way then, with the two options discussed, without supporting NAs.
