mike-lawrence opened 1 year ago
Thanks, @mike-lawrence. That sounds good.
What we really need to do is move to comprehensions with partially evaluated GP autodiff! To the extent that we have to use primitive covariance-matrix-constructing functions and add them, we wind up with a huge expression graph for autodiff that limits scalability. We could whip something up now with roughly the right functionality (up to one extra fast copy that doesn't affect autodiff) using one of the parallel map functions, because they do partial evaluation. It'd just be a fussy indexing mess.
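The shape of that "map a covariance function over index pairs, then reshape" computation can be sketched outside Stan. Here is a hypothetical Python analogue (my own sketch, not code from this thread), assuming a squared-exponential kernel over scalar inputs:

```python
import math

def sq_exp(x1, x2, alpha=1.0, rho=1.0):
    # Squared-exponential kernel for two scalar inputs.
    return alpha**2 * math.exp(-((x1 - x2) ** 2) / (2 * rho**2))

def cov_by_map(x, kernel):
    # Map the kernel over all (m, n) index pairs, producing a flat
    # row-major list of entries, then reshape into a list-of-lists matrix.
    # This is the "fussy indexing" step the comment above describes.
    M = len(x)
    flat = [kernel(x[m], x[n]) for m in range(M) for n in range(M)]
    return [flat[m * M:(m + 1) * M] for m in range(M)]

x = [0.0, 0.5, 1.0]
K = cov_by_map(x, sq_exp)
```

In Stan the map step would be `map_rect` over shards of index pairs and the reshape would be an indexing pass back into a `matrix`; the sketch only shows the flatten-map-reshape structure.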
(FYI, filed this issue partly as a reminder to myself to do the requisite edits)
> What we really need to do ...
This bit pertains to enhancing Stan, and not the documentation thereof, correct?
> This bit pertains to enhancing Stan, and not the documentation thereof, correct?
Both. It'd be nice if we had a matrix comprehension function, but in the meantime, one can be implemented using map_rect. The advantage is that we never build a full covariance matrix of expressions, but instead evaluate each one's derivative on the fly. The parallelism is just gravy, but super helpful if you're running 4 chains on a 32-core server or even 2 chains on a 4-core server.
That is, we'd have something like map_rect where the function being mapped is the covariance function. It has to compute entry n in the covariance matrix. But it returns a vector, which then needs to be reshaped into a matrix for use. That's what I meant by being fiddly in terms of indexing. For an M x N matrix indexed from 1, entry (m, n) is in position
position(m, n, M, N) = (N * (m - 1)) + (n - 1) + 1.
So the inverse is fiddly.
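Using that row-major convention for an M x N matrix, the forward map and its fiddly inverse can be checked mechanically; a small Python sanity check (my own, not from the thread):

```python
def position(m, n, M, N):
    # Row-major linear position of entry (m, n); everything is 1-indexed.
    return N * (m - 1) + (n - 1) + 1

def entry(p, M, N):
    # Inverse: recover (m, n) from the 1-indexed linear position p.
    m = (p - 1) // N + 1
    n = (p - 1) % N + 1
    return m, n

# Round-trip over every entry of a 3 x 4 matrix.
M, N = 3, 4
ok = all(entry(position(m, n, M, N), M, N) == (m, n)
         for m in range(1, M + 1) for n in range(1, N + 1))
```

Note that Stan's own matrix storage is column-major, so code doing this reshape inside Stan would flip the roles of the two indices.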
Where this will make a big difference is in something like the "birthday problem" from the cover of BDA3, the covariance matrix for which is the sum of a sequence of other covariance matrices. So each entry has a large-ish expression graph. It'll also make the difference between doable and not doable when you're near memory limits.
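To make the sum-of-covariance-matrices point concrete: each entry of the summed kernel can be evaluated on the fly, so the component matrices are never materialized. A hypothetical Python sketch with a squared-exponential plus a periodic component (the kernel choices are illustrative, not the actual BDA3 birthday model):

```python
import math

def k_sum(x1, x2, alpha1=1.0, rho1=1.0, alpha2=0.5, period=1.0, rho2=1.0):
    # One entry of the summed kernel, evaluated without ever building
    # the component covariance matrices.
    k_se = alpha1**2 * math.exp(-((x1 - x2) ** 2) / (2 * rho1**2))
    k_per = alpha2**2 * math.exp(
        -2 * math.sin(math.pi * abs(x1 - x2) / period) ** 2 / rho2**2)
    return k_se + k_per

# Diagonal entry: both components evaluated at distance 0.
diag = k_sum(2.0, 2.0)
```

With a mapped per-entry function like this, only the expression graph for one entry exists at a time, which is where the memory savings come from.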
But it returns a vector, which then needs to be reshaped into a matrix for use. That's what I meant by being fiddly in terms of indexing.
Ah, I did a non-parallel version of that here, where I relied on the user pre-computing variables to get the indexing right. When used serially, I found it roughly matched the built-in gp_exp_quad_cov() (then called cov_exp_quad()), which I assumed was thanks to black-magic C++ optimizations of the latter, but I see now that map_rect could be used to speed up the computation of the covars variable.
It also occurs to me that I should re-run that serial benchmark now to see if anything has changed in the last 6 years of optimizations (which could push the equivalence either way). Oh, and I think I know how to avoid the loops when assigning to the matrix now too.
We are going to be deprecating the `#` comments, so I'd suggest switching to `//` line comments.
The nice thing about cov_exp_quad() is that we can write analytic gradients for the whole function. The nice thing about mapping is that it saves a lot of memory and improves memory locality. The ideal combination would have the special functions be functions that produce entries with analytic derivatives.
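The "entries with analytic derivatives" idea can be illustrated for the squared-exponential kernel: with d = x1 - x2, the derivative of an entry with respect to the length-scale rho is k * d^2 / rho^3. A Python check of that formula against a finite difference (my own sketch, not Stan internals):

```python
import math

def k(d, alpha, rho):
    # Squared-exponential kernel entry as a function of distance d.
    return alpha**2 * math.exp(-d**2 / (2 * rho**2))

def dk_drho(d, alpha, rho):
    # Analytic gradient of the entry with respect to the length-scale rho.
    return k(d, alpha, rho) * d**2 / rho**3

# Compare against a central finite difference at one point.
d, alpha, rho, eps = 0.7, 1.3, 0.9, 1e-6
fd = (k(d, alpha, rho + eps) - k(d, alpha, rho - eps)) / (2 * eps)
err = abs(fd - dk_drho(d, alpha, rho))
```

A mapped entry function that returned both the value and gradients like this would avoid growing the autodiff graph at all for that entry.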
Anyway, this is probably too much for a user's guide chapter, but something that'd make a nice case study.
> We are going to be deprecating the `#` comments, so I'd suggest switching to `//` line comments.
Yup, for sure; that gist is 6 years old; a bunch of other updates to be made there too.
Chapter 10 of the Stan User Guide currently uses either custom code or the deprecated cov_exp_quad() where gp_exp_quad_cov() should be used. Also, it might be worth showing the use of the convenience functions from Aki's latest paper here and here
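For reference, gp_exp_quad_cov(x, alpha, rho) computes entries K[i, j] = alpha^2 * exp(-(x[i] - x[j])^2 / (2 * rho^2)); a minimal Python re-implementation of that formula (useful for checking custom code against the built-in; scalar inputs only):

```python
import math

def gp_exp_quad_cov(x, alpha, rho):
    # Dense squared-exponential covariance matrix, mirroring the formula
    # behind Stan's built-in of the same name (scalar inputs only here).
    return [[alpha**2 * math.exp(-((xi - xj) ** 2) / (2 * rho**2))
             for xj in x] for xi in x]

K = gp_exp_quad_cov([0.0, 1.0, 2.0], alpha=2.0, rho=1.0)
```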