stan-dev / docs

Documentation for the Stan language and CmdStan
https://mc-stan.org/docs/
Other
38 stars 110 forks source link

Update GP chapter in Stan User Guide #614

Open mike-lawrence opened 1 year ago

mike-lawrence commented 1 year ago

Chapter 10 of the Stan User Guide currently uses either custom code or the deprecated cov_exp_quad() where gp_exp_quad_cov() should be used. Also, might be worth showing the use of the the convenience functions from Aki's latest paper here and here

bob-carpenter commented 1 year ago

Thanks, @mike-lawrence. That sounds good.

What we really need to do is move to comprehensions with partially evaluated GP autodiff! To the extent that we have to use primitive covariance matrix constructing functions and add them, we wind up with a huge expression graph for autodiff that limits scalability. We could whip something up now with roughly the right functionality (up to one extra fast copy that doesn't affect autodiff) using one of the parallel map functions, because they do partial evaluation. It'd just be a fussy indexing mess.

mike-lawrence commented 1 year ago

(FYI, filed this issue partly as a reminder to myself to do the requisite edits)

What we really need to do ...

This bit pertains to enhancing Stan, and not the documentation thereof, correct?

bob-carpenter commented 1 year ago

This bit pertains to enhancing Stan, and not the documentation thereof, correct?

Both. It'd be nice if we had a matrix comprehension function, but in the meantime, one can be implemented using map_rect. The advantage is that we never build a full covariance matrix of expressions, but instead evaluate each one's derivative on the fly. The parallelism is just gravy, but super helpful if you're running 4 chains on a 32 core server or even 2 chains on a 4 core server.

That is, we'd have something like map_rect where the function being mapped is the covariance function. It has to compute entry n in the covariance matrix. But it returns a vector, which then needs to be reshaped into a matrix for use. That's what I meant by being fiddly in terms of indexing. For an M x N matrix indexed from 1, entry (m, n) is in position

position(m, n, M, N) = (M * (m - 1)) + (n - 1) + 1.  

So the inverse is fiddly.

Where this will make a big difference is in something like the "birthday problem" from the cover of BDA3, the covariance matrix for which is the sum of a sequence of other covariance matrix. So each entry has a large-ish expression graph. It'll also make the difference between doable and not doable when you're near memory limits.

mike-lawrence commented 1 year ago

But it returns a vector, which then needs to be reshaped into a matrix for use. That's what I meant by being fiddly in terms of indexing.

Ah, I did a non-parallel version of that here, where I relied on the user pre-computing variables to get the indexing right. When used serially, I found it roughly matched the built-in gp_exp_quad_cov() (then called cov_exp_quad()), I assumed thanks to black-magic C++ optimizations of the latter, but I see now that map_rect could be used to speed the computation of the covars variable.

It also occurs to me that I should re-run that serial benchmark now to see if anything has changed in the last 6 years of optimizations (which could push the equivalence either way). Oh, and I think I know how to avoid the loops when assigning to the matrix now too.

bob-carpenter commented 1 year ago

We are going to be deprecating the # comments, so I'd suggest switching to // line comments.

The nice thing about cov_exp_quad() is that we can write analytic gradients for the whole function. The nice thing about mapping is that it saves a lot of memory and improves memory locality. The ideal combination would have the special functions be functions that produce entries with analytic derivatives.

Anyway, this is probably too much for a user's guide chapter, but something that'd make a nice case study.

mike-lawrence commented 1 year ago

We are going to be deprecating the # comments, so I'd suggest switching to // line comments.

Yup, for sure; that gist is 6 years old; a bunch of other updates to be made there too.