The main bottleneck is that, given some parameters, fitting a Gaussian process is an O(n^3) operation (where n is the number of samples). If you are also fitting the parameters, you will pay that price at every iteration of the fitting process.
This is mostly unavoidable (and one of the reasons people do not use GPR when they have very large datasets), but there are ways to speed things up.
I am not aware of any deep inefficiency in the code (but I might have missed something); the only low-hanging fruit would be reducing memory use. If you want to go further, one possibility is to introduce a more efficient solver in place of the Cholesky decomposition: a conjugate gradient might be able to reuse a previous solution when fitting the parameters, speeding up the resolution. However, that would be a non-trivial modification of the code and the benefits are unproven.
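To make the cost concrete, here is a minimal sketch (not Friedrich's actual code) of what one inner fit looks like for fixed kernel parameters, using nalgebra; `kernel`, `fit_once`, `length_scale` and `noise` are illustrative names I made up. The Cholesky factorization of the n x n covariance matrix is the O(n^3) step that a hyperparameter search repeats at every iteration:

```rust
use nalgebra::{Cholesky, DMatrix, DVector};

/// Hypothetical squared-exponential kernel, just for illustration.
fn kernel(a: f64, b: f64, length_scale: f64) -> f64 {
    (-(a - b).powi(2) / (2.0 * length_scale * length_scale)).exp()
}

/// One "inner" fit for fixed kernel parameters: build the n x n covariance
/// matrix, then factor it and solve for the weights.
fn fit_once(x: &[f64], y: &[f64], length_scale: f64, noise: f64) -> Option<DVector<f64>> {
    let n = x.len();
    let cov = DMatrix::from_fn(n, n, |i, j| {
        kernel(x[i], x[j], length_scale) + if i == j { noise } else { 0.0 }
    });
    // O(n^3): this factorization dominates the cost and is repeated at every
    // iteration of the hyperparameter search.
    let chol = Cholesky::new(cov)?;
    // O(n^2): solve cov * alpha = y using the factorization.
    Some(chol.solve(&DVector::from_column_slice(y)))
}
```

Since the cost grows as n^3, even a modest reduction in the number of samples used during parameter fitting shrinks each iteration considerably.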
thank you very much for the prompt and thorough response @nestordemeure - hugely appreciated!
regarding nalgebra/blas - I noticed that during fitting only one cpu core was busy - do you think ensuring blas is used as backend for nalgebra would result in more parallel work? are there any places where the algorithms (i.e. not matmul or other specific math ops) are embarrassingly parallel but the implementation is currently single threaded?
(also - small aside on nalgebra - I have used ndarray over the past few years and not nalgebra - is there any functionality in nalgebra you need or was that just the library you chose?)
regarding kernel regression: is your link of Priestley–Chao kernel estimator meant to imply that Priestley–Chao is especially fast (performance-wise), especially good (quality wise, as compared to GPR), or neither/both? do you know of any existing open source implementations (i.e. code, not papers) of Priestley–Chao?
do you think ensuring blas is used as backend for nalgebra would result in more parallel work?
BLAS (instead of a raw Rust implementation) would be much faster (you can expect one order of magnitude). From memory, Friedrich is written in such a way that the best and easiest place to put the parallelism would be the linear algebra calls (but you would have to check the nalgebra documentation to see how to enable a parallel BLAS).
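As an illustration of the "embarrassingly parallel" part of the question: the per-entry kernel evaluations that build the covariance matrix are independent, so they could in principle be spread across cores with something like rayon. This is only a hedged sketch of the idea, not what Friedrich currently does; `kernel` and `covariance_matrix` are made-up names, and the O(n^3) factorization that follows would still rely on the linear-algebra backend (e.g. a parallel BLAS) for its own parallelism:

```rust
use nalgebra::DMatrix;
use rayon::prelude::*;

/// Hypothetical kernel function, standing in for whichever kernel is configured.
fn kernel(a: f64, b: f64) -> f64 {
    (-(a - b).powi(2)).exp()
}

/// Build the n x n covariance matrix with the entries computed in parallel.
/// Every entry is independent, so this part is embarrassingly parallel.
fn covariance_matrix(x: &[f64]) -> DMatrix<f64> {
    let n = x.len();
    // nalgebra stores matrices column-major; the kernel is symmetric,
    // so the index order does not matter here.
    let entries: Vec<f64> = (0..n * n)
        .into_par_iter()
        .map(|idx| kernel(x[idx % n], x[idx / n]))
        .collect();
    DMatrix::from_vec(n, n, entries)
}
```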
is there any functionality in nalgebra you need or was that just the library you chose?
When I started Friedrich, I played with both ndarray and nalgebra but converged on nalgebra because I missed some functionalities in ndarray (I don't remember which ones, however). Nowadays, I have contributed some code to nalgebra specifically for Friedrich (an update of a Cholesky decomposition and, coming soonish, a Cholesky decomposition algorithm that cannot fail), so I am even more tied to it (but going from it to ndarray would definitely be doable).
is your link of Priestley–Chao kernel estimator meant to imply that Priestley–Chao is especially fast (performance-wise), especially good (quality wise, as compared to GPR), or neither/both?
It is much faster for algorithmic reasons (the algorithm is O(n)) and should be good given enough data points (I have further ideas to tweak it and give it some good uncertainty estimation, but not enough time to implement them right now and, even less, to publish them properly as a whitepaper). However, it will probably be inferior to GPR.
do you know of any existing open source implementations
I do not, but the regression part of the code would be extremely straightforward to implement given the kernel implementations already present in Friedrich (basically take an average of the output values weighted by the kernel values and divided by the sum of kernel values, add a prior to avoid pathological behaviour far from the known points, and you will be fine). The uncertainty estimation would be harder (that is where there is some interesting work to be done).
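Here is a minimal sketch of that regression step, assuming a single RBF kernel and a constant prior; `predict`, `prior` and `prior_weight` are illustrative names, not part of Friedrich's API. It is the kernel-weighted average described above, with the prior blended in so that the estimate falls back to the prior far from the known points:

```rust
/// Hypothetical RBF kernel; in practice Friedrich's own kernel
/// implementations would be reused instead.
fn rbf_kernel(a: &[f64], b: &[f64], length_scale: f64) -> f64 {
    let sq_dist: f64 = a.iter().zip(b).map(|(ai, bi)| (ai - bi).powi(2)).sum();
    (-sq_dist / (2.0 * length_scale * length_scale)).exp()
}

/// Kernel-regression prediction at `x`: a kernel-weighted average of the
/// training outputs, blended with a prior so the estimate stays sane far
/// from the known points.
fn predict(
    train_x: &[Vec<f64>],
    train_y: &[f64],
    x: &[f64],
    length_scale: f64,
    prior: f64,
    prior_weight: f64,
) -> f64 {
    let mut weighted_sum = prior_weight * prior;
    let mut weight_total = prior_weight;
    for (xi, yi) in train_x.iter().zip(train_y) {
        let k = rbf_kernel(xi, x, length_scale);
        weighted_sum += k * yi;
        weight_total += k;
    }
    weighted_sum / weight_total
}
```

Note that each prediction is a single pass over the training points, which is where the O(n) cost mentioned earlier comes from.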
as someone without a ton of experience using GPR, I was surprised how long it takes to fit models.
is it possible you could provide some insight into 1) ways of using this library that are faster than others, 2) the extent to which you estimate the code could be optimized, and 3) what parts of the code you consider most in need of optimization?
thanks for any help you can provide!