Due to the need to invert the kernel matrix $K$, fitting a GPR model scales as $\mathcal{O}(N^3)$ in the number of training points $N$. There are a variety of strategies to get around this, including the Bayesian Committee Machine (BCM), which splits the training set into random subsets of size $M < N$, with an individual GP trained on each.
We could easily implement this, and since hyperparameters are shared between the committee members, the gradient calculation step can be parallelized. See this Python version for implementation ideas.
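To make the idea concrete, here is a minimal sketch of a BCM predictor built on scikit-learn's `GaussianProcessRegressor`. The `bcm_predict` helper, the subset count, and the RBF kernel are illustrative assumptions, not a reference implementation; the combination rule is the standard BCM precision-weighted average with the prior-precision correction.

```python
# A minimal BCM sketch, assuming scikit-learn; bcm_predict and its
# defaults (n_subsets, RBF kernel) are illustrative choices.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def bcm_predict(X_train, y_train, X_test, n_subsets=8, kernel=None):
    """Split the training set into random subsets, fit one GP per subset
    with shared (fixed) hyperparameters, and combine predictions with
    the BCM rule."""
    kernel = kernel if kernel is not None else RBF(length_scale=1.0)
    rng = np.random.default_rng(0)
    subsets = np.array_split(rng.permutation(len(X_train)), n_subsets)

    # Prior variance k(x, x) at the test points, needed for the correction.
    prior_var = kernel.diag(X_test)

    precision = np.zeros(len(X_test))
    weighted_mean = np.zeros(len(X_test))
    for s in subsets:
        # optimizer=None keeps the shared hyperparameters fixed, so each
        # member only does an O(M^3) solve on its own subset.
        gp = GaussianProcessRegressor(kernel=kernel, optimizer=None)
        gp.fit(X_train[s], y_train[s])
        mu, sd = gp.predict(X_test, return_std=True)
        precision += 1.0 / sd**2
        weighted_mean += mu / sd**2

    # BCM correction: each member implicitly counts the prior once, so the
    # (M-1)-fold over-counted prior precision is subtracted back out.
    var = 1.0 / (precision - (len(subsets) - 1) / prior_var)
    mean = var * weighted_mean
    return mean, var
```

Because the per-subset fits are independent given the shared hyperparameters, the loop body is exactly the part that could be dispatched to separate processes or machines, which is where the parallel gradient calculation mentioned above would live.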