Closed Svalorzen closed 4 years ago
Thanks for reporting. I'll take a look today.
The reason is that the gradient was incorrectly calculated.
Since you have added a diagonal noise to K
, the derivatives of l
and sigma
need to be computed before K
is added the noise.
Other tweaks:
makeDist(xData_, xData_)
outside kernel()
and operator()
, since it stays the same.K
to compute other related quantities, for example K^(-1)
, K^(-1) * y
, and log|K|
.y.transpose() * kinv * dkl * kinv * y
, do
alpha = kinv * y
alpha.transpose() * dkl * alpha
tr(A * B) = sum(A .* B)
, where .*
is the elementwise multiplication. The former is O(n^3)
, whereas the latter is O(n^2)
.Below is my version:
#include <iostream>
#include <Eigen/Core>
#include <Eigen/Cholesky>
#include <LBFGSB.h>
#include <LBFGS.h>
constexpr double PI = 3.141592653589793238462643383279502884197169399375105820974944;
using Matrix2D = Eigen::Matrix<double, Eigen::Dynamic, Eigen::Dynamic, Eigen::RowMajor | Eigen::AutoAlign>;
using Vector = Eigen::Matrix<double, Eigen::Dynamic, 1>;
Matrix2D makeDist(const Matrix2D & x1, const Matrix2D & x2) {
return ((-2.0 * (x1 * x2.transpose())).colwise() + x1.rowwise().squaredNorm()).rowwise() +
x2.rowwise().squaredNorm().transpose();
}
Matrix2D kernel(const Matrix2D & dist, double l = 1.0, double sigma = 1.0) {
return std::pow(sigma, 2) * ((-0.5 / std::pow(l, 2)) * dist).array().exp();
}
struct Likelihood {
Likelihood(const Matrix2D & xData, const Vector & muData, double noise) :
dist_(makeDist(xData, xData)), muData_(muData), noise_(noise) {}
const Matrix2D dist_;
const Vector & muData_;
double noise_;
double operator()(const Eigen::VectorXd& x, Eigen::VectorXd& grad) {
const int N = muData_.size();
const double l = x[0];
const double sigma = x[1];
Matrix2D k = kernel(dist_, l, sigma);
// Compute derivatives as per
// https://math.stackexchange.com/questions/1030534/gradients-of-marginal-likelihood-of-gaussian-process-with-squared-exponential-co
Matrix2D dkl = k.cwiseProduct(dist_) / std::pow(l, 3);
Matrix2D dks = (2.0 / sigma) * k;
k.diagonal().array() += std::pow(noise_, 2);
auto llt = Eigen::LLT<Matrix2D>(k);
Matrix2D kinv = llt.solve(Matrix2D::Identity(N, N));
Vector alpha = llt.solve(muData_);
grad[0] = 0.5 * alpha.transpose() * dkl * alpha - 0.5 * kinv.cwiseProduct(dkl).sum();
grad[1] = 0.5 * alpha.transpose() * dks * alpha - 0.5 * kinv.cwiseProduct(dks).sum();
// Minimize negative log-likelihood
grad = -grad;
// Compute log likelihood (correct)
const double logLikelihood = -0.5 * muData_.dot(alpha) - llt.matrixL().selfadjointView().diagonal().array().log().sum() - 0.5 * N * std::log(2.0 * PI);
// Print current parameters
std::cout << l << ", " << sigma << " ==> " << -logLikelihood << " - " << grad.transpose() << std::endl;
// Return negative log-likelihood
return -logLikelihood;
}
};
int main() {
const double noise = 0.4;
// Some test points to train on.
Matrix2D X_train(6, 1);
X_train << -5, -4, -3, -2, -1, 1;
Vector Y_train = X_train.array().sin();
// Log-likelihood function with set data/noise
Likelihood like(X_train, Y_train, noise);
// Bounds
Vector lb = Vector::Constant(2, 1e-8);
Vector ub = Vector::Constant(2, std::numeric_limits<double>::infinity());
// Initial guess
Vector x = Vector::Constant(2, 1.0);
// L-BFGS-B
LBFGSpp::LBFGSBParam<double> param;
param.epsilon = 1e-5;
param.max_iterations = 100;
LBFGSpp::LBFGSBSolver<double> solver(param);
double log;
solver.minimize(like, x, log, lb, ub);
std::cout << "\n===============================\n\n";
LBFGSpp::LBFGSParam<double> paramx;
paramx.epsilon = 1e-5;
paramx.max_iterations = 10;
LBFGSpp::LBFGSSolver<double> solver2(paramx);
x = Vector::Constant(2, 1.0);
solver2.minimize(like, x, log);
return 0;
}
Output:
1, 1 ==> 6.49954 - -1.64172 2.33992
1.85404, 0.479791 ==> 7.35173 - 2.13605 -6.82365
1.3421, 0.791619 ==> 5.88876 - -0.101547 0.295286
1.36368, 0.760786 ==> 5.88481 - 0.0601469 -0.0763338
1.35529, 0.76696 ==> 5.88431 - 0.0152388 0.0146282
1.35236, 0.76593 ==> 5.88426 - 0.00972153 0.0133352
1.34553, 0.762505 ==> 5.8842 - -5.29077e-06 -0.000405903
1.34559, 0.762563 ==> 5.8842 - 1.4304e-06 1.52874e-06
===============================
1, 1 ==> 6.49954 - -1.64172 2.33992
1.57435, 0.181389 ==> 10.1205 - 1.09806 -14.218
1.28717, 0.590695 ==> 6.05536 - 0.38649 -2.43346
1.21589, 0.792321 ==> 5.92615 - -0.488156 0.681163
1.25462, 0.747928 ==> 5.89378 - -0.233815 0.12984
1.28602, 0.735536 ==> 5.88844 - -0.0968389 -0.107242
1.3083, 0.741189 ==> 5.88623 - -0.045728 -0.113439
1.34315, 0.760708 ==> 5.88421 - -0.00147428 -0.0116414
1.34548, 0.76243 ==> 5.8842 - 0.000102842 -0.00104711
1.3456, 0.762561 ==> 5.8842 - 2.875e-05 -4.39889e-05
1.34559, 0.762563 ==> 5.8842 - 3.86423e-06 -3.34703e-06
Sorry for my mistake, I hope you haven't lost too much time on it. Thanks a ton for the help!
Not a big deal. It was actually a good time to refresh my memory on GP fitting. :joy:
Hi, I'm trying to estimate the parameters of a GP with a standard RBF kernel. I'm having some trouble converging to the same parameters I get when implementing the code in Python. I am using the current master branch (not sure if I should be using a particular commit).
With
LBFGSSolver
the search seems to never end, going in loops. WithLBFGSBSolver
, after around ~20 iterations the search ends with a thrown exception. I've tried modifying the parameters to make it look longer, but it never really converges more than what it already achieved.LBFGSBSolver
gets somewhat close to the optimal solution found via Python (1.3455902517905933 0.7625631641671134
with the negative log likelihood at5.8842014
), but since it then throws I don't want to simply use a try-catch if it's not stopping cleanly (since maybe it could actually get to the same solution as in Python).I've double checked my gradients, and I hope I haven't made silly mistakes. Could you help me figure out if there's something I should be doing differently? Here is my code, if it helps: