mlpack / ensmallen

A header-only C++ library for numerical optimization --
http://ensmallen.org

Report callback fails when optimizer has no iterations #246

Closed: benmarchi closed this issue 3 years ago

benmarchi commented 3 years ago

Issue description

I'm using ensmallen via RcppEnsmallen and noticed this issue with the L-BFGS solver. When the optimizer quits without taking any steps (likely due to bad initial parameter estimates) and the built-in report callback is in use, the optimization crashes my R session. Without the report callback, the R session does not crash. The crash occurs while printing the report, which leads me to believe that something within the callback is causing the issue. My guess is that the callback assumes the number of iterations is greater than zero, which would not hold when the optimizer fails to take a step.

Your environment

Steps to reproduce

Unfortunately, I don't have a reproducible example. My environment is also a bit locked, so I can't readily test any fixes.

Expected behavior

Report to print with full details.

Optimization Report
--------------------------------------------------------------------------------

Initial coordinates: 
-0.4538 -0.3340  1.1227  ... -0.2663

Final coordinates: 
-0.2655 -0.3284  1.1241  ... -0.1279
iter          loss             loss change   |gradient|       total time
0             11135977605.109  0.000         25055627432.518  0.006

--------------------------------------------------------------------------------

Version:
ensmallen:                    2.14.2 (No Direction Home)
armadillo:                    10.1.0 (Orchid Ambush)

Function:
Coordinates rows:             100
Coordinates columns:          1

Loss:
Initial                       11135977605.109
Final                         11135977605.109
Change                        0.000

Optimizer:
Iterations:                   1
Coordinates max. norm:        25055627432.518
Evaluate calls:               2
Gradient calls:               2
Time (in seconds):            0.006

Actual behavior

Example bad report output with R session crash:

Optimization Report
--------------------------------------------------------------------------------

Initial coordinates: 
 0.0000  0.0000  0.0000  ... 0.0000

Final coordinates: 
 0.2617  0.0238  0.0064  ... -0.0704
iter          loss          loss change   |gradient|    total time    

--------------------------------------------------------------------------------

Version:
ensmallen:               
zoq commented 3 years ago

The callback should run fine if the number of iterations is <= 0: https://github.com/mlpack/ensmallen/blob/cdfae670da608ef8498dee1e649f79f1d3279ef9/include/ensmallen_bits/callbacks/report.hpp#L120

Let's see if I can reproduce the issue somehow.

benmarchi commented 3 years ago

There definitely could be something wrong on my end as well. I'll continue to play around to see if I can nail down the issue.

benmarchi commented 3 years ago

I was able to create an example that consistently crashes when generating a report. The example calls ensmallen from R, so I don't know if that is part of the issue.

c++ code

```cpp
#include <RcppEnsmallen.h>
// [[Rcpp::depends(RcppEnsmallen)]]

class GLM {
public:
    // Construct the object with the given design
    // matrix and responses.
    GLM(const arma::mat& X,
        const arma::colvec& y,
        const arma::colvec& weights,
        const arma::colvec& offset) :
        X(X), y(y), weights(weights), offset(offset) { }

    // Return the objective function for model parameters beta.
    double EvaluateWithGradient(const arma::mat& beta, arma::mat& g) {
        const arma::colvec eta = X * beta + offset;
        const arma::colvec mu = arma::exp(eta);
        const arma::colvec diff = (y - mu);
        g = -2.0 * X.t() * (weights % diff % arma::exp(eta));
        return arma::accu(weights % arma::square(diff));
    }

private:
    const arma::mat& X;
    const arma::colvec& y;
    const arma::colvec& weights;
    const arma::colvec& offset;
};

// [[Rcpp::export]]
arma::mat log_glm_lbfgs(const arma::mat& X, const arma::colvec& y, const arma::colvec& weights, const arma::colvec& offset, const int& lineIter) {
    GLM glm(X, y, weights, offset);
    ens::L_BFGS lbfgs;
    lbfgs.MaxIterations() = 100;
    lbfgs.MaxLineSearchTrials() = lineIter;
    arma::mat beta(X.n_cols, 1, arma::fill::zeros);
    lbfgs.Optimize(glm, beta, ens::Report(0.1));
    return beta;
}

// [[Rcpp::export]]
arma::mat log_glm_lbfgs_noLog(const arma::mat& X, const arma::colvec& y, const arma::colvec& weights, const arma::colvec& offset, const int& lineIter) {
    GLM glm(X, y, weights, offset);
    ens::L_BFGS lbfgs;
    lbfgs.MaxIterations() = 100;
    lbfgs.MaxLineSearchTrials() = lineIter;
    arma::mat beta(X.n_cols, 1, arma::fill::zeros);
    lbfgs.Optimize(glm, beta);
    return beta;
}
```
R code

```R
library(RcppEnsmallen)
library(Rcpp)
library(RcppArmadillo)

Sys.setenv("PKG_CXXFLAGS" = "-fopenmp -O3")
Sys.setenv("PKG_LIBS" = "-fopenmp -O3")

sourceCpp("/path/to/cpp", rebuild = TRUE)

set.seed(1234)
m <- 10
n <- 1e4
X <- matrix(rnorm(n*m, 0, 1), ncol = m)
X[,1] <- 1
k <- floor(m*0.5)-1
X[,2:(k+1)] <- matrix(rbinom(n*k, 1, 0.1), ncol = k)
beta <- rnorm(m,0,0.5)
off <- rnorm(nrow(X), 1, 1)
y <- drop(X %*% beta + off + rnorm(nrow(X)))
y <- ifelse(abs(y) > 14, 14, y)
yLog <- exp(y)
wgt <- runif(nrow(X), 0.1, 2)

log_glm_lbfgs_noLog(X, yLog, wgt, off, 10) ### Runs successfully
log_glm_lbfgs(X, yLog, wgt, off, 10)       ### Crashes R session
```
rcurtin commented 3 years ago

Thank you for the clear report and steps to reproduce! In order to reproduce specifically from C++, I adapted the code you wrote a little:


```cpp
#include <ensmallen.hpp>

class GLM
{
public:
    // Construct the object with the given design
    // matrix and responses.
    GLM(const arma::mat& X,
        const arma::colvec& y,
        const arma::colvec& weights,
        const arma::colvec& offset) :
    X(X), y(y), weights(weights), offset(offset) { }

    // Return the objective function for model parameters beta.
    double EvaluateWithGradient(const arma::mat& beta, arma::mat& g) {
        const arma::colvec eta = X.t() * beta + offset;
        const arma::colvec mu = arma::exp(eta);
        const arma::colvec diff = (y - mu);
        g = -2.0 * X * (weights % diff % arma::exp(eta));
        return arma::accu(weights % arma::square(diff));
    }

private:
    const arma::mat& X;
    const arma::colvec& y;
    const arma::colvec& weights;
    const arma::colvec& offset;
};

// [[Rcpp::export]]
arma::mat log_glm_lbfgs(const arma::mat& X, const arma::colvec& y, const arma::colvec& weights, const arma::colvec& offset, const int& lineIter) {

    GLM glm(X, y, weights, offset);
    ens::L_BFGS lbfgs;
    lbfgs.MaxIterations() = 100;
    lbfgs.MaxLineSearchTrials() = lineIter;
    arma::mat beta(X.n_rows, 1, arma::fill::zeros);
    lbfgs.Optimize(glm, beta, ens::Report(0.1));

    return beta;
}

// [[Rcpp::export]]
arma::mat log_glm_lbfgs_noLog(const arma::mat& X, const arma::colvec& y, const arma::colvec& weights, const arma::colvec& offset, const int& lineIter) {

    GLM glm(X, y, weights, offset);
    ens::L_BFGS lbfgs;
    lbfgs.MaxIterations() = 100;
    lbfgs.MaxLineSearchTrials() = lineIter;
    arma::mat beta(X.n_rows, 1, arma::fill::zeros);
    lbfgs.Optimize(glm, beta);

    return beta;
}

int main()
{
  arma::mat X(10, 1000, arma::fill::randn);
  // In the example rows 1 through 4 have a binomial distribution, but I don't
  // really feel like implementing that in Armadillo...
  arma::vec beta(10, arma::fill::randn);
  beta *= 0.5;

  arma::vec off = arma::vec(1000, arma::fill::randn);

  // Not sure exactly what this is in the example.
  arma::vec y = arma::vec(1000, arma::fill::randn);
  arma::vec yLog = arma::exp(y);
  arma::vec weight = arma::ones<arma::vec>(1000);

  arma::vec result = log_glm_lbfgs_noLog(X, yLog, weight, off, 10);
  std::cout << "result: " << std::endl << result;
  arma::vec result2 = log_glm_lbfgs(X, yLog, weight, off, 10);
  std::cout << "result2: " << std::endl << result2;
}
```

Now I actually am not R-literate, so I can't really parse exactly how you made X and y, but I at least made them have the right shape (although I used a column-major representation not a row-major representation).

But this seems to work just fine, and it prints everything I ask:

result: 
  -0.1362
   0.0858
  -0.0552
  -0.0052
   0.0553
  -0.0061
  -0.0465
  -0.0130
  -0.0914
  -0.0319
Optimization Report
--------------------------------------------------------------------------------

Initial coordinates: 
 0.0000  0.0000  0.0000  ... 0.0000

Final coordinates: 
-0.1362  0.0858 -0.0552  ... -0.0319
iter          loss          loss change   |gradient|    total time    
0             9247.567      0.000         3166.043      0.000         
1             9132.302      115.265       582.953       0.000         
2             9126.114      6.188         302.031       0.000         
3             9123.418      2.696         98.225        0.001         
4             9123.342      0.076         66.239        0.001         
5             9123.291      0.051         5.527         0.001         
6             9123.290      0.001         2.492         0.001         
7             9123.290      0.000         0.471         0.001         
8             9123.290      0.000         0.145         0.001         
9             9123.290      0.000         0.113         0.001         
10            9123.290      0.000         0.013         0.001         
11            9123.290      0.000         0.006         0.001         
12            9123.290      0.000         0.001         0.001         

--------------------------------------------------------------------------------

Version:
ensmallen:                    2.15.1 (Why Can't I Manage To Grow Any Plants?)
armadillo:                    8.400.0 (Entropy Bandit)

Function:
Coordinates rows:             10
Coordinates columns:          1

Loss:
Initial                       9247.567
Final                         9123.290
Change                        124.277

Optimizer:
Iterations:                   13
Coordinates max. norm:        3166.043
Evaluate calls:               18
Gradient calls:               18
Time (in seconds):            0.001
result2: 
  -0.1362
   0.0858
  -0.0552
  -0.0052
   0.0553
  -0.0061
  -0.0465
  -0.0130
  -0.0914
  -0.0319

So I'm not sure exactly what's going on. I'll see soon if I can make it fail from R too.

benmarchi commented 3 years ago

I'm glad that you were able to reproduce the issue. The code I shared is a pretty significant simplification of the actual problem I'm trying to solve, but having the X matrix contain some, but not all, binary columns is important to the problem and seems to be contributing to the crashing behavior. I generated 'y' so that it was strictly positive (such that the objective and gradient are defined for all values in y).

Please let me know if there is anything else I can do to help figure out what is going on.

P.S. Why did you switch to column major? Are ensmallen and Armadillo more efficient when the problem is structured like this?

rcurtin commented 3 years ago

Sorry, maybe I phrased that badly: I was not able to reproduce the issue.

Can you describe how the X matrix is generated? I'm sure I can implement Armadillo code to do it, but the problem is that I can't understand the R that currently does it. :)

I switched to a column-major representation because that's the internal representation that Armadillo uses (which in turn was chosen because BLAS and LAPACK in FORTRAN expect column-major ordering). So, e.g., if you have an algorithm that is iterating over points, it would be best for Armadillo's efficiency if each individual point corresponded to a column. Now... there are some algorithms for which the most efficient representation is with each dimension as a column, but at least from my perspective these seem to be in the minority.

In any case, it depends a lot on the size of the problem whether you'll see any speedup from making that change. If the problem is small, the speedup would likely be small too. (But again, depends on the algorithm.)

rcurtin commented 3 years ago

I was able to reproduce this from R today. Still trying to narrow down what the cause is. (Also, I can just dump the data from R so no need to rewrite the data creation in C++.)

rcurtin commented 3 years ago

Hey @benmarchi, I got to the bottom of the issue and opened #252 to fix it. Basically, L-BFGS is not taking any steps at all during optimization. Maybe the gradient is implemented incorrectly? Here's the "corrected" Report output:

Optimization Report
--------------------------------------------------------------------------------

Initial coordinates:
 0.0000  0.0000  0.0000  ... 0.0000

Final coordinates:
 0.6530  0.0543  0.0527  ... -0.5212
iter          loss          loss change   |gradient|    total time

--------------------------------------------------------------------------------

Version:
ensmallen:                    2.15.1 (Why Can't I Manage To Grow Any Plants?)
armadillo:                    9.900.1 (Nocturnal Misbehaviour)

Function:
Coordinates rows:             10
Coordinates columns:          1

Loss:
Initial                       2604569365.849
Final                         2604569365.849
Change                        0.000

Optimizer:
Iterations:                   0 (No steps taken!  Did optimization fail?)
Evaluate calls:               11
Gradient calls:               11
Time (in seconds):            0.011

benmarchi commented 3 years ago

Great! Thanks for your help in tracking down the issue. I can definitely check the form of the gradient, but I think the proposed solution is nice in that it provides some indication that something may be wrong with the optimization.

zoq commented 3 years ago

Alright, just merged: https://github.com/mlpack/ensmallen/pull/252.

coatless commented 3 years ago

@zoq / @rcurtin any chance of a new point release with the fix being released? It just missed the 2.16.0 release.

zoq commented 3 years ago

> @zoq / @rcurtin any chance of a new point release with the fix being released? It just missed the 2.16.0 release.

Good idea, I can open a PR.

zoq commented 3 years ago

Let's fix https://github.com/mlpack/ensmallen/issues/253 as well.

rcurtin commented 3 years ago

I'm having a lot of trouble reproducing #253. Maybe we can merge #258 and do a release, and a fix for #253 can come later. :+1:

zoq commented 3 years ago

> I'm having a lot of trouble reproducing #253. Maybe we can merge #258 and do a release, and a fix for #253 can come later.

Let's see if I can reproduce the issue on one of my systems.