During AugLagrangian's optimization, L_BFGS sets coordinates to nan

olgavrou commented 1 year ago

Issue description

During AugLagrangian's optimization, L_BFGS sets coordinates to nan

I believe the issue is here

scalingFactor = dot(sMat, yMat) / dot(yMat, yMat);

if the two dot products are very small this leads to the scaling factor becoming a nan

In my experiment, if I replace that line with the code pasted below:

scalingFactor = dot(sMat, yMat) / std::max(1e-12, dot(yMat, yMat));

(EDITED)

or

double max_y = arma::abs(yMat).max();

if (max_y == 0.0)
{
  scalingFactor = 0.0;
}
else
{
  auto z = yMat / max_y;
  scalingFactor = (dot(sMat, z) / max_y) / dot(z, z);
}

which results in the optimizer working as expected.

Please let me know if you think this calculation is ok for the library and if you would be open to either patching it or accepting a patch.

Your environment

version of ensmallen: 2.19.1
Ubuntu 20.04:
compiler: gcc 9.4.0
version of Armadillo: 12.1.90
any other environment information you think is relevant:

Steps to reproduce

I have been iterating over the implementation of my constraint optimization problem you can see the latest version here and the behaviour is seen when this test (CheckUpdateRule500WIterations) is run

but unfortunately I don't have a smaller repro in hand

Expected behavior

Coordinates not getting set to nan

Actual behavior

Coordinates get set to nan

rcurtin commented 1 year ago

Thanks for the clear report! I agree that this is a problem and I like your suggested fixes. I might pick the first solution as it will still compute a nonzero scaling factor if || yMat || is very small, but I don't have a particularly strong opinion and could be convinced either way. If you'd like to open a PR I would gladly review it and we can get the fix merged. Thanks again! :+1:

olgavrou commented 1 year ago

OK will do :) I think the first solution is probably working better, I got another nan with the second one at some point

conradsnicta commented 1 year ago

scalingFactor = dot(sMat, yMat) / std::max(1e-12, dot(yMat, yMat));

Instead of a hardcoded value like 1e-12, suggest to use someting like
1000 * std::numeric_limits<CubeElemType>::epsilon().

The user may elect to use matrices and cubes with single-precision floating point values, rather than double-precision. In other words, CubeElemType can be either float or double. For single-precision, the hardcoded 1e-12 is probably too low in this context.

conradsnicta commented 1 year ago

PS. While we are at it, the following line should be replaced: https://github.com/mlpack/ensmallen/blob/master/include/ensmallen_bits/lbfgs/lbfgs_impl.hpp#LL93C1-L93C57

 scalingFactor = 1.0 / sqrt(dot(gradient, gradient));

with

 scalingFactor = 1.0 / arma::norm(gradient, "fro");

This is for two reasons: (1) clarity of intent, (2) Armadillo will use a more robust algorithm to calculate the norm.

mlpack-bot[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had any recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions! :+1:

conradsnicta commented 1 year ago

Should be resolved in #368. If not, please re-open and provide more details.

mlpack / ensmallen